Sources¶
Sources define how the input column should be interpreted before operations are applied.
Image Sources: image_bytes and file_path¶
Both image_bytes and file_path decode images with automatic format and dtype detection:
- PNG/JPEG usually decode to u8
- 16-bit PNG decodes to u16
- TIFF may decode to u8, u16, f32, or f64
Decoded images are treated as 3D buffers ([H, W, C]).
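To make the runtime dependence concrete, the sketch below reads the IHDR chunk of a PNG with only the standard library and predicts the dtype and [H, W, C] shape a decoder would have to produce. It illustrates the PNG header layout only; it is not how polars_cv detects formats, and describe_png and fake_png_header are hypothetical names introduced here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Stored channels per PNG color type. Palette images (type 3) store one
# index per pixel; a decoder may expand them to RGB.
CHANNELS_BY_COLOR_TYPE = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def describe_png(data: bytes) -> dict:
    """Read height, width, channels and sample size from a PNG's IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG stream")
    # IHDR starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "shape": (height, width, CHANNELS_BY_COLOR_TYPE[color_type]),
        # 16-bit samples need u16; 1/2/4/8-bit samples fit in u8.
        "dtype": "u16" if bit_depth == 16 else "u8",
    }


def fake_png_header(width: int, height: int, bit_depth: int, color_type: int) -> bytes:
    """Build just enough of a PNG (signature + IHDR body) for the demo."""
    body = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(body)) + b"IHDR" + body


print(describe_png(fake_png_header(640, 480, 16, 2)))
# {'shape': (480, 640, 3), 'dtype': 'u16'}
```

Because the bit depth is only known once the bytes are read, two rows of the same column can legitimately decode to different dtypes.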
from polars_cv import Pipeline
bytes_pipe = Pipeline().source("image_bytes")
path_pipe = Pipeline().source("file_path")
Error Handling with on_error¶
By default, decode failures raise an error and abort the query. Set on_error="null" to emit a null value for rows that fail to decode, allowing the rest of the DataFrame to process successfully.
# Skip corrupt images instead of aborting
pipe = Pipeline().source("image_bytes", on_error="null").resize(height=224, width=224)
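The difference between the two modes can be modelled in plain Python. This is a sketch of the semantics only, not the library's implementation: decode_column and toy_decode are hypothetical helpers, and the name "raise" for the default mode is an assumption (the text above only names "null").

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def toy_decode(value: bytes) -> int:
    """Stand-in decoder: accept anything that starts with the PNG signature."""
    if not value.startswith(PNG_SIGNATURE):
        raise ValueError("unrecognized image data")
    return len(value)


def decode_column(values, decode, on_error="raise"):
    """Apply decode to every row; on failure either abort or emit None."""
    if on_error not in ("raise", "null"):
        raise ValueError(f"unsupported on_error: {on_error!r}")
    out = []
    for value in values:
        try:
            out.append(decode(value))
        except ValueError:
            if on_error == "raise":
                raise  # default: the first bad row aborts the whole column
            out.append(None)  # "null": keep the row, mark it as missing
    return out


rows = [PNG_SIGNATURE + b"...", b"not an image"]
print(decode_column(rows, toy_decode, on_error="null"))  # [11, None]
```

With on_error="null" the output keeps one entry per input row, so downstream code can filter on the nulls to find the inputs that failed.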
Auto DType Behavior¶
For image_bytes and file_path, dtype is runtime-dependent, so the pipeline starts with dtype auto.
You can resolve dtype by:
- providing dtype in source(...)
- using an operation with deterministic output dtype (for example normalize, threshold, or cast)
# Assert and enforce dtype at source
pipe = Pipeline().source("image_bytes", dtype="f32").resize(height=224, width=224)
Planning-Time Requirement for list/array Sinks¶
sink("list") and sink("array") require a known element dtype at planning time.
For image sources, resolve dtype before these sinks:
# Option 1: source dtype
pipe = Pipeline().source("file_path", dtype="u8").resize(height=224, width=224)
# Option 2: cast in pipeline
pipe = Pipeline().source("image_bytes").resize(height=224, width=224).cast("f32")
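The planning-time check can be pictured as tracking one dtype value through the pipeline steps. The model below is a simplified sketch, not polars_cv internals: resolve_dtype and check_sink are hypothetical names, each step is written as (name, output_dtype), and output_dtype is None for steps assumed to keep the incoming dtype (resize is treated that way here).

```python
AUTO = "auto"


def resolve_dtype(source_dtype, steps):
    """Return the dtype known at planning time after all steps."""
    dtype = source_dtype if source_dtype is not None else AUTO
    for _name, output_dtype in steps:
        if output_dtype is not None:
            dtype = output_dtype  # a deterministic step fixes the dtype
    return dtype


def check_sink(sink, dtype):
    """Reject list/array sinks while the dtype is still unresolved."""
    if sink in ("list", "array") and dtype == AUTO:
        raise ValueError(f"sink({sink!r}) needs a known dtype; got {AUTO!r}")
    return dtype


# Option 1: dtype given at the source survives the resize.
print(check_sink("list", resolve_dtype("u8", [("resize", None)])))  # u8

# Option 2: a cast resolves the dtype later in the pipeline.
print(check_sink("array", resolve_dtype(None, [("resize", None), ("cast", "f32")])))  # f32

# No dtype anywhere: the list sink is rejected before any row is read.
try:
    check_sink("list", resolve_dtype(None, [("resize", None)]))
except ValueError as exc:
    print(exc)
```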
file_path and Cloud Access¶
file_path supports local paths and remote URIs (for example s3://, gs://, az://, http://).
Use CloudOptions when credentials or provider settings are needed:
from polars_cv import CloudOptions, Pipeline
options = CloudOptions(
    aws_region="us-east-1",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)
pipe = Pipeline().source("file_path", cloud_options=options)
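When a column mixes local paths and remote URIs, it can help to check up front which rows will go over the network. The helper below is a standard-library sketch; is_remote is a hypothetical name, and REMOTE_SCHEMES contains only the schemes listed above, which may not be the full set the library accepts.

```python
from urllib.parse import urlsplit

# Only the schemes named in this section; treat the set as an assumption.
REMOTE_SCHEMES = {"s3", "gs", "az", "http"}


def is_remote(path: str) -> bool:
    """Return True when the path uses one of the remote URI schemes."""
    return urlsplit(path).scheme.lower() in REMOTE_SCHEMES


paths = ["/data/cat.png", "s3://bucket/dog.png", "http://example.com/a.jpg"]
print([p for p in paths if is_remote(p)])
# ['s3://bucket/dog.png', 'http://example.com/a.jpg']
```

Matching against an allow-list keeps Windows drive letters such as C:\images\a.png on the local side, since their one-letter "scheme" is not in the set.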
Contour Source and Shape Inference¶
The contour source rasterizes contour structs to binary mask buffers. You can specify dimensions explicitly or infer them from another pipeline expression.
import polars as pl
from polars_cv import Pipeline
# Explicit dimensions
mask_pipe = Pipeline().source("contour", width=200, height=200)
# Infer dimensions from an image pipeline
img = pl.col("image").cv.pipe(Pipeline().source("image_bytes").resize(height=200, width=200))
mask_pipe = Pipeline().source("contour", shape=img)
When using shape=, the contour mask dimensions automatically match the referenced pipeline's output size.
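For intuition about what rasterizing a contour means, here is a minimal pure-Python scanline fill using the even-odd rule. It is a sketch under stated assumptions: the contour is given as a list of (x, y) vertices, which may differ from the library's contour struct layout, and rasterize_contour is a hypothetical helper rather than the library's algorithm.

```python
def rasterize_contour(points, height, width):
    """Fill a closed polygon into a height x width mask of 0/1 values."""
    mask = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        y = row + 0.5  # sample each pixel at its centre
        # Collect the x positions where this scanline crosses a polygon edge.
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            if (y0 <= y) != (y1 <= y):  # the edge straddles the scanline
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Even-odd rule: pixels between each pair of crossings are inside.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    mask[row][col] = 1
    return mask


# Taking the mask size from an image's (H, W, C) shape mirrors shape= inference.
image_shape = (4, 6, 3)
square = [(1, 1), (4, 1), (4, 3), (1, 3)]
for line in rasterize_contour(square, *image_shape[:2]):
    print(line)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0, 0]
# [0, 1, 1, 1, 0, 0]
# [0, 0, 0, 0, 0, 0]
```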
Other Sources¶
- raw: raw bytes, requires explicit dtype
- blob: self-describing binary VIEW protocol
- list/array: infer from Polars column type (or override with dtype)
- contour: rasterizes contour structs to mask buffers (see above)
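The reason raw needs an explicit dtype is that a byte string carries no element type of its own. The standard-library sketch below reinterprets the same eight bytes under different dtypes; interpret_raw is a hypothetical helper, and little-endian byte order is an assumption made for the example.

```python
import struct

# dtype name -> struct format code
FORMAT_CODES = {"u8": "B", "u16": "H", "f32": "f", "f64": "d"}


def interpret_raw(data: bytes, dtype: str) -> tuple:
    """Reinterpret raw bytes as little-endian elements of the given dtype."""
    code = FORMAT_CODES[dtype]
    size = struct.calcsize("<" + code)
    if len(data) % size:
        raise ValueError(f"{len(data)} bytes is not a multiple of {size}")
    return struct.unpack(f"<{len(data) // size}{code}", data)


payload = bytes([0, 0, 128, 63, 0, 0, 0, 64])
print(interpret_raw(payload, "u8"))   # (0, 0, 128, 63, 0, 0, 0, 64)
print(interpret_raw(payload, "u16"))  # (0, 16256, 0, 16384)
print(interpret_raw(payload, "f32"))  # (1.0, 2.0)
```

The same payload yields eight, four, or two elements depending on the dtype, so nothing in the bytes alone can settle which reading is intended.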