Pipelines
Pipelines are the core abstraction in polars-cv. They define a sequence of operations that can be applied to image data.
Pipeline Structure
A polars-cv pipeline has three parts:

```mermaid
flowchart LR
    Source["Source"] --> Operations["Operations"] --> Sink["Sink"]
```

- Source: how to interpret the input data (e.g., `image_bytes`, `file_path`)
- Operations: transformations to apply (e.g., `resize`, `grayscale`)
- Sink: the final output format (e.g., `numpy`, `png`)
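To make the three-part structure concrete, here is a toy model in plain Python. It is not polars-cv code: `PipelinePlan`, `with_op`, `with_sink`, and `describe` are hypothetical names used only to show that a pipeline can be thought of as a declarative description (a source, an ordered list of operations, and a sink) that exists before any image is processed.

```python
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical toy model of a pipeline plan; NOT the polars-cv API.
@dataclass(frozen=True)
class PipelinePlan:
    source: str                  # how to interpret the input, e.g. "image_bytes"
    operations: tuple = ()       # ordered (name, params) pairs
    sink: Optional[str] = None   # output format, chosen as the last step

    def with_op(self, name: str, **params) -> "PipelinePlan":
        # Return a new plan with one more operation; self is left unchanged.
        return replace(self, operations=self.operations + ((name, params),))

    def with_sink(self, sink: str) -> "PipelinePlan":
        # Return a new plan with the sink set; self is left unchanged.
        return replace(self, sink=sink)

    def describe(self) -> str:
        steps = [self.source] + [name for name, _ in self.operations] + [self.sink or "?"]
        return " -> ".join(steps)


preprocess = (
    PipelinePlan(source="image_bytes")
    .with_op("resize", height=224, width=224)
    .with_op("grayscale")
)
print(preprocess.with_sink("numpy").describe())  # image_bytes -> resize -> grayscale -> numpy
print(preprocess.with_sink("png").describe())    # image_bytes -> resize -> grayscale -> png
```

Because `with_op` and `with_sink` return new objects in this toy, the shared `preprocess` prefix can be paired with different sinks without being modified. This is a property of the sketch, not a documented claim about polars-cv's `Pipeline`.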
Modular Style
The recommended way to use polars-cv is to define a reusable pipeline of transformations, apply it to a column, and choose the output format with a sink as the final step.
```python
from pathlib import Path

import polars as pl
from polars_cv import Pipeline

# Placeholder input: the bytes of any PNG file (the path is hypothetical).
png_bytes = Path("example.png").read_bytes()

# 1. Define operations (reusable)
preprocess = (
    Pipeline()
    .source("image_bytes")
    .resize(height=224, width=224)
    .grayscale()
)

# 2. Apply to a column and choose the sink
df = pl.DataFrame({"image": [png_bytes]})
result = df.with_columns(
    processed=pl.col("image").cv.pipe(preprocess).sink("numpy")
)
```
Source Formats
| Format | Input Type | Description |
|---|---|---|
| `image_bytes` | Binary | Decode PNG/JPEG/TIFF bytes (auto-detects format and dtype) |
| `file_path` | String | Local, cloud, or HTTP path; decoded like `image_bytes` |
| `raw` | Binary | Raw bytes (requires `dtype`) |
| `list` | List | Polars nested List |
| `array` | Array | Polars fixed-size Array |
| `contour` | Struct | Contour geometry to rasterize |
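The `raw` source needs a `dtype` because raw bytes carry no header: the same buffer is a valid pixel array under several interpretations, and even the pixel count changes. The stdlib sketch below (not polars-cv code) reads one 4-byte buffer as `u8` and as `u16`; little-endian byte order is an assumption made for the example.

```python
import struct

buf = bytes([0x01, 0x00, 0xFF, 0x00])  # 4 raw bytes, no header or metadata

# Interpreted as u8: one pixel per byte -> 4 pixels.
as_u8 = list(struct.unpack("<4B", buf))
# Interpreted as little-endian u16: one pixel per 2 bytes -> 2 pixels.
as_u16 = list(struct.unpack("<2H", buf))

print(as_u8)   # [1, 0, 255, 0]
print(as_u16)  # [1, 255]
```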
Auto DType for Image Sources
For `image_bytes` and `file_path`, the decoded dtype is determined at runtime:

- PNG/JPEG typically decode to `u8`
- 16-bit PNG decodes to `u16`
- TIFF can decode to `u8`, `u16`, `f32`, or `f64`

Because of this variability, the pipeline tracks the dtype as `auto` until it can be resolved by either:

- `source(..., dtype="...")`
- a dtype-fixing operation such as `normalize`, `threshold`, or `cast`

If you use `sink("list")` or `sink("array")`, the dtype must be known at planning time.
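The resolution rule above can be modelled as a small function. This is a hypothetical sketch of the rule as stated, not polars-cv's planner: `None` stands for `auto`, each operation is a `(name, output_dtype)` pair whose `output_dtype` is `None` when the operation keeps its input dtype, and the `"f32"` target given to `cast` is an assumption chosen for illustration.

```python
from typing import Optional, Sequence, Tuple

# Per the rule above, only these sinks need a concrete dtype at planning time.
PLAN_TIME_DTYPE_SINKS = {"list", "array"}


def resolve_dtype(source_dtype: Optional[str],
                  ops: Sequence[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Return the dtype known at planning time, or None for 'auto' (toy model)."""
    dtype = source_dtype  # None: the decoder decides at runtime
    for _name, output_dtype in ops:
        if output_dtype is not None:  # a dtype-fixing operation
            dtype = output_dtype
    return dtype


def sink_is_plannable(source_dtype: Optional[str],
                      ops: Sequence[Tuple[str, Optional[str]]],
                      sink: str) -> bool:
    """True if the sink can be planned given what is known about the dtype."""
    if sink not in PLAN_TIME_DTYPE_SINKS:
        return True
    return resolve_dtype(source_dtype, ops) is not None


ops = [("resize", None), ("grayscale", None)]  # assumed to keep the input dtype
print(sink_is_plannable(None, ops, "list"))                        # False: still auto
print(sink_is_plannable("u8", ops, "list"))                        # True: fixed at the source
print(sink_is_plannable(None, ops + [("cast", "f32")], "array"))   # True: fixed by cast
print(sink_is_plannable(None, ops, "numpy"))                       # True
```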
Sink Formats
| Format | Output Type | Description |
|---|---|---|
| `numpy` | Binary | NumPy-compatible bytes |
| `png` | Binary | PNG bytes |
| `jpeg` | Binary | JPEG bytes |
| `tiff` | Binary | TIFF bytes with LZW compression (supports floating-point) |
| `list` | List | Polars nested List |
| `array` | Array | Polars fixed-size Array |
| `native` | Varies | Native Python type (for scalars/vectors) |
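The `png`, `jpeg`, and `tiff` sinks produce ordinary encoded files, so one way to sanity-check a result is to look at its leading "magic" bytes. The helper below is a hypothetical stdlib sketch, not part of polars-cv; the signatures it checks are the standard file signatures for PNG, JPEG, and TIFF (both byte orders). The byte layout of the `numpy` sink is not described here, so it is deliberately not covered.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the container format from the first bytes of an encoded image."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "png"
    if data.startswith(b"\xff\xd8\xff"):       # JPEG SOI marker + next marker
        return "jpeg"
    if data.startswith(b"II*\x00") or data.startswith(b"MM\x00*"):  # TIFF LE / BE
        return "tiff"
    return "unknown"


print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + bytes(8)))  # png
print(sniff_image_format(b"II*\x00\x08\x00\x00\x00"))       # tiff
```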
Chaining Operations
Operations are chained fluently. Most image operations accept both literal values and Polars expressions.
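```python
import polars as pl
from polars_cv import Pipeline

pipe = (
    Pipeline()
    .source("image_bytes")
    .resize(height=256, width=256)
    # Offsets are read per row from columns; height/width are literals for every row.
    .crop(top=pl.col("y_off"), left=pl.col("x_off"), height=100, width=100)
    .normalize(method="minmax")
)
```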
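To illustrate what mixing expressions and literals means, the sketch below mimics the `crop` and `normalize(method="minmax")` steps on tiny nested lists in plain Python. `top` and `left` come from each row (like `pl.col("y_off")` and `pl.col("x_off")`), while `height` and `width` are literals shared by every row. It is a hypothetical model, not polars-cv code, and it assumes min-max normalization rescales each image to the range [0, 1].

```python
from typing import List

Image = List[List[float]]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    # Keep rows [top, top + height) and columns [left, left + width).
    return [row[left:left + width] for row in img[top:top + height]]


def minmax(img: Image) -> Image:
    # Rescale to [0, 1]; a constant image maps to all zeros (a choice of this toy).
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in img]


# A 4x4 image with pixel value 4*r + c.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
rows = [{"y_off": 0, "x_off": 0}, {"y_off": 2, "x_off": 1}]  # per-row parameters
HEIGHT, WIDTH = 2, 2  # literal parameters, identical for every row

results = [
    minmax(crop(image, top=r["y_off"], left=r["x_off"], height=HEIGHT, width=WIDTH))
    for r in rows
]
print(crop(image, top=2, left=1, height=2, width=2))  # [[9.0, 10.0], [13.0, 14.0]]
print(results[0])  # [[0.0, 0.2], [0.8, 1.0]]
```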
Best Practices
- Reuse Pipelines: Define pipelines once and apply them to many columns.
- Dynamic Parameters: Use Polars expressions for per-row customization.
- Modular Sinks: Keep pipelines generic and choose the sink format at the last step.
Next Steps
- Sources - Source formats and auto dtype behavior
- Domains - Multi-domain pipelines
- Multi-Output - Extracting multiple results