Pipelines¶

Pipelines are the core abstraction in polars-cv. They define a sequence of operations that can be applied to image data.

Pipeline Structure¶

A polars-cv pipeline has three parts:

Source --> Operations --> Sink

Source: How to interpret input data (e.g., image_bytes, file_path)
Operations: Transformations to apply (e.g., resize, grayscale)
Sink: Final output format (e.g., numpy, png)
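As a rough mental model of the three parts, the sketch below represents a pipeline as plain data: one source, an ordered tuple of operations, and one sink. `ToyPipeline` and its method names are hypothetical and exist only for this illustration; they are not part of the polars-cv API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyPipeline:
    """Hypothetical stand-in for a pipeline: source, ordered operations, sink."""

    source: Optional[str] = None
    operations: Tuple[str, ...] = ()
    sink: Optional[str] = None

    def with_source(self, fmt: str) -> "ToyPipeline":
        # Each step returns a new object, so a partial pipeline stays reusable.
        return replace(self, source=fmt)

    def then(self, op: str) -> "ToyPipeline":
        return replace(self, operations=self.operations + (op,))

    def with_sink(self, fmt: str) -> "ToyPipeline":
        return replace(self, sink=fmt)

    def describe(self) -> str:
        return " -> ".join([self.source or "?", *self.operations, self.sink or "?"])


# Source and operations are defined once; the sink is chosen last.
base = ToyPipeline().with_source("image_bytes").then("resize").then("grayscale")
as_numpy = base.with_sink("numpy")
as_png = base.with_sink("png")

print(as_numpy.describe())  # image_bytes -> resize -> grayscale -> numpy
print(as_png.describe())    # image_bytes -> resize -> grayscale -> png
```

Keeping the sink separate from the source and operations is what lets one definition feed several output formats.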

Modular Style¶

The recommended pattern is to define a reusable pipeline of transformations, apply it to a column with .cv.pipe(...), and then choose a sink.

from polars_cv import Pipeline
import polars as pl

# 1. Define operations (reusable)
preprocess = (
    Pipeline()
    .source("image_bytes")
    .resize(height=224, width=224)
    .grayscale()
)

# 2. Apply to column and choose sink
# png_bytes is assumed to hold the encoded bytes of a PNG image
df = pl.DataFrame({"image": [png_bytes]})
result = df.with_columns(
    processed=pl.col("image").cv.pipe(preprocess).sink("numpy")
)
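The example above takes `png_bytes` as given. For quick experiments, one option is to build a tiny solid-gray PNG with only the standard library, as sketched below. The helper `make_gray_png` is hypothetical and not part of polars-cv; reading a real file with `open(path, "rb").read()` works equally well.

```python
import struct
import zlib


def make_gray_png(width: int, height: int, value: int = 128) -> bytes:
    """Encode a solid 8-bit grayscale image as PNG using only the stdlib."""

    def chunk(tag: bytes, data: bytes) -> bytes:
        # A PNG chunk is: length, tag, data, CRC32 over tag + data.
        body = tag + data
        crc = zlib.crc32(body) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with filter type 0 (none).
    scanline = b"\x00" + bytes([value]) * width
    raw = scanline * height
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )


png_bytes = make_gray_png(8, 8)
```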

Source Formats¶

Format	Input Type	Description
`image_bytes`	Binary	Decode PNG/JPEG/TIFF bytes (auto format+dtype detect)
`file_path`	String	Local/cloud/HTTP path; decodes like `image_bytes`
`raw`	Binary	Raw bytes (requires `dtype`)
`list`	List	Polars nested List
`array`	Array	Polars fixed-size Array
`contour`	Struct	Contour geometry to rasterize

Auto DType for Image Sources¶

For image_bytes and file_path, decoded dtype is determined at runtime:

PNG/JPEG typically decode to u8
16-bit PNG decodes to u16
TIFF can decode to u8, u16, f32, or f64

Because of this variability, the pipeline tracks the dtype as auto until one of the following resolves it:

source(..., dtype="...")
a dtype-fixing operation such as normalize, threshold, or cast

If you use sink("list") or sink("array"), dtype must be known at planning time.
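The rule above can be pictured as a small planning-time check. The sketch below is a toy model of that rule only, not polars-cv's actual planner; the function names and the error message are hypothetical.

```python
from typing import Optional, Sequence

# Operations the text names as dtype-fixing.
DTYPE_FIXING_OPS = {"normalize", "threshold", "cast"}
# Sinks the text says need a known dtype at planning time.
TYPED_SINKS = {"list", "array"}


def dtype_is_resolved(source_dtype: Optional[str], ops: Sequence[str]) -> bool:
    """True if the dtype is pinned by the source or by a dtype-fixing op."""
    return source_dtype is not None or any(op in DTYPE_FIXING_OPS for op in ops)


def check_sink(sink: str, source_dtype: Optional[str], ops: Sequence[str]) -> None:
    """Raise if a typed sink is requested while the dtype is still 'auto'."""
    if sink in TYPED_SINKS and not dtype_is_resolved(source_dtype, ops):
        raise ValueError(f"sink {sink!r} needs a known dtype; dtype is still 'auto'")


check_sink("numpy", None, ["resize"])              # fine: not a typed sink
check_sink("list", "u8", ["resize"])               # fine: dtype set on the source
check_sink("list", None, ["resize", "normalize"])  # fine: normalize fixes the dtype

try:
    check_sink("array", None, ["resize"])          # dtype never resolved
except ValueError as err:
    print(err)
```

In practice this means a pipeline that ends in sink("list") or sink("array") needs either an explicit source dtype or a dtype-fixing operation somewhere in the chain.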

Sink Formats¶

Format	Output Type	Description
`numpy`	Binary	NumPy-compatible bytes
`png`	Binary	PNG bytes
`jpeg`	Binary	JPEG bytes
`tiff`	Binary	TIFF bytes with LZW compression (supports floating-point)
`list`	List	Polars nested List
`array`	Array	Polars fixed-size Array
`native`	Varies	Native Python type (for scalars/vectors)

Chaining Operations¶

Operations are chained fluently. Most image operations accept both literal values and Polars expressions.

pipe = (
    Pipeline()
    .source("image_bytes")
    .resize(height=256, width=256)
    .crop(top=pl.col("y_off"), left=pl.col("x_off"), height=100, width=100)
    .normalize(method="minmax")
)
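To see what a per-row expression means for an operation such as crop, the plain-Python analogy below applies a different crop offset to each row while the height and width stay literal. It illustrates the idea only and does not show how polars-cv evaluates expressions; the `crop` helper and the row dictionaries are hypothetical.

```python
from typing import List

Image = List[List[int]]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


# 4x4 image whose pixel value encodes its position: 10 * row + col.
image = [[10 * r + c for c in range(4)] for r in range(4)]

# Each row carries its own offsets, like the y_off and x_off columns.
rows = [
    {"image": image, "y_off": 0, "x_off": 0},
    {"image": image, "y_off": 2, "x_off": 1},
]

# Offsets vary per row; height and width are shared literals.
crops = [
    crop(r["image"], top=r["y_off"], left=r["x_off"], height=2, width=2)
    for r in rows
]

print(crops[0])  # [[0, 1], [10, 11]]
print(crops[1])  # [[21, 22], [31, 32]]
```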

Best Practices¶

Reuse Pipelines: Define pipelines once and apply them to many columns.
Dynamic Parameters: Use Polars expressions for per-row customization.
Modular Sinks: Keep pipelines generic and choose the sink format at the last step.

Next Steps¶

Sources - Source formats and auto dtype behavior
Domains - Multi-domain pipelines
Multi-Output - Extracting multiple results