Multi-Output Pipelines
Extract multiple intermediate results from a single optimized execution.
Overview
The multi-output feature of polars-cv lets you:
- Extract several stages of one pipeline as outputs
- Share common operations automatically through CSE (Common Subexpression Elimination)
- Compute every output in a single execution pass
- Receive all outputs as one Struct column with named fields
Basic Multi-Output
Name each checkpoint with .alias(), then pass .sink() a dict that maps checkpoint names to output formats:
```python
import polars as pl
from polars_cv import Pipeline

# df is assumed to have an "image" column of encoded image bytes
# Build the pipeline with named checkpoints
base = (
    pl.col("image")
    .cv.pipe(Pipeline().source("image_bytes").resize(height=128, width=128))
    .alias("resized")
)
gray = base.pipe(Pipeline().grayscale()).alias("gray")
thresh = gray.pipe(Pipeline().threshold(128)).alias("thresh")

# Merge branches into one graph for multi-output
merged = thresh.merge_pipe(gray)

# Sink multiple outputs in a single execution
result = df.with_columns(
    outputs=merged.sink({
        "resized": "png",
        "gray": "png",
        "thresh": "png",
    })
)
# outputs is a Struct column with fields: resized, gray, thresh
```
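To see what a single execution pass means in practice, here is a toy model in plain Python. The chain is applied once, and the result of every stage whose alias appears in the sink dict is captured along the way, producing one dict per row (the analogue of one Struct value). The helpers `resize`, `threshold` and `run_once` and the nested-list image are hypothetical stand-ins written for this illustration. They are not polars-cv's API or implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy single-channel image: a list of pixel rows (hypothetical stand-in).
Image = List[List[int]]


def resize(img: Image, size: int) -> Image:
    # Nearest-neighbour resize to size x size (toy stand-in for .resize()).
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def threshold(img: Image, t: int) -> Image:
    # Binary threshold: 255 where the pixel exceeds t, else 0.
    return [[255 if p > t else 0 for p in row] for row in img]


def run_once(img: Image, stages: List[Tuple[str, Callable[[Image], Image]]],
             wanted: Set[str]) -> Dict[str, Image]:
    """Apply stages in order, capturing each stage whose alias was requested."""
    outputs: Dict[str, Image] = {}
    current = img
    for alias, fn in stages:
        current = fn(current)          # every stage runs exactly once
        if alias in wanted:
            outputs[alias] = current   # checkpoint captured on the way past
    missing = wanted - set(outputs)
    if missing:
        raise KeyError(f"no checkpoint named {sorted(missing)}")
    return outputs


row = run_once(
    [[10, 200], [130, 90]],
    [("resized", lambda im: resize(im, 4)), ("thresh", lambda im: threshold(im, 128))],
    {"resized", "thresh"},
)
print(row["thresh"])  # [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```

In this model the `thresh` stage consumes the `resized` result directly instead of recomputing it, and asking for an alias that no stage defines fails loudly.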
Extracting Outputs
Access individual outputs from the Struct column:
```python
# Extract individual fields (each select returns a one-column DataFrame)
resized_df = result.select(pl.col("outputs").struct.field("resized"))
gray_df = result.select(pl.col("outputs").struct.field("gray"))

# Or extract all fields at once, with new column names
extracted = result.select(
    pl.col("outputs").struct.field("resized").alias("resized_img"),
    pl.col("outputs").struct.field("gray").alias("gray_img"),
    pl.col("outputs").struct.field("thresh").alias("thresh_img"),
)
```
Common Subexpression Elimination (CSE)
When several branches start with the same operations, polars-cv detects the shared steps and computes them only once:
```mermaid
flowchart LR
    subgraph Before["Before CSE"]
        A1[resize] --> B1[grayscale] --> C1[blur]
        A2[resize] --> B2[grayscale] --> D1[threshold]
    end
    subgraph After["After CSE"]
        A[resize] --> B[grayscale]
        B --> C[blur]
        B --> D[threshold]
    end
```
```python
# Both branches share: resize → grayscale
base = pl.col("image").cv.pipe(
    Pipeline().source("image_bytes").resize(height=100, width=100)
)
gray = base.pipe(Pipeline().grayscale()).alias("gray")

# Branch 1: blur
blur = gray.pipe(Pipeline().blur(2.0)).alias("blur")

# Branch 2: threshold
thresh = gray.pipe(Pipeline().threshold(128)).alias("thresh")

# CSE automatically shares the gray computation
merged = blur.merge_pipe(thresh)
result = df.with_columns(
    outputs=merged.sink({"gray": "png", "blur": "png", "thresh": "png"})
)
```
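One common way to implement this kind of sharing is hash-consing: each node is identified by its operation plus the node it consumes, and a lookup table returns the existing node instead of creating a duplicate. The sketch below is a toy illustration of that idea in plain Python. The `Graph` class and the string op names are hypothetical, and polars-cv's actual CSE may work differently.

```python
from typing import Dict, Optional, Tuple


class Graph:
    """Toy hash-consed DAG: an identical (op, parent) pair always maps to one node."""

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[str, Optional[int]], int] = {}

    def add(self, op: str, parent: Optional[int] = None) -> int:
        key = (op, parent)
        if key not in self.nodes:        # create a node only the first time we see it
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]

    def chain(self, *ops: str) -> int:
        # Add a linear chain of ops (first op is the root) and return its last node id.
        node: Optional[int] = None
        for op in ops:
            node = self.add(op, node)
        assert node is not None
        return node


g = Graph()
blur = g.chain("resize(100,100)", "grayscale", "blur(2.0)")
thresh = g.chain("resize(100,100)", "grayscale", "threshold(128)")
print(len(g.nodes))  # 4: resize and grayscale are shared, not duplicated
```

The second `chain` call finds `resize` and `grayscale` already in the table under the same keys, so only the `threshold` node is new.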
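Merge Patterns
Merging Multiple Branches
```python
# Create several branches from a shared base
# (base is a .cv.pipe() expression, as in the examples above)
branch1 = base.pipe(Pipeline().grayscale()).alias("out1")
branch2 = base.pipe(Pipeline().blur(2.0)).alias("out2")
branch3 = base.pipe(Pipeline().grayscale().threshold(128)).alias("out3")

# Merge all branches into one graph
merged = branch1.merge_pipe(branch2).merge_pipe(branch3)
```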
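Chained .merge_pipe() calls accumulate branches into a single graph. The toy model below uses plain Python and does not reflect polars-cv's internals, and all of its names are hypothetical. It treats a branch as the set of its op-path prefixes and a merge as a set union. In this model the order of merging does not affect the resulting graph, and sorting nodes by path length gives an order in which each shared node runs once before its children.

```python
from typing import List, Set, Tuple

Path = Tuple[str, ...]  # a node, identified by its op path from the source


def nodes_of(branch: Path) -> Set[Path]:
    # Every intermediate node of a linear branch: each prefix of its op path.
    return {branch[: i + 1] for i in range(len(branch))}


def merge(*branches: Path) -> Set[Path]:
    # Union of all branch nodes; identical prefixes collapse into one node.
    merged: Set[Path] = set()
    for branch in branches:
        merged |= nodes_of(branch)
    return merged


def execution_order(nodes: Set[Path]) -> List[Path]:
    # A parent path is always shorter than its children, so sorting by length
    # gives a valid topological order (ties broken alphabetically).
    return sorted(nodes, key=lambda p: (len(p), p))


base = ("source", "resize")
out1 = base + ("grayscale",)
out2 = base + ("blur",)
out3 = base + ("threshold",)

graph = merge(out1, out2, out3)
for node in execution_order(graph):
    print(" -> ".join(node))
```

Three branches of three nodes each would be nine nodes if built separately. The merged graph has five, because `source` and `resize` appear once.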
Including Earlier Stages
```python
base = pl.col("image").cv.pipe(
    Pipeline().source("image_bytes").resize(height=128, width=128)
).alias("base")
processed = base.pipe(Pipeline().grayscale()).alias("processed")

# Merge base back in so it is included in the output
merged = processed.merge_pipe(base)
result = df.with_columns(
    outputs=merged.sink({"base": "png", "processed": "png"})
)
```
Mixed Output Formats
Each output can use its own format:
```python
# Assumes merged contains checkpoints aliased "display", "analysis" and "raw_data"
result = df.with_columns(
    outputs=merged.sink({
        "display": "png",      # Encoded PNG, for display
        "analysis": "numpy",   # For NumPy analysis
        "raw_data": "list",    # As a Polars List
    })
)
```
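Because formats are chosen per key, encoding can be thought of as a final step applied to each captured checkpoint independently. The toy sketch below illustrates this with two hand-written encoders: a minimal 8-bit grayscale PNG writer built on the standard-library `zlib` and `struct` modules, and a nested-list copy. The `ENCODERS` table and `sink` function are hypothetical, and polars-cv's own "png", "numpy" and "list" encoders are not shown here.

```python
import struct
import zlib
from typing import Callable, Dict, List

Image = List[List[int]]  # toy 8-bit grayscale image: rows of 0-255 values


def encode_png_gray8(img: Image) -> bytes:
    """Minimal PNG encoder: 8-bit grayscale, no interlacing, filter type 0."""
    height, width = len(img), len(img[0])

    def chunk(tag: bytes, data: bytes) -> bytes:
        # Chunk layout: length, type, data, CRC-32 over type + data.
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # Width, height, bit depth 8, colour type 0 (grayscale), then
    # compression, filter and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with its filter byte (0 = None).
    raw = b"".join(b"\x00" + bytes(row) for row in img)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


ENCODERS: Dict[str, Callable[[Image], object]] = {
    "png": encode_png_gray8,
    "list": lambda img: [list(row) for row in img],
}


def sink(checkpoints: Dict[str, Image], spec: Dict[str, str]) -> Dict[str, object]:
    # Encode each requested checkpoint with its own format.
    return {name: ENCODERS[fmt](checkpoints[name]) for name, fmt in spec.items()}


out = sink(
    {"display": [[0, 255], [255, 0]], "raw_data": [[1, 2], [3, 4]]},
    {"display": "png", "raw_data": "list"},
)
```

The pixel computation is shared, and only the last step differs per output. Adding a format in this model means adding one entry to the encoder table.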
Multi-Source Multi-Output
Combine multiple input columns with multiple outputs:
```python
# Two different input columns
img = pl.col("image").cv.pipe(Pipeline().source("image_bytes").resize(height=128, width=128))
# Mask built from the contour column, using img for its shape
mask = pl.col("mask_contour").cv.pipe(Pipeline().source("contour", shape=img))

# Apply the mask to the image
masked = img.apply_mask(mask).alias("masked")

# Merge everything for multi-output
merged = masked.merge_pipe(img.alias("original")).merge_pipe(mask.alias("mask"))
result = df.with_columns(
    outputs=merged.sink({
        "original": "png",
        "mask": "png",
        "masked": "png",
    })
)
```
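The masking step can be pictured per pixel. The sketch below assumes that apply_mask keeps image pixels where the mask is non-zero and sets the rest to 0. That behaviour, the nested-list images, and the `apply_mask_toy` function are assumptions made for illustration, so check the polars-cv API reference for the exact semantics.

```python
from typing import List

Image = List[List[int]]  # toy single-channel image: rows of pixel values


def apply_mask_toy(img: Image, mask: Image) -> Image:
    """Hypothetical per-pixel masking: keep pixels where mask != 0, zero the rest."""
    # The mask must have the image's shape (the contour source above is given
    # shape=img, presumably for this reason).
    if len(img) != len(mask) or any(len(r) != len(m) for r, m in zip(img, mask)):
        raise ValueError("image and mask shapes differ")
    return [[p if m else 0 for p, m in zip(row, mrow)] for row, mrow in zip(img, mask)]


print(apply_mask_toy([[10, 20], [30, 40]], [[255, 0], [0, 255]]))  # [[10, 0], [0, 40]]
```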
Performance Benefits
| Approach | Execution Passes | Memory |
|---|---|---|
| Separate .cv.pipe() calls | N (one per output) | N buffers |
| Multi-output with .sink({...}) | 1 | Shared buffers |
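One rough way to read the table is to count operations. Suppose N outputs share a common prefix of P operations and each output adds some operations of its own. Separate pipelines repeat the prefix for every output, while a merged graph runs it once. The helper below only does that arithmetic. It is a simplified cost model that counts every operation as one unit and assumes no caching between separate calls, not a measurement of polars-cv.

```python
from typing import List


def ops_executed(prefix_ops: int, branch_ops: List[int], merged: bool) -> int:
    """Count operations under a simple model where each op costs one unit."""
    if merged:
        # The shared prefix runs once, then each branch's own ops.
        return prefix_ops + sum(branch_ops)
    # Each separate pipeline re-runs the prefix before its own ops.
    return sum(prefix_ops + k for k in branch_ops)


# As in the CSE diagram: prefix = resize, grayscale; branches = blur, threshold
print(ops_executed(2, [1, 1], merged=False))  # 6
print(ops_executed(2, [1, 1], merged=True))   # 4
```

The gap grows with both the length of the shared prefix and the number of outputs.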
Best Practices
- Use .alias(): Name every checkpoint you want to output.
- Merge All Branches: Call .merge_pipe() to include every branch in the graph.
- CSE is Automatic: No need to deduplicate shared operations manually.
- Match Output Names: Sink dict keys must match alias names.
- Extract Early: Pull out only the fields you need to avoid carrying the whole Struct.
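The "Match Output Names" rule can be checked before running anything. The function below is a hypothetical pre-flight helper written for illustration, since polars-cv will report mismatches in its own way. It compares the keys of a sink dict against the set of aliases you defined.

```python
from typing import Dict, Set


def check_sink_keys(aliases: Set[str], sink_spec: Dict[str, str]) -> None:
    """Raise if a sink key does not match any alias defined in the pipeline."""
    unknown = sorted(set(sink_spec) - aliases)
    if unknown:
        raise ValueError(f"sink keys without a matching .alias(): {unknown}")


aliases = {"resized", "gray", "thresh"}
# Passes: this check allows aliases that are not requested in the sink dict.
check_sink_keys(aliases, {"gray": "png", "thresh": "png"})
```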
Next Steps
- Binary Operations - Combine pipeline outputs