Image Operations¶
This page documents the primary image processing operations available in polars-cv.
Resize¶
Resize images to specified dimensions.
Pipeline().source("image_bytes").resize(height=224, width=224)
Pipeline().source("image_bytes").resize(height=224, width=224, filter="bilinear")
Filters: "nearest", "bilinear", "lanczos3" (default).
Grayscale¶
Convert to grayscale using luminance formula.
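The exact luminance weights are not stated on this page. As a rough sketch, assuming the common Rec. 601 weights (0.299, 0.587, 0.114) and a hypothetical helper name, the per-pixel conversion looks like this:

```python
def luminance(r, g, b):
    """Weighted sum of RGB channels.

    The Rec. 601 weights used here are an assumption; polars-cv may
    use different coefficients.
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Green contributes most to perceived brightness, blue least.
white = luminance(255, 255, 255)
green = luminance(0, 255, 0)
blue = luminance(0, 0, 255)
```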
Blur¶
Apply Gaussian blur.
Threshold¶
Convert to binary image.
Crop¶
Extract a rectangular region.
Rotate¶
Rotate by an angle in degrees. Part of the affine transform family.
Pipeline().source("image_bytes").rotate(angle=90)
Pipeline().source("image_bytes").rotate(angle=45, expand=True)
Pipeline().source("image_bytes").rotate(angle=30, interpolation="nearest", border_value=128)
Fast path: 90, 180, and 270 degree rotations use zero-copy view operations (metadata-only, no allocation). The interpolation and border_value parameters are ignored for these angles.
Arbitrary angles are executed via the same affine transform code path as warp_affine. When a rotate with a static, arbitrary angle is adjacent to a warp_affine (in either order), the two are fused automatically.
Pad¶
Add padding to edges.
Pipeline().source("image_bytes").pad(top=10, bottom=10, value=128)
Pipeline().source("image_bytes").pad_to_size(height=224, width=224)
Pipeline().source("image_bytes").letterbox(height=224, width=224)
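This page does not spell out the letterbox geometry. The sketch below assumes the usual meaning: scale the image to fit inside the target while preserving aspect ratio, then pad the remainder evenly on both sides. The helper name, the rounding, and the centered placement are all assumptions, not confirmed library behavior.

```python
def letterbox_geometry(src_h, src_w, dst_h, dst_w):
    """Return the resized size and per-edge padding for a letterbox.

    Assumes fit-inside scaling and centered padding (hypothetical;
    check the library for the actual placement and rounding rules).
    """
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = min(dst_h, max(1, round(src_h * scale)))
    new_w = min(dst_w, max(1, round(src_w * scale)))
    pad_h = dst_h - new_h
    pad_w = dst_w - new_w
    top = pad_h // 2
    left = pad_w // 2
    return {
        "resized": (new_h, new_w),
        "top": top,
        "bottom": pad_h - top,
        "left": left,
        "right": pad_w - left,
    }


# A 480x640 image letterboxed into 224x224 is limited by its width.
geometry = letterbox_geometry(480, 640, 224, 224)
```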
Flip¶
Histogram¶
Compute the pixel value histogram. The histogram can return counts, normalized frequencies, bin edges, quantized images, or detailed "buckets" combining counts and edges. The default output is "buckets".
# Detailed buckets output (returns List[Struct] with lower_edge, upper_edge, count, normalized)
Pipeline().source("image_bytes").grayscale().histogram(bins=256)
# Return raw bin counts
Pipeline().source("image_bytes").grayscale().histogram(bins=64, output="counts")
# Return normalized frequencies
Pipeline().source("image_bytes").grayscale().histogram(bins=64, output="normalized")
# Explicit bin edges (custom intervals)
Pipeline().source("image_bytes").grayscale().histogram(bins=[0, 50, 100, 200, 255])
# Left or right closed intervals
Pipeline().source("image_bytes").grayscale().histogram(bins=10, closed="right")
Outputs: "buckets" (default), "counts", "normalized", "quantized", "edges".
Closed Intervals: "left" (default), "right".
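To illustrate how the closed option changes which bin a value on an edge falls into, here is a minimal pure-Python sketch with explicit edges. The helper name is hypothetical, and the handling of the outermost edges (the last bin includes its upper edge for "left", the first bin includes its lower edge for "right") is an assumption rather than documented behavior.

```python
from bisect import bisect_left, bisect_right


def histogram_counts(values, edges, closed="left"):
    """Count values per bin for explicit, sorted bin edges.

    closed="left":  bins are [lo, hi)
    closed="right": bins are (lo, hi]
    Values outside [edges[0], edges[-1]] are skipped (assumption).
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        if closed == "left":
            idx = bisect_right(edges, v) - 1
        else:
            idx = bisect_left(edges, v) - 1
        # Clamp so the outermost edges land in the first or last bin.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return counts


pixels = [0, 50, 50, 100, 199, 200, 255]
edges = [0, 50, 100, 200, 255]
left_counts = histogram_counts(pixels, edges, closed="left")
right_counts = histogram_counts(pixels, edges, closed="right")
normalized = [c / len(pixels) for c in left_counts]
```

Values that sit exactly on an interior edge (50, 100, 200 here) move to the neighboring bin when the closed side changes; all other values are unaffected.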
Color Conversion¶
Convert between color spaces using cvt_color or convenience methods.
# Generic conversion
Pipeline().source("image_bytes").cvt_color("rgb", "hsv")
# Convenience methods
Pipeline().source("image_bytes").to_hsv()
Pipeline().source("image_bytes").to_lab() # promotes to f32
Pipeline().source("image_bytes").to_bgr()
Pipeline().source("image_bytes").to_ycbcr()
Supported spaces: rgb, bgr, hsv, lab, ycbcr, gray.
Channel Operations¶
Channel Select¶
Extract a single channel from a multi-channel image, producing a 2D [H, W] buffer.
Channel Swap¶
Reorder channels in a multi-channel image.
Intensity Adjustments¶
Contrast¶
Scale pixel deviation from the mean: (pixel - mean) * factor + mean.
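The formula above can be sketched on a flat list of u8 pixel values. The helper name is hypothetical, and the rounding and clamping to [0, 255] are assumptions; whether the mean is taken per image or per channel is not stated here.

```python
def contrast_u8(pixels, factor):
    """Apply (pixel - mean) * factor + mean, clamped to the u8 range."""
    mean = sum(pixels) / len(pixels)
    out = []
    for p in pixels:
        v = (p - mean) * factor + mean
        out.append(int(min(255, max(0, round(v)))))
    return out


pixels = [100, 150, 200]
stretched = contrast_u8(pixels, 2.0)   # deviations from the mean double
flattened = contrast_u8(pixels, 0.0)   # every pixel collapses to the mean
```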
Gamma Correction¶
Power-law correction: normalizes to [0,1], applies pixel^gamma, then denormalizes.
Pipeline().source("image_bytes").adjust_gamma(gamma=0.5) # brighter
Pipeline().source("image_bytes").adjust_gamma(gamma=2.0) # darker
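A minimal sketch of the power-law step for a single u8 value, following the normalize, raise to gamma, denormalize description above. The helper name and the final rounding are assumptions.

```python
def gamma_u8(pixel, gamma):
    """Normalize to [0, 1], apply pixel ** gamma, scale back to [0, 255]."""
    normalized = pixel / 255.0
    return round((normalized ** gamma) * 255.0)


mid_gray = 64
brighter = gamma_u8(mid_gray, 0.5)  # gamma < 1 lifts mid-tones
darker = gamma_u8(mid_gray, 2.0)    # gamma > 1 pushes mid-tones down
```

Because 0 and 1 are fixed points of the power function, pure black and pure white are unchanged for any positive gamma.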
Brightness¶
Scale pixel values with clamping.
Invert¶
Invert pixel values (255 - pixel for u8, 1.0 - pixel for float).
All intensity parameters accept Polars expressions for per-row dynamic values.
Convolution¶
Apply 2D convolution with an arbitrary kernel.
# Custom 3x3 emboss kernel
kernel = [-2, -1, 0, -1, 1, 1, 0, 1, 2]
Pipeline().source("image_bytes").convolve2d(kernel, ksize=3)
# Normalize kernel so output values stay in range
Pipeline().source("image_bytes").convolve2d(kernel, ksize=3, normalize=True)
Border modes: "replicate" (default), "zero", "reflect".
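For intuition, here is a small pure-Python sketch of a 2D convolution with the "replicate" border mode on a single-channel image. It makes several assumptions that this page does not confirm: the kernel is applied without flipping (cross-correlation), the flat kernel is laid out row-major, and normalize divides the kernel by its sum when that sum is non-zero. The function name is hypothetical.

```python
def convolve2d_replicate(image, kernel, ksize, normalize=False):
    """Convolve a 2D list with a flat, row-major ksize x ksize kernel.

    Out-of-bounds reads are clamped to the nearest edge pixel
    ("replicate"). Kernel is not flipped (assumption).
    """
    h, w = len(image), len(image[0])
    r = ksize // 2
    weights = list(kernel)
    if normalize:
        total = sum(weights)
        if total != 0:
            weights = [k / total for k in weights]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(ksize):
                for kx in range(ksize):
                    sy = min(max(y + ky - r, 0), h - 1)  # clamp row
                    sx = min(max(x + kx - r, 0), w - 1)  # clamp column
                    acc += image[sy][sx] * weights[ky * ksize + kx]
            out[y][x] = acc
    return out


flat = [[10] * 4 for _ in range(3)]
box = [1] * 9
summed = convolve2d_replicate(flat, box, ksize=3)
averaged = convolve2d_replicate(flat, box, ksize=3, normalize=True)
```

With the box kernel, the unnormalized result is nine times the input on a flat image, while the normalized result stays at the input value, which is the point of normalize=True.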
Sobel¶
Sobel gradient operator (delegates to convolve2d with standard kernels).
Pipeline().source("image_bytes").grayscale().sobel(axis="x")
Pipeline().source("image_bytes").grayscale().sobel(axis="y", ksize=3)
Laplacian¶
Second-derivative operator for edge detection.
Sharpen¶
Unsharp-mask-style sharpening. strength=0 produces the identity.
Edge Detection¶
Canny¶
Multi-stage edge detection (Gaussian blur, Sobel gradients, non-maximum suppression, hysteresis thresholding). Output is a U8 binary edge map.
Thresholds accept Polars expressions for per-row values.
Histogram Equalization¶
Contrast enhancement via cumulative histogram remapping. Operates per-channel on multi-channel images. Output is U8.
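Cumulative histogram remapping can be sketched for one channel as follows. This uses the textbook CDF-based lookup table; the helper name is hypothetical and the exact scaling and rounding used by polars-cv may differ.

```python
def equalize_u8(pixels):
    """Remap u8 values through the normalized cumulative histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    # Running sum of the histogram gives the cumulative distribution.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:
        return list(pixels)  # constant image: nothing to spread out

    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * 255)) for c in cdf]
    return [lut[p] for p in pixels]


# Four closely spaced levels are spread across the full 0..255 range.
equalized = equalize_u8([10, 20, 30, 40])
```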
Morphological Operations¶
Morphological operations for binary mask and segmentation post-processing. All require single-channel input (use .grayscale() or .threshold() first).
Erode¶
Shrink bright regions by computing the local minimum over a rectangular neighborhood.
Pipeline().source("image_bytes").grayscale().threshold(128).erode(ksize=3)
Pipeline().source("image_bytes").grayscale().threshold(128).erode(ksize=5, iterations=2)
Dilate¶
Grow bright regions by computing the local maximum over a rectangular neighborhood.
Pipeline().source("image_bytes").grayscale().threshold(128).dilate(ksize=3)
Pipeline().source("image_bytes").grayscale().threshold(128).dilate(ksize=5, iterations=2)
Opening¶
Erode then dilate — removes small bright noise spots while preserving larger structures.
This is a Python-side composite equivalent to .erode(ksize=ksize).dilate(ksize=ksize).
Closing¶
Dilate then erode — fills small dark holes while preserving larger structures.
This is a Python-side composite equivalent to .dilate(ksize=ksize).erode(ksize=ksize).
Morphological Gradient¶
Dilate minus erode — produces an edge outline.
Common workflow: threshold() → erode() → dilate() → extract_contours().
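The relationships between these operations can be sketched in pure Python on a small mask. Function names mirror the operations above but are standalone illustrations, not the library API; the border handling (out-of-bounds neighbors are ignored) is an assumption.

```python
def _morph(image, ksize, reduce_fn):
    """Apply min or max over a ksize x ksize rectangular neighborhood."""
    h, w = len(image), len(image[0])
    r = ksize // 2
    return [
        [
            reduce_fn(
                image[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(image, ksize=3):
    return _morph(image, ksize, min)   # local minimum shrinks bright areas


def dilate(image, ksize=3):
    return _morph(image, ksize, max)   # local maximum grows bright areas


def opening(image, ksize=3):
    return dilate(erode(image, ksize), ksize)


def closing(image, ksize=3):
    return erode(dilate(image, ksize), ksize)


def gradient(image, ksize=3):
    d, e = dilate(image, ksize), erode(image, ksize)
    return [[dv - ev for dv, ev in zip(dr, er)] for dr, er in zip(d, e)]


# 5x7 mask: a 3x3 bright block plus one isolated bright speck at (2, 5).
mask = [[0] * 7 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        mask[y][x] = 255
mask[2][5] = 255

opened = opening(mask)    # speck removed, block preserved
outline = gradient(mask)  # block interior is 0, its boundary is 255
```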
Affine Transforms¶
Apply arbitrary 2x3 affine transformations. All methods in this family share the same Rust execution code path and can be fused together. The matrix uses the forward-mapping convention (same as OpenCV warpAffine): the kernel inverts it internally for interpolation.
The affine family includes:
rotate() -- simple angle-based rotation (see above)
warp_affine() -- raw 2x3 matrix
shear() -- shear convenience method
rotate_and_scale() -- rotation + uniform scaling around a center
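To make the forward-mapping convention concrete: the matrix says where a source pixel lands in the output, while interpolation needs the opposite direction (for each output pixel, which source location to sample). The sketch below inverts a six-element [a, b, tx, c, d, ty] matrix in pure Python. Both helper names are hypothetical, and this is an illustration of the convention, not the library's Rust kernel.

```python
def apply_affine(m, x, y):
    """Map (x, y) through a six-element [a, b, tx, c, d, ty] matrix."""
    a, b, tx, c, d, ty = m
    return (a * x + b * y + tx, c * x + d * y + ty)


def invert_affine(m):
    """Invert a 2x3 affine matrix given as [a, b, tx, c, d, ty]."""
    a, b, tx, c, d, ty = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    ia, ib = d / det, -b / det
    ic, id_ = -c / det, a / det
    itx = -(ia * tx + ib * ty)
    ity = -(ic * tx + id_ * ty)
    return [ia, ib, itx, ic, id_, ity]


forward = [1, 0, 30, 0, 1, 20]            # translate by (30, 20)
inverse = invert_affine(forward)

landed = apply_affine(forward, 0, 0)      # source origin lands at (30, 20)
sampled = apply_affine(inverse, 30, 20)   # output (30, 20) samples source origin
```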
Warp Affine¶
# Translate by (tx=30, ty=20), output same size as input
Pipeline().source("image_bytes").warp_affine(
    matrix=[1, 0, 30, 0, 1, 20],
    output_size=(224, 224),
)
# Nearest-neighbor interpolation, white border fill
Pipeline().source("image_bytes").warp_affine(
    matrix=[1, 0, 30, 0, 1, 20],
    output_size=(224, 224),
    interpolation="nearest",
    border_value=255.0,
)
Parameters:
| Parameter | Description | Default |
|---|---|---|
| matrix | Six-element [a, b, tx, c, d, ty] forward-mapping matrix | - |
| output_size | (height, width) of the output | - |
| interpolation | "bilinear" or "nearest" | "bilinear" |
| border_value | Fill value for out-of-bounds pixels | 0.0 |
Shear¶
Convenience wrapper that builds a shear matrix and delegates to warp_affine.
Pipeline().source("image_bytes").shear(sx=0.3, output_size=(224, 224))
Pipeline().source("image_bytes").shear(sy=0.2, output_size=(224, 224))
Pipeline().source("image_bytes").shear(sx=0.1, sy=0.15, output_size=(224, 224))
Rotate and Scale¶
Combined rotation and uniform scaling around a center point.
Pipeline().source("image_bytes").rotate_and_scale(
    angle=45,            # degrees, positive = clockwise
    scale=0.8,
    center=(112, 112),   # (cx, cy) rotation center
    output_size=(224, 224),
)
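A forward-mapping matrix for rotation plus uniform scaling around a center can be built as sketched below. The helper names are hypothetical and the matrix the library actually constructs is not shown on this page; the sign choice here follows the "positive = clockwise" comment above, in image coordinates with y pointing down.

```python
import math


def rotate_scale_matrix(angle_deg, scale, center):
    """Build [a, b, tx, c, d, ty] rotating clockwise (y-down) about center."""
    cx, cy = center
    theta = math.radians(angle_deg)
    a = scale * math.cos(theta)
    b = -scale * math.sin(theta)
    c = scale * math.sin(theta)
    d = scale * math.cos(theta)
    # Choose the translation so that the center maps to itself.
    tx = cx - (a * cx + b * cy)
    ty = cy - (c * cx + d * cy)
    return [a, b, tx, c, d, ty]


def apply_affine(m, x, y):
    a, b, tx, c, d, ty = m
    return (a * x + b * y + tx, c * x + d * y + ty)


m = rotate_scale_matrix(45, 0.8, (112, 112))
fixed = apply_affine(m, 112, 112)   # the center stays put
moved = apply_affine(m, 122, 112)   # 10 px from center ends up 8 px away
```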
Pipeline Fusion¶
Consecutive affine operations (warp_affine, rotate with static arbitrary angle, shear, rotate_and_scale) are automatically fused into a single matrix multiplication at planning time, eliminating redundant interpolation passes:
pipe = (
    Pipeline()
    .source("image_bytes")
    .warp_affine(matrix=[1, 0, 50, 0, 1, 0], output_size=(224, 224))  # translate X
    .warp_affine(matrix=[1, 0, 0, 0, 1, 30], output_size=(224, 224))  # translate Y
)
# Serializes as a single warp_affine with matrix [1, 0, 50, 0, 1, 30]
pipe = (
    Pipeline()
    .source("image_bytes")
    .assert_shape(height=224, width=224)
    .rotate(45)
    .warp_affine(matrix=[1, 0, 10, 0, 1, 10], output_size=(224, 224))  # translate after rotate
)
# Fused into a single warp_affine
Fusion limitations: rotate with an expression-based angle, or with a zero-copy angle (90/180/270), does not participate in fusion. Non-affine ops between two affine ops break the fusion run.
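The matrix arithmetic behind fusion can be sketched as a composition of two 2x3 matrices (each treated as a 3x3 matrix with an implicit [0, 0, 1] last row). The helper name is hypothetical, and the sketch ignores any clipping that an intermediate output_size would introduce in a non-fused pipeline.

```python
def compose_affine(first, second):
    """Return the single matrix equivalent to applying first, then second.

    Both arguments and the result use the [a, b, tx, c, d, ty] layout.
    """
    a1, b1, tx1, c1, d1, ty1 = first
    a2, b2, tx2, c2, d2, ty2 = second
    return [
        a2 * a1 + b2 * c1, a2 * b1 + b2 * d1, a2 * tx1 + b2 * ty1 + tx2,
        c2 * a1 + d2 * c1, c2 * b1 + d2 * d1, c2 * tx1 + d2 * ty1 + ty2,
    ]


# The two translations from the first pipeline above collapse into one.
fused = compose_affine([1, 0, 50, 0, 1, 0], [1, 0, 0, 0, 1, 30])

# Order matters once rotation is involved.
rot90 = [0, -1, 0, 1, 0, 0]
shift = [1, 0, 10, 0, 1, 10]
rotate_then_shift = compose_affine(rot90, shift)
shift_then_rotate = compose_affine(shift, rot90)
```

One interpolation pass over the fused matrix avoids resampling the image once per operation, which is where the quality and speed benefit comes from.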
Layout¶
Transpose¶
Transpose dimensions.
Reshape¶
Reshape array to new dimensions.
Resize Variants¶
In addition to resize(height=..., width=...), polars-cv provides aspect-ratio-preserving resize methods.
# Resize by scale factor
Pipeline().source("image_bytes").resize_scale(scale=0.5)
Pipeline().source("image_bytes").resize_scale(scale_x=2.0, scale_y=0.5)
# Resize to target height (width computed from aspect ratio)
Pipeline().source("image_bytes").resize_to_height(512)
# Resize to target width (height computed from aspect ratio)
Pipeline().source("image_bytes").resize_to_width(640)
# Resize so the longest side equals target
Pipeline().source("image_bytes").resize_max(max_size=256)
# Resize so the shortest side equals target
Pipeline().source("image_bytes").resize_min(min_size=128)
All resize variants accept Polars expressions for per-row dynamic sizes.
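The output dimensions implied by the aspect-ratio-preserving variants can be sketched as simple arithmetic on (height, width). The helper names are hypothetical and the rounding rule is an assumption; the library may round or truncate differently.

```python
def resize_max_dims(h, w, max_size):
    """Longest side becomes max_size; the other side scales with it."""
    longest = max(h, w)
    return (max(1, round(h * max_size / longest)),
            max(1, round(w * max_size / longest)))


def resize_min_dims(h, w, min_size):
    """Shortest side becomes min_size; the other side scales with it."""
    shortest = min(h, w)
    return (max(1, round(h * min_size / shortest)),
            max(1, round(w * min_size / shortest)))


def resize_to_height_dims(h, w, target_h):
    """Height is fixed; width follows the original aspect ratio."""
    return (target_h, max(1, round(w * target_h / h)))


# For a 480x640 (height x width) image:
by_max = resize_max_dims(480, 640, 256)
by_min = resize_min_dims(480, 640, 128)
by_height = resize_to_height_dims(480, 640, 512)
```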
Shape Assertion¶
Provide shape hints for the pipeline planner. Useful for asserting known dimensions when the source has unknown shape.
# Assert the decoded image has 4 channels (RGBA)
Pipeline().source("image_bytes").assert_shape(channels=4)
# Assert full shape
Pipeline().source("image_bytes").assert_shape(height=512, width=512, channels=3)
Dynamic Parameters¶
Most numeric parameters across polars-cv operations accept Polars expressions in addition to literal values. When a parameter is an expression, its value is resolved per-row at execution time from the DataFrame.
import polars as pl  # needed for the pl.col(...) expressions below

# Static value (same for all rows)
pipe = Pipeline().source("image_bytes").resize(height=224, width=224)
# Dynamic value (per-row from another column)
pipe = Pipeline().source("image_bytes").resize(
    height=pl.col("target_h"), width=pl.col("target_w")
)
# Expression with aggregation (same value for all rows)
pipe = Pipeline().source("image_bytes").crop(
    top=0, left=0,
    height=pl.col("crop_h").min(),
    width=pl.col("crop_w").min(),
)
Parameters that accept expressions:
| Category | Parameters |
|---|---|
| Resize | height, width, scale, scale_x, scale_y, max_size, min_size |
| Crop | top, left, height, width |
| Pad | top, bottom, left, right, value |
| Pad to size / Letterbox | height, width, value |
| Rotate | angle |
| Warp affine | output_size (height and width) |
| Scale / Clamp | factor, min_val, max_val |
| Threshold | value |
| Blur | sigma |
| Canny | low_threshold, high_threshold |
| Contrast / Gamma / Brightness | factor, gamma |
| Morphology | ksize, iterations |
| Channel select | index |
| Convolution | ksize |
| Reductions | q (percentile), ddof (std) |
| Histogram | bins (integer form) |
| Rasterize | fill_value, background |
Planning-time implications: When a shape-affecting parameter is an expression (e.g., resize(height=pl.col("h"))), the pipeline planner cannot determine the output dimensions at planning time. Shape hints will be None for those dimensions.
Structural parameters like matrix (affine), kernel (convolution), axes (transpose/flip), and enum values like interpolation, mode, border remain static only.
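The planning-time behavior can be pictured with a toy model. Nothing below is the actual polars-cv planner: Expr is a stand-in for a Polars expression, and both function names are hypothetical. The sketch only shows the rule that a dimension driven by an expression is unknown (None) until execution.

```python
class Expr:
    """Stand-in for a Polars expression such as pl.col("h") (hypothetical)."""

    def __init__(self, name):
        self.name = name


def static_or_none(value):
    """A literal is known at planning time; an expression is not."""
    return None if isinstance(value, Expr) else value


def planned_resize_shape(height, width, channels):
    """Shape hint a planner could record after a resize step (toy model)."""
    return (static_or_none(height), static_or_none(width), channels)


static_hint = planned_resize_shape(224, 224, 3)
dynamic_hint = planned_resize_shape(Expr("target_h"), 224, 3)
```

In this model, adding a shape assertion after the dynamic step would be the way to restore a known dimension for later planning.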