Metrics API Reference¶

Core Types¶

DetectionTable `dataclass` ¶

Canonical intermediate representation consumed by all metric functions.

Wraps two aligned lazy frames:

detections — one row per detection with image_id, class_id, score, is_tp, gt_idx, iou, det_idx.
image_metadata — one row per (image, class) with n_gts, weight, gt_label, and optionally group_id.

Use from_matched to construct an instance with schema validation.

from_matched `classmethod` ¶

from_matched(
    detections: LazyFrame | DataFrame,
    image_meta: LazyFrame | DataFrame,
    *,
    matching_iou_threshold: float | None = None,
) -> DetectionTable

Construct a DetectionTable with planning-time schema validation.

Parameters:

Name	Type	Description	Default
`detections`	`LazyFrame \| DataFrame`	Per-detection rows. Must contain columns `image_id`, `class_id`, `score`, `is_tp`, `gt_idx`, `iou`, `det_idx`.	required
`image_meta`	`LazyFrame \| DataFrame`	Per-image metadata. Must contain `image_id`, `class_id`, `n_gts`, `weight`, `gt_label`.	required
`matching_iou_threshold`	`float \| None`	The IoU threshold used by the matcher. Stored so that `at_iou_threshold` can warn when the caller tries to lower the threshold below the matching level (which has no effect).	`None`

Returns:

Type	Description
`DetectionTable`	Validated `DetectionTable` instance.

Raises:

Type	Description
`ValueError`	If required columns are missing.
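The required-column check described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `check_required_columns` and the two constant names are hypothetical, and the column lists are copied from the parameter table. Because the check needs column names only, it can run against a frame's schema without materializing any rows.

```python
# Column names copied from the parameter table above.
REQUIRED_DETECTION_COLUMNS = (
    "image_id", "class_id", "score", "is_tp", "gt_idx", "iou", "det_idx",
)
REQUIRED_METADATA_COLUMNS = ("image_id", "class_id", "n_gts", "weight", "gt_label")


def check_required_columns(columns, required, frame_name):
    """Raise ValueError naming every required column absent from `columns`.

    Hypothetical helper: only column names are inspected, never row data.
    """
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f"{frame_name} is missing required columns: {missing}")


# A hypothetical detections schema that lacks `iou` and `det_idx`.
detection_columns = ["image_id", "class_id", "score", "is_tp", "gt_idx"]
try:
    check_required_columns(detection_columns, REQUIRED_DETECTION_COLUMNS, "detections")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```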

with_group ¶

with_group(group_col: str) -> DetectionTable

Return a copy with group_id set from an existing metadata column.

Parameters:

Name	Type	Description	Default
`group_col`	`str`	Column in `image_metadata` to use as the group.	required

Returns:

Type	Description
`DetectionTable`	New `DetectionTable` with `group_id` populated.

filter_class ¶

filter_class(class_id: str) -> DetectionTable

Return a copy filtered to a single class.

Parameters:

Name	Type	Description	Default
`class_id`	`str`	The class to retain.	required

Returns:

Type	Description
`DetectionTable`	Filtered `DetectionTable`.

class_ids ¶

class_ids() -> list[str]

Return distinct class IDs present in the detections.

Note: triggers a partial collect on the metadata frame.

at_iou_threshold ¶

at_iou_threshold(iou_threshold: float) -> DetectionTable

Return a copy with is_tp recomputed at a different IoU threshold.

The stored iou column is compared against iou_threshold to set is_tp without re-running the matching step.

Warning: Re-thresholding only works reliably when raising the threshold above the original matching IoU. Lowering it has no effect because detections that were unmatched at the original threshold have no stored gt_idx/iou to re-evaluate.

Parameters:

Name	Type	Description	Default
`iou_threshold`	`float`	New IoU threshold to apply.	required

Returns:

Type	Description
`DetectionTable`	`DetectionTable` with updated `is_tp`.
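A minimal sketch of the re-thresholding rule, using plain dictionaries in place of the detections frame. Two details are assumptions rather than documented behavior: the comparison is written as `iou >= iou_threshold`, and unmatched detections are represented with `iou = None`. The `rethreshold` helper is hypothetical.

```python
def rethreshold(detections, iou_threshold):
    """Return copies of the rows with `is_tp` recomputed from the stored IoU."""
    updated = []
    for det in detections:
        iou = det["iou"]
        # Unmatched rows carry no stored IoU, so they can never become TPs.
        is_tp = iou is not None and iou >= iou_threshold
        updated.append({**det, "is_tp": is_tp})
    return updated


# Hypothetical rows produced by a matcher that ran at IoU 0.5.
detections = [
    {"det_idx": 0, "iou": 0.82, "is_tp": True},
    {"det_idx": 1, "iou": 0.55, "is_tp": True},
    {"det_idx": 2, "iou": None, "is_tp": False},  # unmatched at 0.5
]

raised = [d["is_tp"] for d in rethreshold(detections, 0.75)]
lowered = [d["is_tp"] for d in rethreshold(detections, 0.3)]
print(raised, lowered)
```

Raising the threshold to 0.75 demotes the second detection. Lowering it to 0.3 changes nothing: the third detection may well overlap a ground truth at IoU 0.4, but that overlap was never recorded.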

to_per_image ¶

to_per_image() -> pl.LazyFrame

Aggregate detections to one row per image and class, summarized by the top-scoring detection.

Produces one row per (image_id, class_id) with:

detections: list of detection structs sorted by score descending
compatibility columns max_score and top_is_tp from the highest-scoring detection
metadata columns gt_label, weight, n_gts

LROC consumes detections for image-level summarization (best localized detection), rather than relying only on the top-scoring detection.
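The aggregation described above can be sketched with plain Python lists standing in for the lazy frame. This hypothetical `to_per_image` function only illustrates the grouping and the two compatibility columns. It omits the metadata columns and, unlike a join against image metadata, produces no row for images without detections.

```python
from collections import defaultdict


def to_per_image(detections):
    """Group detection rows by (image_id, class_id), highest score first."""
    groups = defaultdict(list)
    for det in detections:
        groups[(det["image_id"], det["class_id"])].append(det)

    rows = []
    for (image_id, class_id), dets in sorted(groups.items()):
        ranked = sorted(dets, key=lambda d: d["score"], reverse=True)
        rows.append({
            "image_id": image_id,
            "class_id": class_id,
            "detections": ranked,              # full list, score descending
            "max_score": ranked[0]["score"],   # from the top-scoring detection
            "top_is_tp": ranked[0]["is_tp"],
        })
    return rows


# Hypothetical detections for two images of one class.
detections = [
    {"image_id": "a", "class_id": "lesion", "score": 0.4, "is_tp": True},
    {"image_id": "a", "class_id": "lesion", "score": 0.9, "is_tp": False},
    {"image_id": "b", "class_id": "lesion", "score": 0.7, "is_tp": True},
]
rows = to_per_image(detections)
print([(r["image_id"], r["max_score"], r["top_is_tp"]) for r in rows])
```

Image "a" shows why the full list is kept: its top-scoring detection is a false positive, yet a lower-scoring true positive is still available to a consumer that looks for the best localized detection.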

image_ids_and_strata ¶

image_ids_and_strata() -> tuple[
    list[str], dict[str, str] | None
]

Extract image IDs and optional stratification mapping.

Returns:

Type	Description
`tuple[list[str], dict[str, str] \| None]`	Tuple of `(image_ids, strata_dict \| None)`.

collect ¶

collect(
    engine: str = "streaming",
) -> tuple[pl.DataFrame, pl.DataFrame]

Materialize both frames.

Returns:

Type	Description
`tuple[DataFrame, DataFrame]`	Tuple of `(detections_df, image_meta_df)`.

MetricResult `dataclass` ¶

Base class for all detection metric results.

Subclasses add metric-specific convenience methods with pre-bound column names (e.g., FROCResult.sensitivity_at_fp).

Attributes:

Name	Type	Description
`curve`	`DataFrame`	DataFrame containing the computed metric curve.
`metadata`	`dict[str, Any]`	Arbitrary metadata about the computation.

auc ¶

auc(
    *,
    x_col: str,
    y_col: str,
    x_range: tuple[float, float] | None = None,
    correction: CorrectionMethod = None,
) -> float

Compute (partial) AUC under the curve.

Parameters:

Name	Type	Description	Default
`x_col`	`str`	Column name for the x-axis values.	required
`y_col`	`str`	Column name for the y-axis values.	required
`x_range`	`tuple[float, float] \| None`	Optional `(lo, hi)` bounds for partial AUC.	`None`
`correction`	`CorrectionMethod`	Optional correction for partial AUC. `None` returns the raw area. `"normalize"` divides by the x-range width. `"mcclish"` applies McClish's standardized correction (only valid with `x_range`).	`None`

Returns:

Type	Description
`float`	Area under the curve (or partial area).
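To make the three `correction` options concrete, here is a plain-Python sketch of trapezoidal partial AUC. It is not the library's code. The `"mcclish"` branch uses the usual standardization for ROC-style curves, where the chance line is the diagonal y = x and the perfect curve is y = 1, and it is an assumption that the library uses the same bounds. The helper names are hypothetical.

```python
def interp(xs, ys, at):
    """Linear interpolation on a curve whose x-values are sorted ascending."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= at <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (at - x0) / (x1 - x0)
    raise ValueError("x-value outside the curve's range")


def partial_auc(xs, ys, x_range=None, correction=None):
    """Trapezoidal area under (xs, ys), optionally clipped to x_range."""
    lo, hi = x_range if x_range is not None else (xs[0], xs[-1])
    # Clip the curve to [lo, hi], interpolating the two boundary points.
    inner = [(x, y) for x, y in zip(xs, ys) if lo < x < hi]
    pts = [(lo, interp(xs, ys, lo))] + inner + [(hi, interp(xs, ys, hi))]
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

    if correction == "normalize":
        return area / (hi - lo)
    if correction == "mcclish":
        min_area = (hi**2 - lo**2) / 2  # area under the chance diagonal y = x
        max_area = hi - lo              # area under a perfect curve y = 1
        return 0.5 * (1 + (area - min_area) / (max_area - min_area))
    return area


# Hypothetical three-point curve.
xs, ys = [0.0, 0.5, 1.0], [0.0, 0.8, 1.0]
full = partial_auc(xs, ys)
raw = partial_auc(xs, ys, x_range=(0.0, 0.5))
normalized = partial_auc(xs, ys, x_range=(0.0, 0.5), correction="normalize")
standardized = partial_auc(xs, ys, x_range=(0.0, 0.5), correction="mcclish")
print(full, raw, normalized, standardized)
```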

interpolate ¶

interpolate(*, x_col: str, y_col: str, at: float) -> float

Linearly interpolate a y-value at a given x-value.

Parameters:

Name	Type	Description	Default
`x_col`	`str`	Column name for the x-axis.	required
`y_col`	`str`	Column name for the y-axis.	required
`at`	`float`	The x-value at which to interpolate.	required

Returns:

Type	Description
`float`	Interpolated y-value.

summary_table ¶

summary_table(
    *, x_col: str, y_col: str, operating_points: list[float]
) -> pl.DataFrame

Build a summary at specific operating points.

Parameters:

Name	Type	Description	Default
`x_col`	`str`	Column for x-axis values.	required
`y_col`	`str`	Column for y-axis values.	required
`operating_points`	`list[float]`	x-values at which to report interpolated y.	required

Returns:

Type	Description
`DataFrame`	DataFrame with `x_col` and `y_col` columns.
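The relationship between interpolate and summary_table can be sketched as follows: the summary is the interpolation applied at each operating point. The functions below are hypothetical stand-ins that take plain lists instead of column names, and raising on an out-of-range operating point is an assumption, since the reference does not say how the library handles that case.

```python
from bisect import bisect_left


def interpolate(xs, ys, at):
    """Linearly interpolate y at x = `at`; `xs` must be sorted ascending."""
    if not xs[0] <= at <= xs[-1]:
        raise ValueError("operating point outside the curve's x-range")
    i = bisect_left(xs, at)  # first index with xs[i] >= at
    if xs[i] == at:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (at - x0) / (x1 - x0)


def summary_table(xs, ys, operating_points):
    """One row per operating point with the interpolated y-value."""
    return [{"x": p, "y": interpolate(xs, ys, p)} for p in operating_points]


# Hypothetical FROC-like curve: false positives per image vs. sensitivity.
fp_per_image = [0.0, 0.5, 2.0]
sensitivity = [0.0, 0.6, 0.9]
table = summary_table(fp_per_image, sensitivity, [0.25, 1.0, 2.0])
for row in table:
    print(row)
```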

BootstrapResult `dataclass` ¶

Container for bootstrap confidence interval results.

Attributes:

Name	Type	Description
`point_estimate`	`float`	Metric value on the original sample.
`ci_lower`	`float`	Lower confidence bound.
`ci_upper`	`float`	Upper confidence bound.
`confidence`	`float`	Confidence level used for the interval.
`distribution`	`list[float]`	Raw bootstrap metric values.

Matchers¶

ContourMatcher ¶

Match detections from heatmaps + binary masks via contour extraction.

This is the refactored version of the original prepare_detection_table from _prepare.py. It extracts contours from both predictions and GT masks, scores predictions against the heatmap, and runs greedy IoU matching via .contour.match_detections().

Parameters:

Name	Type	Description	Default
`iou_threshold`	`float`	IoU threshold for TP matching.	`0.5`
`extraction_threshold`	`float`	Threshold for contour extraction from heatmaps.	`0.1`
`min_contour_area`	`float`	Minimum extracted contour area for predictions.	`1.0`
`auto_resize`	`bool`	Whether to resize heatmaps to mask shapes automatically.	`True`
`gt_min_contour_area`	`float \| None`	Separate min area for GT contours (defaults to `min_contour_area`).	`None`
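Greedy IoU matching is commonly implemented along the following lines. This sketch is an assumption about the general technique, not a description of `.contour.match_detections()`: it visits detections in descending score order, gives each one the best still-unclaimed ground truth whose IoU meets the threshold, and takes a precomputed IoU matrix as input. The `greedy_match` name is hypothetical.

```python
def greedy_match(scores, iou_matrix, iou_threshold=0.5):
    """Match detections to ground truths one-to-one, highest score first.

    iou_matrix[d][g] is the IoU between detection d and ground truth g.
    """
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    taken = set()
    results = {}
    for d in order:
        best_g, best_iou = None, 0.0
        for g, iou in enumerate(iou_matrix[d]):
            if g not in taken and iou >= iou_threshold and iou > best_iou:
                best_g, best_iou = g, iou
        if best_g is not None:
            taken.add(best_g)  # each ground truth can be claimed only once
        results[d] = {
            "is_tp": best_g is not None,
            "gt_idx": best_g,
            "iou": best_iou if best_g is not None else None,
        }
    return [results[d] for d in range(len(scores))]


# Hypothetical image with three detections and two ground-truth contours.
scores = [0.9, 0.8, 0.3]
iou_matrix = [
    [0.70, 0.10],
    [0.60, 0.20],  # overlaps GT 0, but GT 0 is claimed by the 0.9 detection
    [0.00, 0.55],
]
matches = greedy_match(scores, iou_matrix)
print(matches)
```

The second detection illustrates the one-to-one constraint: its IoU with ground truth 0 exceeds the threshold, but that ground truth already belongs to a higher-scoring detection, so it is recorded as a false positive.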

match ¶

match(
    data: LazyFrame | DataFrame,
    *,
    pred_col: str,
    gt_col: str,
    score_col: str | None = None,
    class_col: str | None = None,
    image_id_col: str | None = None,
    weight_col: str | None = None,
    group_col: str | None = None,
) -> DetectionTable

Produce a DetectionTable from heatmap + binary mask data.

Accepts any column format supported by polars-cv sources: nested List[List[...]], VIEW protocol Binary (blob), or fixed-size Array[...]. The source format is auto-detected from the column dtype.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	Input frame with one image/sample per row.	required
`pred_col`	`str`	Prediction heatmap column (any supported format).	required
`gt_col`	`str`	Ground-truth binary mask column (any supported format).	required
`score_col`	`str \| None`	Unused for contour matching (scores are derived from heatmap peaks).	`None`
`class_col`	`str \| None`	Optional class label column for multi-class metrics.	`None`
`image_id_col`	`str \| None`	Optional image identifier column (defaults to row index).	`None`
`weight_col`	`str \| None`	Optional sample weight column.	`None`
`group_col`	`str \| None`	Optional grouping column.	`None`

Returns:

Type	Description
`DetectionTable`	Validated `DetectionTable`.

BBoxMatcher ¶

Match detections from bounding-box lists via IoU matching.

Expects prediction and ground-truth columns as List[Struct{x, y, width, height}] (i.e. List[BBOX_SCHEMA]). Scores should be provided as a separate List[Float64] column aligned with the prediction bboxes.

Matching calls the Rust bbox_match_detections plugin function which internally converts each bbox to a 4-point rectangular contour and delegates to the existing contour matching infrastructure.

Parameters:

Name	Type	Description	Default
`iou_threshold`	`float`	IoU threshold for TP matching.	`0.5`
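The bbox-to-contour conversion and the resulting IoU can be illustrated in plain Python. This sketch assumes that `x` and `y` denote the top-left corner of the box, which the reference does not state, and it computes IoU directly from the rectangles rather than through the Rust plugin. Both function names are hypothetical.

```python
def bbox_to_contour(bbox):
    """Return the four corner points of an axis-aligned box.

    Assumption: (x, y) is the top-left corner.
    """
    x, y, w, h = bbox["x"], bbox["y"], bbox["width"], bbox["height"]
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def bbox_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"])
    overlap_h = min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"])
    intersection = max(0.0, overlap_w) * max(0.0, overlap_h)
    union = a["width"] * a["height"] + b["width"] * b["height"] - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth boxes.
pred = {"x": 0.0, "y": 0.0, "width": 2.0, "height": 2.0}
gt = {"x": 1.0, "y": 1.0, "width": 2.0, "height": 2.0}
far = {"x": 5.0, "y": 5.0, "width": 1.0, "height": 1.0}

contour = bbox_to_contour(pred)
iou_overlapping = bbox_iou(pred, gt)  # intersection 1, union 7
iou_disjoint = bbox_iou(pred, far)
print(contour, iou_overlapping, iou_disjoint)
```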

match ¶

match(
    data: LazyFrame | DataFrame,
    *,
    pred_col: str,
    gt_col: str,
    score_col: str | None = None,
    class_col: str | None = None,
    image_id_col: str | None = None,
    weight_col: str | None = None,
    group_col: str | None = None,
) -> DetectionTable

Produce a DetectionTable from bbox prediction/GT lists.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	Input frame with one image/sample per row.	required
`pred_col`	`str`	Prediction bboxes column (`List[BBOX_SCHEMA]`).	required
`gt_col`	`str`	Ground-truth bboxes column (`List[BBOX_SCHEMA]`).	required
`score_col`	`str \| None`	Per-prediction score column (`List[Float64]`). Required for bbox matching.	`None`
`class_col`	`str \| None`	Optional class label column.	`None`
`image_id_col`	`str \| None`	Optional image identifier column.	`None`
`weight_col`	`str \| None`	Optional sample weight column.	`None`
`group_col`	`str \| None`	Optional grouping column.	`None`

Returns:

Type	Description
`DetectionTable`	Validated `DetectionTable`.

Raises:

Type	Description
`ValueError`	If `score_col` is not provided.

PreMatchedAdapter ¶

Adapter for data that already has per-detection TP/FP assignments.

Expects a flat table where each row is one detection with at minimum:

score (float) — confidence score
is_tp (bool) — whether this detection is a true positive

Plus per-image metadata either inline or via n_gts column.

match ¶

match(
    data: LazyFrame | DataFrame,
    *,
    pred_col: str = "score",
    gt_col: str = "is_tp",
    score_col: str | None = None,
    class_col: str | None = None,
    image_id_col: str | None = None,
    weight_col: str | None = None,
    group_col: str | None = None,
    n_gts_col: str | None = None,
    gt_label_col: str | None = None,
    iou_col: str | None = None,
    det_idx_col: str | None = None,
) -> DetectionTable

Wrap pre-matched data into a DetectionTable.

Parameters:

Name	Type	Description	Default
`data`	`LazyFrame \| DataFrame`	Input frame with one row per detection.	required
`pred_col`	`str`	Column with confidence scores (aliased to `score`).	`'score'`
`gt_col`	`str`	Column with TP flag (aliased to `is_tp`).	`'is_tp'`
`score_col`	`str \| None`	Alias for `pred_col` (takes precedence if both set).	`None`
`class_col`	`str \| None`	Optional class label column.	`None`
`image_id_col`	`str \| None`	Image identifier column. If omitted, the row index is used.	`None`
`weight_col`	`str \| None`	Optional sample weight column.	`None`
`group_col`	`str \| None`	Optional grouping column.	`None`
`n_gts_col`	`str \| None`	Column with per-image GT count.	`None`
`gt_label_col`	`str \| None`	Column with per-image positive/negative label.	`None`
`iou_col`	`str \| None`	Optional column with per-detection IoU values.	`None`
`det_idx_col`	`str \| None`	Optional column with detection index within image.	`None`

Returns:

Type	Description
`DetectionTable`	Validated `DetectionTable`.

Metric Functions¶

precision_recall_curve ¶

precision_recall_curve(
    table: DetectionTable, *, class_id: str | None = None
) -> PrecisionRecallResult

Compute a precision-recall curve from a DetectionTable.

Detections are sorted by confidence score (descending). At each rank, cumulative TP/FP are computed and precision/recall derived. All computation uses Polars lazy expressions.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`class_id`	`str \| None`	Restrict to a specific class. `None` uses all detections.	`None`

Returns:

Type	Description
`PrecisionRecallResult`	`PrecisionRecallResult` with the PR curve.
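The cumulative computation described above can be sketched without Polars. The function below is a hypothetical plain-Python equivalent that takes `(score, is_tp)` pairs and the total ground-truth count. It ignores sample weights and assumes `n_gts > 0`.

```python
def precision_recall_curve(detections, n_gts):
    """Return one curve point per rank, with detections sorted by score."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    curve = []
    tp = fp = 0
    for score, is_tp in ranked:
        tp += is_tp        # cumulative true positives up to this rank
        fp += not is_tp    # cumulative false positives up to this rank
        curve.append({
            "score": score,
            "precision": tp / (tp + fp),
            "recall": tp / n_gts,
        })
    return curve


# Hypothetical detections for a class with three ground-truth objects.
detections = [(0.7, True), (0.9, True), (0.6, False), (0.8, False)]
curve = precision_recall_curve(detections, n_gts=3)
for point in curve:
    print(point)
```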

average_precision ¶

average_precision(
    table: DetectionTable,
    *,
    class_id: str | None = None,
    interpolation: Literal[
        "all_points", "11_point"
    ] = "all_points",
) -> float

Compute Average Precision for a single class.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`class_id`	`str \| None`	Restrict to a specific class.	`None`
`interpolation`	`Literal['all_points', '11_point']`	`"all_points"` (trapezoidal) or `"11_point"` (VOC).	`'all_points'`

Returns:

Type	Description
`float`	AP value in [0, 1].
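The two interpolation modes differ as sketched below. The `"11_point"` branch follows the Pascal VOC 2007 definition (mean of the maximum precision at recall levels 0.0, 0.1, ..., 1.0). The `"all_points"` branch integrates precision over recall with the trapezoidal rule, and anchoring the curve at recall 0 with the first precision value is an assumption of this sketch, not documented behavior. The function is a hypothetical stand-in that takes the curve as plain lists.

```python
def average_precision(recall, precision, interpolation="all_points"):
    """Average precision from a PR curve ordered by descending score."""
    if interpolation == "11_point":
        total = 0.0
        for i in range(11):
            level = i / 10
            # Best precision achieved at any recall at or above this level.
            candidates = [p for r, p in zip(recall, precision) if r >= level]
            total += max(candidates, default=0.0)
        return total / 11
    if interpolation == "all_points":
        # Assumption: start the curve at recall 0 with the first precision.
        rs = [0.0] + list(recall)
        ps = [precision[0]] + list(precision)
        return sum(
            (r1 - r0) * (p0 + p1) / 2
            for r0, r1, p0, p1 in zip(rs, rs[1:], ps, ps[1:])
        )
    raise ValueError(f"unknown interpolation: {interpolation!r}")


# Hypothetical PR curve: four ranked detections, three ground-truth objects.
recall = [1 / 3, 1 / 3, 2 / 3, 2 / 3]
precision = [1.0, 0.5, 2 / 3, 0.5]
ap_all_points = average_precision(recall, precision, "all_points")
ap_11_point = average_precision(recall, precision, "11_point")
print(ap_all_points, ap_11_point)
```

On this curve the trapezoidal value is 19/36 and the 11-point value is 6/11, so the choice of interpolation visibly changes the result even for the same detections.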

mean_average_precision ¶

mean_average_precision(
    table: DetectionTable,
    *,
    iou_thresholds: list[float] | None = None,
    interpolation: Literal[
        "all_points", "11_point"
    ] = "all_points",
) -> float

Compute Mean Average Precision across classes and IoU thresholds.

If iou_thresholds is provided, the stored iou column is re-thresholded at each level to recompute is_tp -- no re-matching is needed.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`iou_thresholds`	`list[float] \| None`	IoU thresholds to average over. Defaults to `[0.5]` (Pascal VOC). Use `[0.5, 0.55, ..., 0.95]` for COCO.	`None`
`interpolation`	`Literal['all_points', '11_point']`	AP interpolation method.	`'all_points'`

Returns:

Type	Description
`float`	mAP value in [0, 1].
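As a small illustration, the COCO threshold list can be generated rather than typed out, and mAP is then the mean of the per-class, per-threshold AP values. The AP numbers below are made up for the example, and `mean_ap` is a hypothetical helper. Per the warning on at_iou_threshold, thresholds below the matcher's own IoU threshold would not yield meaningful values.

```python
# Ten thresholds from 0.50 to 0.95 in steps of 0.05; rounding removes
# floating-point noise such as 0.6000000000000001.
coco_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap(ap_values):
    """Mean over a {(class_id, iou_threshold): AP} mapping."""
    return sum(ap_values.values()) / len(ap_values)


# Hypothetical AP values for two classes at two IoU thresholds.
ap_values = {
    ("cat", 0.5): 0.8,
    ("cat", 0.75): 0.6,
    ("dog", 0.5): 0.6,
    ("dog", 0.75): 0.4,
}
map_value = mean_ap(ap_values)
print(coco_thresholds, map_value)
```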

precision_at_threshold ¶

precision_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> float

Compute precision at a given score threshold.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`threshold`	`float`	Score threshold.	required
`class_id`	`str \| None`	Optional class filter.	`None`

Returns:

Type	Description
`float`	Precision value.

recall_at_threshold ¶

recall_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> float

Compute recall at a given score threshold.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`threshold`	`float`	Score threshold.	required
`class_id`	`str \| None`	Optional class filter.	`None`

Returns:

Type	Description
`float`	Recall value.

f1_at_threshold ¶

f1_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> float

Compute F1 score at a given score threshold.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`threshold`	`float`	Score threshold.	required
`class_id`	`str \| None`	Optional class filter.	`None`

Returns:

Type	Description
`float`	F1 value in [0, 1].

froc_curve ¶

froc_curve(
    table: DetectionTable,
    *,
    thresholds: list[float] | None = None,
) -> FROCResult

Compute FROC operating points from a DetectionTable.

Uses a cumulative-sum approach: detections are sorted by score (descending) and TP/FP counts are accumulated, producing one curve point per unique score. This avoids the O(images x thresholds) dense grid that the previous implementation used.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table produced by a matcher.	required
`thresholds`	`list[float] \| None`	Optional explicit score thresholds. When provided, the curve is filtered to these thresholds only.	`None`

Returns:

Type	Description
`FROCResult`	`FROCResult` with curve and metadata.
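A plain-Python sketch of the cumulative-sum approach: one pass over the score-sorted detections, emitting a point after the last detection of each distinct score. The output key names (`threshold`, `sensitivity`, `fp_per_image`) are assumptions chosen for the example, and sample weights are ignored.

```python
def froc_curve(detections, n_gts, n_images):
    """One FROC point per unique score, computed from running TP/FP counts."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    curve = []
    tp = fp = 0
    for i, (score, is_tp) in enumerate(ranked):
        tp += is_tp
        fp += not is_tp
        # Emit only after the last detection that shares this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            curve.append({
                "threshold": score,
                "sensitivity": tp / n_gts,
                "fp_per_image": fp / n_images,
            })
    return curve


# Hypothetical data: two images containing four ground-truth lesions in total.
detections = [(0.9, True), (0.8, False), (0.8, True), (0.5, False)]
curve = froc_curve(detections, n_gts=4, n_images=2)
for point in curve:
    print(point)
```

The work is dominated by the sort, so the cost grows with the number of detections rather than with the product of images and thresholds.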

lroc_curve ¶

lroc_curve(
    table: DetectionTable,
    *,
    variant: Literal["best_tp", "top_scoring"] = "best_tp",
) -> LROCResult

Compute LROC operating points from a DetectionTable.

LROC requires:

gt_label in image metadata (positive/negative per image).
Per-image top-detection reduction from DetectionTable.to_per_image().

Two scoring variants are supported:

"best_tp" (default): For each positive image, the effective score is the highest-scoring TP detection. An image counts as localized if it has any TP above the threshold.
"top_scoring": For each positive image, take the single highest-scoring detection (regardless of TP/FP). The image counts as localized only if that top detection is a TP. This matches the classical single-commitment LROC (Swensson 1996).

For negative images both variants use the maximum detection score.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table produced by a matcher.	required
`variant`	`Literal['best_tp', 'top_scoring']`	`"best_tp"` or `"top_scoring"`.	`'best_tp'`

Returns:

Type	Description
`LROCResult`	`LROCResult` with curve, per-image summary, and metadata.
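The difference between the two variants shows up when an image's top-scoring detection is a false positive. The sketch below summarizes a single positive image under each variant. `summarize_positive_image` is a hypothetical helper, and returning a score of `None` when no qualifying detection exists is an assumption of this example.

```python
def summarize_positive_image(detections, variant="best_tp"):
    """Return (effective_score, is_localized) for one positive image.

    `detections` is a list of (score, is_tp) pairs.
    """
    if variant == "best_tp":
        tp_scores = [score for score, is_tp in detections if is_tp]
        if tp_scores:
            return max(tp_scores), True
        return None, False
    if variant == "top_scoring":
        if not detections:
            return None, False
        score, is_tp = max(detections, key=lambda d: d[0])
        return score, is_tp
    raise ValueError(f"unknown variant: {variant!r}")


# Hypothetical positive image: a confident miss and a weaker correct hit.
detections = [(0.9, False), (0.6, True)]
best_tp = summarize_positive_image(detections, "best_tp")
top_scoring = summarize_positive_image(detections, "top_scoring")
print(best_tp, top_scoring)
```

Under "best_tp" the image counts as localized at score 0.6. Under "top_scoring" the single commitment is the 0.9 detection, which is a false positive, so the image is not localized.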

confusion_at_threshold ¶

confusion_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> dict[str, int]

Compute TP, FP, FN counts at a given score threshold.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`threshold`	`float`	Score threshold — detections with `score >= threshold` are considered active.	required
`class_id`	`str \| None`	Optional class filter.	`None`

Returns:

Type	Description
`dict[str, int]`	Dictionary with keys `tp`, `fp`, `fn`.
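The counts, and the precision, recall, and F1 values that follow from them, can be sketched as below. The `score >= threshold` rule comes from the parameter description. Counting false negatives as the ground truths not claimed by an active true positive, and returning 0.0 when a denominator is zero, are assumptions of this hypothetical sketch.

```python
def confusion_at_threshold(detections, n_gts, threshold):
    """Count TP, FP, FN among detections with score >= threshold."""
    active = [is_tp for score, is_tp in detections if score >= threshold]
    tp = sum(active)
    fp = len(active) - tp
    fn = n_gts - tp  # ground truths not claimed by any active TP
    return {"tp": tp, "fp": fp, "fn": fn}


def precision_recall_f1(counts):
    """Derive the three threshold metrics from the confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Hypothetical detections for a class with three ground-truth objects.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
counts = confusion_at_threshold(detections, n_gts=3, threshold=0.7)
precision, recall, f1 = precision_recall_f1(counts)
print(counts, precision, recall, f1)
```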

Bootstrap¶

bootstrap_metric_sequential ¶

bootstrap_metric_sequential(
    *,
    image_ids: list[str],
    metric_fn: Callable[[list[str]], float],
    point_estimate: float,
    n_bootstrap: int = 1000,
    confidence: float = 0.95,
    seed: int | None = None,
    strata: dict[str, str] | None = None,
) -> BootstrapResult

Estimate confidence intervals by image-level bootstrap sampling.

This is the sequential fallback that calls metric_fn once per iteration. Use bootstrap_pr_auc for a fully vectorized Polars-native path when computing PR-based AUC.

Parameters:

Name	Type	Description	Default
`image_ids`	`list[str]`	Base image IDs to sample with replacement.	required
`metric_fn`	`Callable[[list[str]], float]`	Callback computing a scalar metric from sampled image IDs.	required
`point_estimate`	`float`	Metric on the original sample.	required
`n_bootstrap`	`int`	Number of bootstrap iterations.	`1000`
`confidence`	`float`	Confidence level in `(0, 1)`.	`0.95`
`seed`	`int \| None`	Optional RNG seed.	`None`
`strata`	`dict[str, str] \| None`	Optional image->stratum mapping for stratified resampling.	`None`

Returns:

Type	Description
`BootstrapResult`	`BootstrapResult` with percentile confidence interval.
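An image-level percentile bootstrap with optional stratification can be sketched with the standard library. This is an illustration of the technique under stated assumptions, not the library's code: `bootstrap_ci` is hypothetical, stratified resampling is taken to mean resampling within each stratum at its original size, and the percentile bounds use simple index truncation rather than interpolation.

```python
import random


def bootstrap_ci(image_ids, metric_fn, n_bootstrap=1000, confidence=0.95,
                 seed=None, strata=None):
    """Return (ci_lower, ci_upper, sorted distribution) for metric_fn."""
    rng = random.Random(seed)
    if strata is None:
        groups = [list(image_ids)]
    else:
        by_stratum = {}
        for image_id in image_ids:
            by_stratum.setdefault(strata[image_id], []).append(image_id)
        groups = list(by_stratum.values())

    distribution = []
    for _ in range(n_bootstrap):
        sample = []
        for group in groups:
            # Sample with replacement, preserving each stratum's size.
            sample.extend(rng.choices(group, k=len(group)))
        distribution.append(metric_fn(sample))

    distribution.sort()
    alpha = (1 - confidence) / 2
    lower = distribution[int(alpha * (n_bootstrap - 1))]
    upper = distribution[int((1 - alpha) * (n_bootstrap - 1))]
    return lower, upper, distribution


# Hypothetical data: the metric is the fraction of positive images sampled.
labels = {"a": "positive", "b": "positive", "c": "negative",
          "d": "negative", "e": "negative"}
image_ids = list(labels)


def positive_fraction(sample):
    return sum(labels[i] == "positive" for i in sample) / len(sample)


lower, upper, distribution = bootstrap_ci(
    image_ids, positive_fraction, n_bootstrap=200, seed=0, strata=labels,
)
print(lower, upper)
```

Stratifying by label keeps the positive/negative ratio fixed in every resample, which is why this particular metric has zero spread here. Without `strata`, the same call would produce a non-degenerate interval.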

bootstrap_pr_auc ¶

bootstrap_pr_auc(
    table: DetectionTable,
    *,
    n_bootstrap: int = 1000,
    confidence: float = 0.95,
    seed: int | None = None,
    class_id: str | None = None,
) -> BootstrapResult

Vectorized bootstrap for precision-recall AUC.

Generates all bootstrap samples as a single DataFrame, joins with the detection table, and computes AP per bootstrap iteration using Polars window functions -- all in one lazy plan.

Parameters:

Name	Type	Description	Default
`table`	`DetectionTable`	Canonical detection table.	required
`n_bootstrap`	`int`	Number of bootstrap iterations.	`1000`
`confidence`	`float`	Confidence level in `(0, 1)`.	`0.95`
`seed`	`int \| None`	Optional RNG seed.	`None`
`class_id`	`str \| None`	Optional class filter.	`None`

Returns:

Type	Description
`BootstrapResult`	`BootstrapResult` with percentile confidence interval.
