Metrics API Reference

Core Types

DetectionTable dataclass

Canonical intermediate representation consumed by all metric functions.

Wraps two aligned lazy frames:

  • detections — one row per detection with image_id, class_id, score, is_tp, gt_idx, iou, det_idx.
  • image_metadata — one row per (image, class) with n_gts, weight, gt_label, and optionally group_id.

Use from_matched to construct a DetectionTable with schema validation.

from_matched classmethod

from_matched(
    detections: LazyFrame | DataFrame,
    image_meta: LazyFrame | DataFrame,
    *,
    matching_iou_threshold: float | None = None,
) -> DetectionTable

Construct a DetectionTable with planning-time schema validation.

Parameters:

Name Type Description Default
detections LazyFrame | DataFrame

Per-detection rows. Must contain columns image_id, class_id, score, is_tp, gt_idx, iou, det_idx.

required
image_meta LazyFrame | DataFrame

Per-image metadata. Must contain image_id, class_id, n_gts, weight, gt_label.

required
matching_iou_threshold float | None

The IoU threshold used by the matcher. Stored so that at_iou_threshold can warn when the caller tries to lower the threshold below the matching level (which has no effect).

None

Returns:

Type Description
DetectionTable

Validated DetectionTable instance.

Raises:

Type Description
ValueError

If required columns are missing.
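
The column check described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: check_columns and the two constant names are hypothetical, and the sketch takes plain lists of column names, whereas the real method would read them from the frame schema without collecting data.

```python
# Required column sets, copied from the parameter descriptions above.
REQUIRED_DETECTION_COLUMNS = {
    "image_id", "class_id", "score", "is_tp", "gt_idx", "iou", "det_idx",
}
REQUIRED_METADATA_COLUMNS = {"image_id", "class_id", "n_gts", "weight", "gt_label"}


def check_columns(present, required, frame_name):
    """Hypothetical helper: raise ValueError naming every absent column."""
    missing = sorted(required - set(present))
    if missing:
        raise ValueError(f"{frame_name} is missing required columns: {missing}")


# Passes: extra columns (here "site") are allowed.
check_columns(
    ["image_id", "class_id", "n_gts", "weight", "gt_label", "site"],
    REQUIRED_METADATA_COLUMNS,
    "image_meta",
)

# Fails: gt_idx, iou and det_idx are absent from this made-up frame.
error_message = None
try:
    check_columns(
        ["image_id", "class_id", "score", "is_tp"],
        REQUIRED_DETECTION_COLUMNS,
        "detections",
    )
except ValueError as exc:
    error_message = str(exc)
```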

with_group

with_group(group_col: str) -> DetectionTable

Return a copy with group_id set from an existing metadata column.

Parameters:

Name Type Description Default
group_col str

Column in image_metadata to use as the group.

required

Returns:

Type Description
DetectionTable

New DetectionTable with group_id populated.

filter_class

filter_class(class_id: str) -> DetectionTable

Return a copy filtered to a single class.

Parameters:

Name Type Description Default
class_id str

The class to retain.

required

Returns:

Type Description
DetectionTable

Filtered DetectionTable.

class_ids

class_ids() -> list[str]

Return distinct class IDs present in the detections.

Note: triggers a partial collect on the metadata frame.

at_iou_threshold

at_iou_threshold(iou_threshold: float) -> DetectionTable

Return a copy with is_tp recomputed at a different IoU threshold.

The stored iou column is compared against iou_threshold to set is_tp without re-running the matching step.

Warning: Re-thresholding only works reliably when raising the threshold above the original matching IoU. Lowering it has no effect because detections that were unmatched at the original threshold have no stored gt_idx/iou to re-evaluate.

Parameters:

Name Type Description Default
iou_threshold float

New IoU threshold to apply.

required

Returns:

Type Description
DetectionTable

DetectionTable with updated is_tp.
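
To make the warning concrete, here is a minimal sketch of re-thresholding on plain Python objects. The Detection class and rethreshold function are hypothetical stand-ins for the detections frame, and the inclusive comparison (iou >= threshold) is an assumption, not something the reference states.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for one row of the detections frame."""
    score: float
    iou: Optional[float]     # None when the matcher found no GT
    gt_idx: Optional[int]    # None when the matcher found no GT
    is_tp: bool


def rethreshold(dets, iou_threshold):
    """Recompute is_tp from the stored IoU; unmatched rows stay false positives."""
    return [
        replace(d, is_tp=d.iou is not None and d.iou >= iou_threshold)
        for d in dets
    ]


# Rows as a matcher run at IoU 0.5 might have produced them.
matched_at_05 = [
    Detection(score=0.9, iou=0.82, gt_idx=0, is_tp=True),
    Detection(score=0.7, iou=0.55, gt_idx=1, is_tp=True),
    Detection(score=0.4, iou=None, gt_idx=None, is_tp=False),
]

raised = rethreshold(matched_at_05, 0.75)   # second detection becomes an FP
lowered = rethreshold(matched_at_05, 0.30)  # nothing changes: no IoU stored for row 3
```

Raising the threshold demotes the 0.55-IoU match, while lowering it cannot promote the unmatched detection because there is no stored IoU to compare.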

to_per_image

to_per_image() -> pl.LazyFrame

Aggregate detections to one row per image and class, summarized by the top-scoring detection.

Produces one row per (image_id, class_id) with:

  • detections: list of detection structs sorted by score descending
  • compatibility columns max_score and top_is_tp from the highest-scoring detection
  • metadata columns gt_label, weight, n_gts

LROC consumes detections for image-level summarization (best localized detection), rather than relying only on the top-scoring detection.

image_ids_and_strata

image_ids_and_strata() -> tuple[
    list[str], dict[str, str] | None
]

Extract image IDs and optional stratification mapping.

Returns:

Type Description
tuple[list[str], dict[str, str] | None]

Tuple of (image_ids, strata_dict | None).

collect

collect(
    engine: str = "streaming",
) -> tuple[pl.DataFrame, pl.DataFrame]

Materialize both frames.

Returns:

Type Description
tuple[DataFrame, DataFrame]

Tuple of (detections_df, image_meta_df).

MetricResult dataclass

Base class for all detection metric results.

Subclasses add metric-specific convenience methods with pre-bound column names (e.g., FROCResult.sensitivity_at_fp).

Attributes:

Name Type Description
curve DataFrame

DataFrame containing the computed metric curve.

metadata dict[str, Any]

Arbitrary metadata about the computation.

auc

auc(
    *,
    x_col: str,
    y_col: str,
    x_range: tuple[float, float] | None = None,
    correction: CorrectionMethod = None,
) -> float

Compute (partial) AUC under the curve.

Parameters:

Name Type Description Default
x_col str

Column name for the x-axis values.

required
y_col str

Column name for the y-axis values.

required
x_range tuple[float, float] | None

Optional (lo, hi) bounds for partial AUC.

None
correction CorrectionMethod

Optional correction for partial AUC. None returns the raw area. "normalize" divides by the x-range width. "mcclish" applies McClish's standardized correction (only valid with x_range).

None

Returns:

Type Description
float

Area under the curve (or partial area).
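
The three correction modes can be illustrated with a small trapezoidal sketch. It is not the library's code: interp and partial_auc are hypothetical names, and the McClish branch assumes an ROC-style curve with both axes in [0, 1], where the chance line is y = x and a perfect curve is y = 1.

```python
def interp(pts, x):
    """Linear interpolation on sorted (x, y) points, clamped at both ends."""
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y1
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def partial_auc(xs, ys, x_range=None, correction=None):
    """Trapezoidal area over [lo, hi] with an optional correction."""
    pts = sorted(zip(xs, ys))
    lo, hi = x_range if x_range is not None else (pts[0][0], pts[-1][0])
    inner = [(x, y) for x, y in pts if lo < x < hi]
    clipped = [(lo, interp(pts, lo))] + inner + [(hi, interp(pts, hi))]
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(clipped, clipped[1:])
    )
    if correction == "normalize":
        return area / (hi - lo)
    if correction == "mcclish":
        chance = (hi**2 - lo**2) / 2   # area under y = x on [lo, hi]
        best = hi - lo                 # area under y = 1 on [lo, hi]
        return 0.5 * (1 + (area - chance) / (best - chance))
    return area


# Made-up ROC-style curve.
xs, ys = [0.0, 0.5, 1.0], [0.0, 0.8, 1.0]
full = partial_auc(xs, ys)
raw = partial_auc(xs, ys, x_range=(0.0, 0.5))
normalized = partial_auc(xs, ys, x_range=(0.0, 0.5), correction="normalize")
standardized = partial_auc(xs, ys, x_range=(0.0, 0.5), correction="mcclish")
```

The McClish form rescales the partial area so that a chance-level curve maps to 0.5 and a perfect curve to 1.0, which is why it only makes sense together with x_range.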

interpolate

interpolate(*, x_col: str, y_col: str, at: float) -> float

Linearly interpolate a y-value at a given x-value.

Parameters:

Name Type Description Default
x_col str

Column name for the x-axis.

required
y_col str

Column name for the y-axis.

required
at float

The x-value at which to interpolate.

required

Returns:

Type Description
float

Interpolated y-value.

summary_table

summary_table(
    *, x_col: str, y_col: str, operating_points: list[float]
) -> pl.DataFrame

Build a summary at specific operating points.

Parameters:

Name Type Description Default
x_col str

Column for x-axis values.

required
y_col str

Column for y-axis values.

required
operating_points list[float]

x-values at which to report interpolated y.

required

Returns:

Type Description
DataFrame

DataFrame with x_col and y_col columns.
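
As a rough picture of what interpolate and summary_table compute, the sketch below interpolates a curve at a list of operating points. The function names, the column names fp_per_image and sensitivity, and the choice to raise outside the curve's x-range are all assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate_at(xs, ys, at):
    """Linearly interpolate y at x = at; xs must be sorted ascending."""
    if not xs[0] <= at <= xs[-1]:
        raise ValueError(f"{at} is outside the curve range [{xs[0]}, {xs[-1]}]")
    i = bisect_left(xs, at)
    if xs[i] == at:
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (at - x0) / (x1 - x0)


def summarize(xs, ys, operating_points, x_name, y_name):
    """One record per operating point, holding x and the interpolated y."""
    return [
        {x_name: p, y_name: interpolate_at(xs, ys, p)}
        for p in operating_points
    ]


# Made-up FROC-like curve: false positives per image vs. sensitivity.
fp_per_image = [0.0, 0.5, 1.0, 2.0]
sensitivity = [0.0, 0.4, 0.6, 0.8]
rows = summarize(
    fp_per_image, sensitivity, [0.25, 1.0, 1.5], "fp_per_image", "sensitivity"
)
```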

BootstrapResult dataclass

Container for bootstrap confidence interval results.

Attributes:

Name Type Description
point_estimate float

Metric value on the original sample.

ci_lower float

Lower confidence bound.

ci_upper float

Upper confidence bound.

confidence float

Confidence level used for the interval.

distribution list[float]

Raw bootstrap metric values.

Matchers

ContourMatcher

Match detections from heatmaps + binary masks via contour extraction.

This is the refactored version of the original prepare_detection_table from _prepare.py. It extracts contours from both predictions and GT masks, scores predictions against the heatmap, and runs greedy IoU matching via .contour.match_detections().

Parameters:

Name Type Description Default
iou_threshold float

IoU threshold for TP matching.

0.5
extraction_threshold float

Threshold for contour extraction from heatmaps.

0.1
min_contour_area float

Minimum extracted contour area for predictions.

1.0
auto_resize bool

Whether to resize heatmaps to mask shapes automatically.

True
gt_min_contour_area float | None

Separate min area for GT contours (defaults to min_contour_area).

None

match

match(
    data: LazyFrame | DataFrame,
    *,
    pred_col: str,
    gt_col: str,
    score_col: str | None = None,
    class_col: str | None = None,
    image_id_col: str | None = None,
    weight_col: str | None = None,
    group_col: str | None = None,
) -> DetectionTable

Produce a DetectionTable from heatmap + binary mask data.

Accepts any column format supported by polars-cv sources: nested List[List[...]], VIEW protocol Binary (blob), or fixed-size Array[...]. The source format is auto-detected from the column dtype.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

Input frame with one image/sample per row.

required
pred_col str

Prediction heatmap column (any supported format).

required
gt_col str

Ground-truth binary mask column (any supported format).

required
score_col str | None

Unused for contour matching (scores are derived from heatmap peaks).

None
class_col str | None

Optional class label column for multi-class metrics.

None
image_id_col str | None

Optional image identifier column (defaults to row index).

None
weight_col str | None

Optional sample weight column.

None
group_col str | None

Optional grouping column.

None

Returns:

Type Description
DetectionTable

Validated DetectionTable.

BBoxMatcher

Match detections from bounding-box lists via IoU matching.

Expects prediction and ground-truth columns as List[Struct{x, y, width, height}] (i.e. List[BBOX_SCHEMA]). Scores should be provided as a separate List[Float64] column aligned with the prediction bboxes.

Matching calls the Rust bbox_match_detections plugin function which internally converts each bbox to a 4-point rectangular contour and delegates to the existing contour matching infrastructure.
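
The bbox-to-contour conversion and greedy IoU matching described above can be sketched in pure Python. This is a simplified illustration, not the Rust plugin: the function names are hypothetical, and the details of the greedy rule (visit predictions by descending score, each GT claimed at most once, inclusive threshold) are assumptions.

```python
def bbox_to_contour(b):
    """Four corner points of an {x, y, width, height} box."""
    x, y, w, h = b["x"], b["y"], b["width"], b["height"]
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def bbox_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"])
    ih = min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"])
    inter = max(0.0, iw) * max(0.0, ih)
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def greedy_match(preds, scores, gts, iou_threshold=0.5):
    """Assign each prediction to its best unclaimed GT, highest score first."""
    taken, rows = set(), []
    for i in sorted(range(len(preds)), key=lambda k: -scores[k]):
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(gts):
            iou = bbox_iou(preds[i], gt)
            if j not in taken and iou > best_iou:
                best_j, best_iou = j, iou
        is_tp = best_j is not None and best_iou >= iou_threshold
        if is_tp:
            taken.add(best_j)
        rows.append({
            "det_idx": i,
            "score": scores[i],
            "is_tp": is_tp,
            "gt_idx": best_j if is_tp else None,   # unmatched rows store nothing
            "iou": best_iou if is_tp else None,
        })
    return rows


box = lambda x, y: {"x": x, "y": y, "width": 10.0, "height": 10.0}
preds = [box(0.0, 0.0), box(1.0, 1.0), box(50.0, 50.0)]
scores = [0.6, 0.9, 0.8]
gts = [box(0.0, 0.0)]
rows = greedy_match(preds, scores, gts)
```

In this example the 0.9-score box claims the only ground truth, so the perfectly overlapping 0.6-score box ends up as a false positive.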

Parameters:

Name Type Description Default
iou_threshold float

IoU threshold for TP matching.

0.5

match

match(
    data: LazyFrame | DataFrame,
    *,
    pred_col: str,
    gt_col: str,
    score_col: str | None = None,
    class_col: str | None = None,
    image_id_col: str | None = None,
    weight_col: str | None = None,
    group_col: str | None = None,
) -> DetectionTable

Produce a DetectionTable from bbox prediction/GT lists.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

Input frame with one image/sample per row.

required
pred_col str

Prediction bboxes column (List[BBOX_SCHEMA]).

required
gt_col str

Ground-truth bboxes column (List[BBOX_SCHEMA]).

required
score_col str | None

Per-prediction score column (List[Float64]). Required for bbox matching.

None
class_col str | None

Optional class label column.

None
image_id_col str | None

Optional image identifier column.

None
weight_col str | None

Optional sample weight column.

None
group_col str | None

Optional grouping column.

None

Returns:

Type Description
DetectionTable

Validated DetectionTable.

Raises:

Type Description
ValueError

If score_col is not provided.

PreMatchedAdapter

Adapter for data that already has per-detection TP/FP assignments.

Expects a flat table where each row is one detection with at minimum:

  • score (float) — confidence score
  • is_tp (bool) — whether this detection is a true positive

Plus per-image metadata either inline or via n_gts column.

match

match(
    data: LazyFrame | DataFrame,
    *,
    pred_col: str = "score",
    gt_col: str = "is_tp",
    score_col: str | None = None,
    class_col: str | None = None,
    image_id_col: str | None = None,
    weight_col: str | None = None,
    group_col: str | None = None,
    n_gts_col: str | None = None,
    gt_label_col: str | None = None,
    iou_col: str | None = None,
    det_idx_col: str | None = None,
) -> DetectionTable

Wrap pre-matched data into a DetectionTable.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

Input frame with one row per detection.

required
pred_col str

Column with confidence scores (aliased to score).

'score'
gt_col str

Column with TP flag (aliased to is_tp).

'is_tp'
score_col str | None

Alias for pred_col (takes precedence if both set).

None
class_col str | None

Optional class label column.

None
image_id_col str | None

Image identifier column. If omitted, the row index is used.

None
weight_col str | None

Optional sample weight column.

None
group_col str | None

Optional grouping column.

None
n_gts_col str | None

Column with per-image GT count.

None
gt_label_col str | None

Column with per-image positive/negative label.

None
iou_col str | None

Optional column with per-detection IoU values.

None
det_idx_col str | None

Optional column with detection index within image.

None

Returns:

Type Description
DetectionTable

Validated DetectionTable.

Metric Functions

precision_recall_curve

precision_recall_curve(
    table: DetectionTable, *, class_id: str | None = None
) -> PrecisionRecallResult

Compute a precision-recall curve from a DetectionTable.

Detections are sorted by confidence score (descending). At each rank, cumulative TP/FP are computed and precision/recall derived. All computation uses Polars lazy expressions.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
class_id str | None

Restrict to a specific class. None uses all detections.

None

Returns:

Type Description
PrecisionRecallResult

PrecisionRecallResult with the PR curve.
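
The ranking and cumulative-count step can be sketched with the standard library. The library does this with Polars lazy expressions; the plain-list version below is only an illustration, pr_curve is a hypothetical name, and tie handling between equal scores is left to the stable sort.

```python
from itertools import accumulate


def pr_curve(scores, is_tp, n_gts):
    """Precision and recall at each rank, detections sorted by score descending."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp_flags = [1 if is_tp[i] else 0 for i in order]
    cum_tp = list(accumulate(tp_flags))
    cum_fp = list(accumulate(1 - t for t in tp_flags))
    precision = [tp / (tp + fp) for tp, fp in zip(cum_tp, cum_fp)]
    recall = [tp / n_gts for tp in cum_tp]
    return precision, recall


# Four made-up detections against three ground-truth objects.
precision, recall = pr_curve(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_tp=[True, False, True, False],
    n_gts=3,
)
```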

average_precision

average_precision(
    table: DetectionTable,
    *,
    class_id: str | None = None,
    interpolation: Literal[
        "all_points", "11_point"
    ] = "all_points",
) -> float

Compute Average Precision for a single class.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
class_id str | None

Restrict to a specific class.

None
interpolation Literal['all_points', '11_point']

"all_points" (trapezoidal) or "11_point" (VOC).

'all_points'

Returns:

Type Description
float

AP value in [0, 1].
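
The two interpolation modes differ as sketched below. Both functions are hypothetical illustrations that take rank-ordered precision and recall lists. The 11-point version follows the VOC convention of averaging the best precision reachable at recall levels 0.0, 0.1, ..., 1.0; the trapezoidal version assumes the curve starts at recall 0 with the first observed precision, which may not match the library's anchoring.

```python
def ap_11_point(precision, recall):
    """Mean over 11 recall levels of the best precision at recall >= level."""
    total = 0.0
    for k in range(11):
        level = k / 10
        reachable = [p for p, r in zip(precision, recall) if r >= level]
        total += max(reachable, default=0.0)
    return total / 11


def ap_trapezoidal(precision, recall):
    """Trapezoidal area under the rank-ordered PR points."""
    # Assumption: anchor the curve at recall 0 with the first precision value.
    rs = [0.0] + list(recall)
    ps = [precision[0]] + list(precision)
    return sum(
        (r1 - r0) * (p0 + p1) / 2
        for r0, r1, p0, p1 in zip(rs, rs[1:], ps, ps[1:])
    )


# Made-up PR points in rank order (score descending).
precision = [1.0, 0.5, 2 / 3, 0.5]
recall = [1 / 3, 1 / 3, 2 / 3, 2 / 3]
ap_voc = ap_11_point(precision, recall)
ap_all = ap_trapezoidal(precision, recall)
```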

mean_average_precision

mean_average_precision(
    table: DetectionTable,
    *,
    iou_thresholds: list[float] | None = None,
    interpolation: Literal[
        "all_points", "11_point"
    ] = "all_points",
) -> float

Compute Mean Average Precision across classes and IoU thresholds.

If iou_thresholds is provided, the stored iou column is re-thresholded at each level to recompute is_tp -- no re-matching is needed.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
iou_thresholds list[float] | None

IoU thresholds to average over. Defaults to [0.5] (Pascal VOC). Use [0.5, 0.55, ..., 0.95] for COCO.

None
interpolation Literal['all_points', '11_point']

AP interpolation method.

'all_points'

Returns:

Type Description
float

mAP value in [0, 1].
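
The averaging can be sketched as follows. Both function names and the ap_fn callback are hypothetical; in the library, AP at each threshold would come from re-thresholding the stored iou column rather than from a callback.

```python
def coco_iou_thresholds():
    """The ten COCO thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap(ap_fn, class_ids, iou_thresholds=None):
    """Average ap_fn(class_id, iou_threshold) over classes and thresholds."""
    thresholds = iou_thresholds if iou_thresholds is not None else [0.5]
    values = [ap_fn(c, t) for t in thresholds for c in class_ids]
    return sum(values) / len(values)


# Made-up AP function: AP falls linearly as the IoU threshold rises.
toy_ap = lambda class_id, iou_threshold: 1.0 - iou_threshold

voc_style = mean_ap(toy_ap, ["nodule", "mass"])
coco_style = mean_ap(toy_ap, ["nodule", "mass"], coco_iou_thresholds())
```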

precision_at_threshold

precision_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> float

Compute precision at a given score threshold.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
threshold float

Score threshold.

required
class_id str | None

Optional class filter.

None

Returns:

Type Description
float

Precision value.

recall_at_threshold

recall_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> float

Compute recall at a given score threshold.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
threshold float

Score threshold.

required
class_id str | None

Optional class filter.

None

Returns:

Type Description
float

Recall value.

f1_at_threshold

f1_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> float

Compute F1 score at a given score threshold.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
threshold float

Score threshold.

required
class_id str | None

Optional class filter.

None

Returns:

Type Description
float

F1 value in [0, 1].

froc_curve

froc_curve(
    table: DetectionTable,
    *,
    thresholds: list[float] | None = None,
) -> FROCResult

Compute FROC operating points from a DetectionTable.

Uses a cumulative-sum approach: detections are sorted by score (descending) and TP/FP counts are accumulated, producing one curve point per unique score. This avoids the O(images × thresholds) dense grid that the previous implementation used.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table produced by a matcher.

required
thresholds list[float] | None

Optional explicit score thresholds. When provided, the curve is filtered to these thresholds only.

None

Returns:

Type Description
FROCResult

FROCResult with curve and metadata.
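
A minimal sketch of the cumulative-sum approach, assuming the usual FROC axes (false positives per image on x, sensitivity on y) and unweighted images. froc_points is a hypothetical name and the tuple layout is not the library's curve schema.

```python
def froc_points(scores, is_tp, n_gts, n_images):
    """One (score, fp_per_image, sensitivity) point per unique score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points, tp, fp = [], 0, 0
    for rank, i in enumerate(order):
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last detection sharing this score.
        last_of_score = (
            rank + 1 == len(order) or scores[order[rank + 1]] != scores[i]
        )
        if last_of_score:
            points.append((scores[i], fp / n_images, tp / n_gts))
    return points


# Four made-up detections over two images containing four lesions in total.
points = froc_points(
    scores=[0.9, 0.8, 0.8, 0.5],
    is_tp=[True, False, True, False],
    n_gts=4,
    n_images=2,
)
```

The work is one sort plus one pass over the detections, so the cost does not depend on how many thresholds are later read off the curve.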

lroc_curve

lroc_curve(
    table: DetectionTable,
    *,
    variant: Literal["best_tp", "top_scoring"] = "best_tp",
) -> LROCResult

Compute LROC operating points from a DetectionTable.

LROC requires:

  • gt_label in image metadata (positive/negative per image).
  • Per-image top-detection reduction from DetectionTable.to_per_image().

Two scoring variants are supported:

  • "best_tp" (default): For each positive image, the effective score is the highest-scoring TP detection. An image counts as localized if it has any TP above the threshold.
  • "top_scoring": For each positive image, take the single highest-scoring detection (regardless of TP/FP). The image counts as localized only if that top detection is a TP. This matches the classical single-commitment LROC (Swensson 1996).

For negative images both variants use the maximum detection score.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table produced by a matcher.

required
variant Literal['best_tp', 'top_scoring']

"best_tp" or "top_scoring".

'best_tp'

Returns:

Type Description
LROCResult

LROCResult with curve, per-image summary, and metadata.
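
The difference between the two variants shows up in how a single positive image is summarized. The sketch below is illustrative only: positive_image_summary is a hypothetical name, and returning (None, False) for an image without any usable detection is an assumption the reference does not spell out.

```python
def positive_image_summary(dets, variant="best_tp"):
    """Return (effective_score, localized) for one positive image.

    dets is a list of (score, is_tp) pairs for that image.
    """
    if variant == "top_scoring":
        if not dets:
            return None, False
        score, is_tp = max(dets, key=lambda d: d[0])
        return score, is_tp          # localized only if the top detection is a TP
    tp_scores = [score for score, is_tp in dets if is_tp]
    if not tp_scores:
        return None, False           # assumption: no TP means never localized
    return max(tp_scores), True


# One made-up positive image: a confident false positive and a weaker true positive.
dets = [(0.9, False), (0.6, True)]
best_tp = positive_image_summary(dets, "best_tp")
top_scoring = positive_image_summary(dets, "top_scoring")
```

Under "best_tp" the image is localized with score 0.6, while under "top_scoring" the single commitment is the 0.9 false positive, so the image is not localized.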

confusion_at_threshold

confusion_at_threshold(
    table: DetectionTable,
    threshold: float,
    *,
    class_id: str | None = None,
) -> dict[str, int]

Compute TP, FP, FN counts at a given score threshold.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
threshold float

Score threshold — detections with score >= threshold are considered active.

required
class_id str | None

Optional class filter.

None

Returns:

Type Description
dict[str, int]

Dictionary with keys tp, fp, fn.
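
The counts and the point metrics derived from them (precision, recall, F1) can be sketched as below. Both helpers are hypothetical. The sketch assumes each TP matches a distinct ground truth, so fn is n_gts minus tp, and it returns 0.0 when a denominator is zero, which may differ from the library's behavior.

```python
def confusion_counts(scores, is_tp, n_gts, threshold):
    """TP/FP/FN counts over detections with score >= threshold."""
    active = [tp for score, tp in zip(scores, is_tp) if score >= threshold]
    tp = sum(active)
    fp = len(active) - tp
    return {"tp": tp, "fp": fp, "fn": n_gts - tp}


def precision_recall_f1(counts):
    """Point metrics from a tp/fp/fn dictionary."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Four made-up detections against three ground-truth objects.
counts = confusion_counts(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_tp=[True, False, True, False],
    n_gts=3,
    threshold=0.7,
)
precision, recall, f1 = precision_recall_f1(counts)
```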

Bootstrap

bootstrap_metric_sequential

bootstrap_metric_sequential(
    *,
    image_ids: list[str],
    metric_fn: Callable[[list[str]], float],
    point_estimate: float,
    n_bootstrap: int = 1000,
    confidence: float = 0.95,
    seed: int | None = None,
    strata: dict[str, str] | None = None,
) -> BootstrapResult

Estimate confidence intervals by image-level bootstrap sampling.

This is the sequential fallback that calls metric_fn once per iteration. Use bootstrap_pr_auc for a fully vectorized Polars-native path when computing PR-based AUC.

Parameters:

Name Type Description Default
image_ids list[str]

Base image IDs to sample with replacement.

required
metric_fn Callable[[list[str]], float]

Callback computing a scalar metric from sampled image IDs.

required
point_estimate float

Metric on the original sample.

required
n_bootstrap int

Number of bootstrap iterations.

1000
confidence float

Confidence level in (0, 1).

0.95
seed int | None

Optional RNG seed.

None
strata dict[str, str] | None

Optional image->stratum mapping for stratified resampling.

None

Returns:

Type Description
BootstrapResult

BootstrapResult with percentile confidence interval.
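
A minimal sketch of an image-level percentile bootstrap with optional stratification, using only the standard library. The names bootstrap_ci and percentile are hypothetical, and the percentile interpolation rule is an assumption; the library may compute its quantiles differently.

```python
import random
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linearly interpolated quantile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(image_ids, metric_fn, n_bootstrap=1000, confidence=0.95,
                 seed=None, strata=None):
    """Resample image IDs with replacement, within each stratum if given."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for image_id in image_ids:
        groups[strata[image_id] if strata else "all"].append(image_id)
    values = []
    for _ in range(n_bootstrap):
        sample = []
        for members in groups.values():
            sample.extend(rng.choices(members, k=len(members)))
        values.append(metric_fn(sample))   # one metric_fn call per iteration
    values.sort()
    alpha = (1 - confidence) / 2
    return percentile(values, alpha), percentile(values, 1 - alpha), values


# Made-up data: 20 images, half positive; the metric is the positive fraction.
ids = [f"img{i}" for i in range(20)]
labels = {image_id: i % 2 for i, image_id in enumerate(ids)}
positive_rate = lambda sample: sum(labels[i] for i in sample) / len(sample)

strata = {image_id: str(label) for image_id, label in labels.items()}
lower, upper, dist = bootstrap_ci(
    ids, positive_rate, n_bootstrap=200, seed=0, strata=strata
)
```

Stratifying by label keeps the positive/negative balance fixed in every resample, so in this example the interval collapses to a single value; without strata the positive fraction would vary between iterations.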

bootstrap_pr_auc

bootstrap_pr_auc(
    table: DetectionTable,
    *,
    n_bootstrap: int = 1000,
    confidence: float = 0.95,
    seed: int | None = None,
    class_id: str | None = None,
) -> BootstrapResult

Vectorized bootstrap for precision-recall AUC.

Generates all bootstrap samples as a single DataFrame, joins with the detection table, and computes AP per bootstrap iteration using Polars window functions -- all in one lazy plan.

Parameters:

Name Type Description Default
table DetectionTable

Canonical detection table.

required
n_bootstrap int

Number of bootstrap iterations.

1000
confidence float

Confidence level in (0, 1).

0.95
seed int | None

Optional RNG seed.

None
class_id str | None

Optional class filter.

None

Returns:

Type Description
BootstrapResult

BootstrapResult with percentile confidence interval.