Metrics API Reference
Core Types
DetectionTable (dataclass)
Canonical intermediate representation consumed by all metric functions.
Wraps two aligned lazy frames:
- detections: one row per detection, with columns image_id, class_id, score, is_tp, gt_idx, iou, det_idx.
- image_metadata: one row per (image, class), with columns n_gts, weight, gt_label, and optionally group_id.

Use from_matched to construct an instance with schema validation.
from_matched (classmethod)
from_matched(
detections: LazyFrame | DataFrame,
image_meta: LazyFrame | DataFrame,
*,
matching_iou_threshold: float | None = None,
) -> DetectionTable
Construct a DetectionTable with planning-time schema validation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| detections | LazyFrame \| DataFrame | Per-detection rows. Must contain columns | required |
| image_meta | LazyFrame \| DataFrame | Per-image metadata. Must contain | required |
| matching_iou_threshold | float \| None | The IoU threshold used by the matcher. Stored so that | None |

Returns:

| Type | Description |
|---|---|
| DetectionTable | Validated |

Raises:

| Type | Description |
|---|---|
| ValueError | If required columns are missing. |

with_group
Return a copy with group_id set from an existing metadata column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| group_col | str | Column in | required |

Returns:

| Type | Description |
|---|---|
| DetectionTable | New |

filter_class
Return a copy filtered to a single class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| class_id | str | The class to retain. | required |

Returns:

| Type | Description |
|---|---|
| DetectionTable | Filtered |

class_ids
Return distinct class IDs present in the detections.
Note: triggers a partial collect on the metadata frame.
at_iou_threshold
Return a copy with is_tp recomputed at a different IoU threshold.
The stored iou column is compared against iou_threshold to set
is_tp without re-running the matching step.
Warning: re-thresholding only works reliably when *raising* the threshold
above the original matching IoU. Lowering it has no effect because
detections that were unmatched at the original threshold have no
stored gt_idx/iou to re-evaluate.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| iou_threshold | float | New IoU threshold to apply. | required |

Returns:

| Type | Description |
|---|---|
| DetectionTable | |

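The one-way nature of re-thresholding can be illustrated with a small pure-Python sketch. It assumes a detection counts as a TP when its stored IoU is greater than or equal to the threshold, and that unmatched detections carry no IoU (None). The Detection class and rethreshold function are hypothetical stand-ins, not part of this API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Detection:
    # Hypothetical row shape: only the columns re-thresholding needs.
    score: float
    is_tp: bool
    iou: Optional[float]  # None when the matcher left the detection unmatched


def rethreshold(dets: list[Detection], iou_threshold: float) -> list[Detection]:
    """Recompute is_tp from the stored IoU without re-running matching."""
    # Assumption: inclusive comparison (iou >= threshold counts as a TP).
    # Unmatched detections have no stored IoU, so they can never become TPs.
    return [
        replace(d, is_tp=d.iou is not None and d.iou >= iou_threshold)
        for d in dets
    ]


# Detections as matched at an original IoU threshold of 0.5.
dets = [
    Detection(score=0.9, is_tp=True, iou=0.82),
    Detection(score=0.8, is_tp=True, iou=0.55),
    Detection(score=0.7, is_tp=False, iou=None),  # its true IoU was never stored
]

stricter = rethreshold(dets, 0.75)  # raising: the 0.55 match becomes an FP
looser = rethreshold(dets, 0.25)    # lowering: nothing changes
```

Raising the threshold demotes weak matches, while lowering it cannot promote the third detection because no IoU was recorded for it.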
to_per_image
Aggregate detections to one row per image and class, exposing the top-scoring detection alongside the full detection list.
Produces one row per (image_id, class_id) with:
- detections: list of detection structs sorted by score (descending)
- compatibility columns max_score and top_is_tp, taken from the highest-scoring detection
- metadata columns gt_label, weight, n_gts

LROC consumes the detections list for image-level summarization (the best-localized
detection) rather than relying only on the top-scoring detection.
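A minimal sketch of the per-(image, class) aggregation, with plain dictionaries standing in for the Polars frames. The field names follow the description above; to_per_image_sketch is hypothetical and omits the metadata join, so images without any detections do not appear in its output.

```python
from collections import defaultdict


def to_per_image_sketch(detections: list[dict]) -> list[dict]:
    """Group detections per (image_id, class_id) and expose the top-scoring one."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for det in detections:
        groups[(det["image_id"], det["class_id"])].append(det)

    rows = []
    for (image_id, class_id), dets in groups.items():
        # Sort by score, highest first, so index 0 is the top-scoring detection.
        ranked = sorted(dets, key=lambda d: d["score"], reverse=True)
        rows.append(
            {
                "image_id": image_id,
                "class_id": class_id,
                "detections": ranked,
                "max_score": ranked[0]["score"],
                "top_is_tp": ranked[0]["is_tp"],
            }
        )
    return rows


dets = [
    {"image_id": "a", "class_id": "lesion", "score": 0.4, "is_tp": True},
    {"image_id": "a", "class_id": "lesion", "score": 0.9, "is_tp": False},
    {"image_id": "b", "class_id": "lesion", "score": 0.6, "is_tp": True},
]
per_image = to_per_image_sketch(dets)
```

Keeping the full ranked list is what allows an image-level summary to look past a top-scoring FP (image "a" here) to a lower-scoring TP.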
image_ids_and_strata
Extract image IDs and optional stratification mapping.
Returns:

| Type | Description |
|---|---|
| tuple[list[str], dict[str, str] \| None] | Tuple of |

collect
Materialize both frames.
Returns:

| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of |

MetricResult (dataclass)
Base class for all detection metric results.
Subclasses add metric-specific convenience methods with pre-bound column
names (e.g., FROCResult.sensitivity_at_fp).
Attributes:

| Name | Type | Description |
|---|---|---|
| curve | DataFrame | DataFrame containing the computed metric curve. |
| metadata | dict[str, Any] | Arbitrary metadata about the computation. |

auc
auc(
*,
x_col: str,
y_col: str,
x_range: tuple[float, float] | None = None,
correction: CorrectionMethod = None,
) -> float
Compute (partial) AUC under the curve.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x_col | str | Column name for the x-axis values. | required |
| y_col | str | Column name for the y-axis values. | required |
| x_range | tuple[float, float] \| None | Optional | None |
| correction | CorrectionMethod | Optional correction for partial AUC. | None |

Returns:

| Type | Description |
|---|---|
| float | Area under the curve (or partial area). |

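For intuition, a (partial) area under a piecewise-linear curve can be computed with the trapezoidal rule, as sketched below. This is an illustration under stated assumptions: trapezoidal integration, x values sorted ascending, and linear interpolation at the x_range endpoints. The correction option is not modelled, trapezoid_auc is a hypothetical name, and the library may integrate differently.

```python
def _y_at(x, x0, y0, x1, y1):
    """Linear interpolation on the segment (x0, y0)-(x1, y1); requires x1 > x0."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def trapezoid_auc(xs, ys, x_range=None):
    """Trapezoidal area under a piecewise-linear curve.

    xs must be sorted ascending. When x_range=(lo, hi) is given, only the
    part of the curve inside [lo, hi] is integrated (a partial AUC).
    """
    lo, hi = x_range if x_range is not None else (xs[0], xs[-1])
    area = 0.0
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        a, b = max(x0, lo), min(x1, hi)
        if a >= b:
            continue  # segment lies outside the range (or has zero width)
        ya, yb = _y_at(a, x0, y0, x1, y1), _y_at(b, x0, y0, x1, y1)
        area += (b - a) * (ya + yb) / 2
    return area


xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 1.0]
full = trapezoid_auc(xs, ys)                   # 0.75
partial = trapezoid_auc(xs, ys, (0.25, 0.75))  # 0.4375
```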
interpolate
Linearly interpolate a y-value at a given x-value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x_col | str | Column name for the x-axis. | required |
| y_col | str | Column name for the y-axis. | required |
| at | float | The x-value at which to interpolate. | required |

Returns:

| Type | Description |
|---|---|
| float | Interpolated y-value. |

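Linear interpolation on a sorted curve can be sketched with the standard-library bisect module. The helper interp_linear is hypothetical; clamping to the nearest endpoint outside the curve's x-range (what numpy.interp does by default) is an assumption about edge handling, not documented behaviour of this API.

```python
from bisect import bisect_left


def interp_linear(xs, ys, at):
    """Linearly interpolate y at x=`at` on a curve whose xs are sorted ascending.

    Outside the curve's x-range the nearest endpoint value is returned.
    """
    if at <= xs[0]:
        return ys[0]
    if at >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, at)  # first index with xs[i] >= at; here 1 <= i < len(xs)
    if xs[i] == at:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (at - x0) / (x1 - x0)


# A FROC-style curve: mean FPs per image -> sensitivity.
fps_per_image = [0.5, 1.0, 2.0]
sensitivity = [0.6, 0.7, 0.9]
at_1_5 = interp_linear(fps_per_image, sensitivity, 1.5)  # approximately 0.8
```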
summary_table
Build a summary at specific operating points.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x_col | str | Column for x-axis values. | required |
| y_col | str | Column for y-axis values. | required |
| operating_points | list[float] | x-values at which to report interpolated y. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with |

BootstrapResult (dataclass)
Container for bootstrap confidence interval results.
Attributes:

| Name | Type | Description |
|---|---|---|
| point_estimate | float | Metric value on the original sample. |
| ci_lower | float | Lower confidence bound. |
| ci_upper | float | Upper confidence bound. |
| confidence | float | Confidence level used for the interval. |
| distribution | list[float] | Raw bootstrap metric values. |

Matchers
ContourMatcher
Match detections from heatmaps + binary masks via contour extraction.
This is the refactored version of the original prepare_detection_table
from _prepare.py. It extracts contours from both predictions and GT
masks, scores predictions against the heatmap, and runs greedy IoU matching
via .contour.match_detections().
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| iou_threshold | float | IoU threshold for TP matching. | 0.5 |
| extraction_threshold | float | Threshold for contour extraction from heatmaps. | 0.1 |
| min_contour_area | float | Minimum extracted contour area for predictions. | 1.0 |
| auto_resize | bool | Whether to resize heatmaps to mask shapes automatically. | True |
| gt_min_contour_area | float \| None | Separate min area for GT contours (defaults to | None |

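Greedy IoU matching can be pictured with a precomputed IoU matrix. This sketch assumes the common scheme (predictions processed in descending score order, each claiming the unclaimed GT with the highest IoU if that IoU reaches the threshold); it is not the .contour.match_detections() code, whose tie-breaking may differ, and greedy_match is a hypothetical name.

```python
def greedy_match(scores, iou, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground truths.

    scores[p] is prediction p's confidence; iou[p][g] is its IoU with GT g.
    Returns one (is_tp, gt_idx, iou) tuple per prediction, in input order;
    unmatched predictions get (False, None, None).
    """
    n_gts = len(iou[0]) if iou else 0
    taken = [False] * n_gts
    result = [(False, None, None)] * len(scores)
    # Highest-scoring predictions get first pick of the ground truths.
    for p in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True):
        best_g, best_iou = None, -1.0
        for g in range(n_gts):
            if not taken[g] and iou[p][g] > best_iou:
                best_g, best_iou = g, iou[p][g]
        if best_g is not None and best_iou >= iou_threshold:
            taken[best_g] = True
            result[p] = (True, best_g, best_iou)
    return result


# Two predictions overlap the same lesion; the weaker one becomes a duplicate FP.
matches = greedy_match(
    scores=[0.9, 0.8],
    iou=[[0.7, 0.1],
         [0.6, 0.2]],
)
```

The second prediction overlaps GT 0 well (IoU 0.6), but GT 0 is already claimed and its IoU with GT 1 is below 0.5, so it is left unmatched with no stored gt_idx or IoU.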
match
match(
data: LazyFrame | DataFrame,
*,
pred_col: str,
gt_col: str,
score_col: str | None = None,
class_col: str | None = None,
image_id_col: str | None = None,
weight_col: str | None = None,
group_col: str | None = None,
) -> DetectionTable
Produce a DetectionTable from heatmap + binary mask data.
Accepts any column format supported by polars-cv sources: nested
List[List[...]], VIEW protocol Binary (blob), or fixed-size
Array[...]. The source format is auto-detected from the column
dtype.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | LazyFrame \| DataFrame | Input frame with one image/sample per row. | required |
| pred_col | str | Prediction heatmap column (any supported format). | required |
| gt_col | str | Ground-truth binary mask column (any supported format). | required |
| score_col | str \| None | Unused for contour matching (scores are derived from heatmap peaks). | None |
| class_col | str \| None | Optional class label column for multi-class metrics. | None |
| image_id_col | str \| None | Optional image identifier column (defaults to row index). | None |
| weight_col | str \| None | Optional sample weight column. | None |
| group_col | str \| None | Optional grouping column. | None |

Returns:

| Type | Description |
|---|---|
| DetectionTable | Validated |

BBoxMatcher
Match detections from bounding-box lists via IoU matching.
Expects prediction and ground-truth columns as List[Struct{x, y, width,
height}] (i.e. List[BBOX_SCHEMA]). Scores should be provided as a
separate List[Float64] column aligned with the prediction bboxes.
Matching calls the Rust bbox_match_detections plugin function which
internally converts each bbox to a 4-point rectangular contour and
delegates to the existing contour matching infrastructure.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| iou_threshold | float | IoU threshold for TP matching. | 0.5 |

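For axis-aligned boxes, the overlap that the 4-point contour conversion measures should agree with the closed-form box IoU below, assuming the contour IoU is area-based (if contours are rasterized, values may differ slightly). The sketch assumes (x, y) is one corner and width/height extend in the positive direction; bbox_iou is a hypothetical helper, not the Rust plugin.

```python
def bbox_iou(a: dict, b: dict) -> float:
    """IoU of two axis-aligned boxes given as {"x", "y", "width", "height"}."""
    # Intersection rectangle (may be empty, in which case its area is 0).
    ix0, iy0 = max(a["x"], b["x"]), max(a["y"], b["y"])
    ix1 = min(a["x"] + a["width"], b["x"] + b["width"])
    iy1 = min(a["y"] + a["height"], b["y"] + b["height"])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


pred = {"x": 0.0, "y": 0.0, "width": 2.0, "height": 2.0}
gt = {"x": 1.0, "y": 1.0, "width": 2.0, "height": 2.0}
overlap = bbox_iou(pred, gt)  # intersection 1, union 4 + 4 - 1 = 7
```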
match
match(
data: LazyFrame | DataFrame,
*,
pred_col: str,
gt_col: str,
score_col: str | None = None,
class_col: str | None = None,
image_id_col: str | None = None,
weight_col: str | None = None,
group_col: str | None = None,
) -> DetectionTable
Produce a DetectionTable from bbox prediction/GT lists.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | LazyFrame \| DataFrame | Input frame with one image/sample per row. | required |
| pred_col | str | Prediction bboxes column ( | required |
| gt_col | str | Ground-truth bboxes column ( | required |
| score_col | str \| None | Per-prediction score column ( | None |
| class_col | str \| None | Optional class label column. | None |
| image_id_col | str \| None | Optional image identifier column. | None |
| weight_col | str \| None | Optional sample weight column. | None |
| group_col | str \| None | Optional grouping column. | None |

Returns:

| Type | Description |
|---|---|
| DetectionTable | Validated |

Raises:

| Type | Description |
|---|---|
| ValueError | If |

PreMatchedAdapter
Adapter for data that already has per-detection TP/FP assignments.
Expects a flat table where each row is one detection with at minimum:
- score (float): confidence score
- is_tp (bool): whether this detection is a true positive

Plus per-image metadata, either inline or via an n_gts column.
match
match(
data: LazyFrame | DataFrame,
*,
pred_col: str = "score",
gt_col: str = "is_tp",
score_col: str | None = None,
class_col: str | None = None,
image_id_col: str | None = None,
weight_col: str | None = None,
group_col: str | None = None,
n_gts_col: str | None = None,
gt_label_col: str | None = None,
iou_col: str | None = None,
det_idx_col: str | None = None,
) -> DetectionTable
Wrap pre-matched data into a DetectionTable.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | LazyFrame \| DataFrame | Input frame with one row per detection. | required |
| pred_col | str | Column with confidence scores (aliased to | 'score' |
| gt_col | str | Column with TP flag (aliased to | 'is_tp' |
| score_col | str \| None | Alias for | None |
| class_col | str \| None | Optional class label column. | None |
| image_id_col | str \| None | Image identifier column (required, or row index used). | None |
| weight_col | str \| None | Optional sample weight column. | None |
| group_col | str \| None | Optional grouping column. | None |
| n_gts_col | str \| None | Column with per-image GT count. | None |
| gt_label_col | str \| None | Column with per-image positive/negative label. | None |
| iou_col | str \| None | Optional column with per-detection IoU values. | None |
| det_idx_col | str \| None | Optional column with detection index within image. | None |

Returns:

| Type | Description |
|---|---|
| DetectionTable | Validated |

Metric Functions
precision_recall_curve
precision_recall_curve(
table: DetectionTable, *, class_id: str | None = None
) -> PrecisionRecallResult
Compute a precision-recall curve from a DetectionTable.
Detections are sorted by confidence score (descending). At each rank, cumulative TP/FP are computed and precision/recall derived. All computation uses Polars lazy expressions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| class_id | str \| None | Restrict to a specific class. | None |

Returns:

| Type | Description |
|---|---|
| PrecisionRecallResult | |

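The cumulative TP/FP computation described above can be sketched in plain Python. It assumes the recall denominator is the total ground-truth count (the sum of n_gts over the image metadata) and that tied scores are not merged into a single point; pr_curve is a hypothetical name, and the library performs the equivalent steps with Polars lazy expressions.

```python
def pr_curve(scores, is_tp, n_gts):
    """Precision and recall at each rank, sorting detections by score (descending).

    n_gts is the total number of ground-truth objects (the recall denominator).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precision, recall = [], []
    for i in order:
        # Cumulative TP/FP counts after including detection i.
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision.append(tp / (tp + fp))
        recall.append(tp / n_gts if n_gts else 0.0)
    return precision, recall


precision, recall = pr_curve(
    scores=[0.7, 0.9, 0.6, 0.8],
    is_tp=[True, True, False, False],
    n_gts=3,
)
```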
average_precision
average_precision(
table: DetectionTable,
*,
class_id: str | None = None,
interpolation: Literal[
"all_points", "11_point"
] = "all_points",
) -> float
Compute Average Precision for a single class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| class_id | str \| None | Restrict to a specific class. | None |
| interpolation | Literal['all_points', '11_point'] | | 'all_points' |

Returns:

| Type | Description |
|---|---|
| float | AP value in [0, 1]. |

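The two interpolation modes can be illustrated on a precomputed PR curve. This sketch assumes the usual definitions: "all_points" as the area under the monotone precision envelope (PASCAL VOC 2010+ style) and "11_point" as the mean envelope precision at recall 0.0, 0.1, ..., 1.0 (VOC 2007 style). The helper ap_from_curve is hypothetical, and the library's edge-case handling may differ.

```python
def ap_from_curve(precision, recall, interpolation="all_points"):
    """AP from a PR curve listed in descending-score order (recall non-decreasing)."""
    # Precision envelope: the best precision achievable at this recall or higher.
    envelope = list(precision)
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])

    if interpolation == "11_point":
        # Mean of the envelope sampled at recall = 0.0, 0.1, ..., 1.0.
        total = 0.0
        for k in range(11):
            t = k / 10
            total += max((p for p, r in zip(envelope, recall) if r >= t), default=0.0)
        return total / 11

    # all_points: sum envelope precision over every recall increment.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(envelope, recall):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# A PR curve over 4 ranked detections with 3 ground-truth objects.
precision = [1.0, 0.5, 2 / 3, 0.5]
recall = [1 / 3, 1 / 3, 2 / 3, 2 / 3]
ap_all = ap_from_curve(precision, recall)             # 5/9
ap_11 = ap_from_curve(precision, recall, "11_point")  # 6/11
```

The 11-point value is lower here because the sampled recall levels 0.7 through 1.0 are never reached and contribute zero.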
mean_average_precision
mean_average_precision(
table: DetectionTable,
*,
iou_thresholds: list[float] | None = None,
interpolation: Literal[
"all_points", "11_point"
] = "all_points",
) -> float
Compute Mean Average Precision across classes and IoU thresholds.
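If iou_thresholds is provided, the stored iou column is re-thresholded
at each level to recompute is_tp; no re-matching is needed. As with at_iou_threshold, thresholds below the original matching IoU cannot recover detections that were unmatched at matching time.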
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| iou_thresholds | list[float] \| None | IoU thresholds to average over. Defaults to | None |
| interpolation | Literal['all_points', '11_point'] | AP interpolation method. | 'all_points' |

Returns:

| Type | Description |
|---|---|
| float | mAP value in [0, 1]. |

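A sketch of mAP over classes and IoU thresholds, where is_tp is re-derived from the stored IoU at each threshold instead of re-matching. It assumes all-points AP, averages over classes first and then over thresholds (equivalent to a flat mean when every class appears at every threshold), and requires every class to have at least one GT. The helpers _ap and mean_ap_sketch are hypothetical.

```python
from statistics import mean


def _ap(scored, n_gts):
    """All-points AP (area under the precision envelope) for one class.

    scored is a list of (score, is_tp) pairs; n_gts must be > 0.
    """
    tp = fp = 0
    prec, rec = [], []
    for _, hit in sorted(scored, key=lambda s: s[0], reverse=True):
        if hit:
            tp += 1
        else:
            fp += 1
        prec.append(tp / (tp + fp))
        rec.append(tp / n_gts)
    for i in range(len(prec) - 2, -1, -1):  # make precision non-increasing
        prec[i] = max(prec[i], prec[i + 1])
    prev = [0.0] + rec[:-1]
    return sum((r - r0) * p for p, r, r0 in zip(prec, rec, prev))


def mean_ap_sketch(detections, n_gts_by_class, iou_thresholds):
    """Mean AP over classes and IoU thresholds, re-thresholding the stored IoU."""
    per_threshold = []
    for t in iou_thresholds:
        aps = []
        for cls, n_gts in n_gts_by_class.items():
            # is_tp is recomputed from the stored iou; no re-matching happens.
            scored = [
                (d["score"], d["iou"] is not None and d["iou"] >= t)
                for d in detections
                if d["class_id"] == cls
            ]
            aps.append(_ap(scored, n_gts))
        per_threshold.append(mean(aps))
    return mean(per_threshold)


detections = [
    {"class_id": "a", "score": 0.9, "iou": 0.8},
    {"class_id": "a", "score": 0.6, "iou": None},  # unmatched
    {"class_id": "b", "score": 0.8, "iou": 0.6},
    {"class_id": "b", "score": 0.7, "iou": 0.9},
]
# Thresholds at or above the original matching IoU (0.5), per the warning.
m_ap = mean_ap_sketch(detections, {"a": 1, "b": 2}, [0.5, 0.75])
```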
precision_at_threshold
precision_at_threshold(
table: DetectionTable,
threshold: float,
*,
class_id: str | None = None,
) -> float
Compute precision at a given score threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| threshold | float | Score threshold. | required |
| class_id | str \| None | Optional class filter. | None |

Returns:

| Type | Description |
|---|---|
| float | Precision value. |

recall_at_threshold
recall_at_threshold(
table: DetectionTable,
threshold: float,
*,
class_id: str | None = None,
) -> float
Compute recall at a given score threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| threshold | float | Score threshold. | required |
| class_id | str \| None | Optional class filter. | None |

Returns:

| Type | Description |
|---|---|
| float | Recall value. |

f1_at_threshold
f1_at_threshold(
table: DetectionTable,
threshold: float,
*,
class_id: str | None = None,
) -> float
Compute F1 score at a given score threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| threshold | float | Score threshold. | required |
| class_id | str \| None | Optional class filter. | None |

Returns:

| Type | Description |
|---|---|
| float | F1 value in [0, 1]. |

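The threshold-based metrics all derive from the same TP/FP/FN counts, as this sketch shows. It assumes detections with score >= threshold are kept (inclusive), that each TP claims a distinct ground truth so FN = n_gts - TP, and that undefined ratios return 0.0. The helpers and the "tp"/"fp"/"fn" dictionary keys are hypothetical, not the keys returned by confusion_at_threshold.

```python
def counts_at_threshold(scores, is_tp, n_gts, threshold):
    """TP/FP/FN counts when keeping detections with score >= threshold."""
    kept = [hit for s, hit in zip(scores, is_tp) if s >= threshold]
    tp = sum(kept)       # True counts as 1
    fp = len(kept) - tp
    fn = n_gts - tp      # each TP claims a distinct ground truth
    return {"tp": tp, "fp": fp, "fn": fn}


def prf_at_threshold(scores, is_tp, n_gts, threshold):
    """Precision, recall and F1 derived from the counts (0.0 when undefined)."""
    c = counts_at_threshold(scores, is_tp, n_gts, threshold)
    tp, fp, fn = c["tp"], c["fp"], c["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


scores = [0.9, 0.8, 0.7, 0.6]
is_tp = [True, False, True, False]
counts = counts_at_threshold(scores, is_tp, n_gts=3, threshold=0.7)
p, r, f1 = prf_at_threshold(scores, is_tp, n_gts=3, threshold=0.7)
```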
froc_curve
Compute FROC operating points from a DetectionTable.
Uses a cumulative-sum approach: detections are sorted by score (descending) and TP/FP counts are accumulated, producing one curve point per unique score. This avoids the dense O(images × thresholds) grid used by the previous implementation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table produced by a matcher. | required |
| thresholds | list[float] \| None | Optional explicit score thresholds. When provided, the curve is filtered to these thresholds only. | None |

Returns:

| Type | Description |
|---|---|
| FROCResult | |

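The cumulative-sum approach can be sketched as follows. The denominators follow the usual FROC convention and are assumptions here: sensitivity is cumulative TPs divided by the total number of GT lesions, and the x-axis is cumulative FPs divided by the total number of images. froc_points is a hypothetical name.

```python
def froc_points(scores, is_tp, n_images, n_lesions):
    """FROC points via cumulative TP/FP counts, one point per unique score.

    Each point is (score_threshold, mean_fps_per_image, sensitivity).
    """
    ranked = sorted(zip(scores, is_tp), key=lambda d: d[0], reverse=True)
    points = []
    tp = fp = 0
    for i, (score, hit) in enumerate(ranked):
        if hit:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last detection sharing this score,
        # so tied scores are counted together.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((score, fp / n_images, tp / n_lesions))
    return points


points = froc_points(
    scores=[0.9, 0.8, 0.8, 0.5],
    is_tp=[True, False, True, False],
    n_images=2,
    n_lesions=2,
)
```

A single pass over the sorted detections yields every operating point, which is why no per-image, per-threshold grid is needed.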
lroc_curve
lroc_curve(
table: DetectionTable,
*,
variant: Literal["best_tp", "top_scoring"] = "best_tp",
) -> LROCResult
Compute LROC operating points from a DetectionTable.
LROC requires:
- gt_label in image metadata (positive/negative per image).
- Per-image top-detection reduction from DetectionTable.to_per_image().

Two scoring variants are supported:
- "best_tp" (default): For each positive image, the effective score is that of the highest-scoring TP detection. An image counts as localized if it has any TP above the threshold.
- "top_scoring": For each positive image, take the single highest-scoring detection (regardless of TP/FP). The image counts as localized only if that top detection is a TP. This matches the classical single-commitment LROC (Swensson 1996).

For negative images both variants use the maximum detection score.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table produced by a matcher. | required |
| variant | Literal['best_tp', 'top_scoring'] | | 'best_tp' |

Returns:

| Type | Description |
|---|---|
| LROCResult | |

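The difference between the two variants shows up on a positive image whose top-scoring detection is an FP. In this sketch, each image is a hypothetical dict with a gt_label flag and a list of (score, is_tp) tuples, the x-axis is the fraction of negative images firing at the threshold, and the y-axis is the fraction of positive images correctly localized. Inclusive comparisons and the lroc_points name are assumptions.

```python
def lroc_points(images, thresholds, variant="best_tp"):
    """(FPF, localized TPF) pairs, one per threshold, for an LROC curve.

    images: dicts with "gt_label" (True for positive images) and
    "detections", a list of (score, is_tp) tuples for that image.
    """
    negatives = [img for img in images if not img["gt_label"]]
    positives = [img for img in images if img["gt_label"]]

    def negative_fires(img, t):
        # Equivalent to comparing the image's maximum detection score with t.
        return any(s >= t for s, _ in img["detections"])

    def localized(img, t):
        dets = img["detections"]
        if variant == "best_tp":
            # Localized if any TP detection clears the threshold.
            return any(hit and s >= t for s, hit in dets)
        # top_scoring: commit to the single highest-scoring detection.
        if not dets:
            return False
        s, hit = max(dets, key=lambda d: d[0])
        return hit and s >= t

    curve = []
    for t in thresholds:
        fpf = sum(negative_fires(img, t) for img in negatives) / len(negatives)
        tpf = sum(localized(img, t) for img in positives) / len(positives)
        curve.append((fpf, tpf))
    return curve


images = [
    {"gt_label": True, "detections": [(0.9, False), (0.6, True)]},  # top hit is an FP
    {"gt_label": True, "detections": [(0.8, True)]},
    {"gt_label": False, "detections": [(0.7, False)]},
    {"gt_label": False, "detections": []},
]
best_tp = lroc_points(images, [0.5])
top_scoring = lroc_points(images, [0.5], variant="top_scoring")
```

Under "best_tp" the first positive image still counts as localized through its 0.6 TP; under "top_scoring" it does not, because its committed detection is the 0.9 FP.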
confusion_at_threshold
confusion_at_threshold(
table: DetectionTable,
threshold: float,
*,
class_id: str | None = None,
) -> dict[str, int]
Compute TP, FP, FN counts at a given score threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| threshold | float | Score threshold; detections with | required |
| class_id | str \| None | Optional class filter. | None |

Returns:

| Type | Description |
|---|---|
| dict[str, int] | Dictionary with keys |

Bootstrap
bootstrap_metric_sequential
bootstrap_metric_sequential(
*,
image_ids: list[str],
metric_fn: Callable[[list[str]], float],
point_estimate: float,
n_bootstrap: int = 1000,
confidence: float = 0.95,
seed: int | None = None,
strata: dict[str, str] | None = None,
) -> BootstrapResult
Estimate confidence intervals by image-level bootstrap sampling.
This is the sequential fallback that calls metric_fn once per iteration.
Use bootstrap_pr_auc for a fully vectorized, Polars-native path when
computing PR-based AUC.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_ids | list[str] | Base image IDs to sample with replacement. | required |
| metric_fn | Callable[[list[str]], float] | Callback computing a scalar metric from sampled image IDs. | required |
| point_estimate | float | Metric on the original sample. | required |
| n_bootstrap | int | Number of bootstrap iterations. | 1000 |
| confidence | float | Confidence level in | 0.95 |
| seed | int \| None | Optional RNG seed. | None |
| strata | dict[str, str] \| None | Optional image-to-stratum mapping for stratified resampling. | None |

Returns:

| Type | Description |
|---|---|
| BootstrapResult | |

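A sketch of the sequential path: resample image IDs with replacement (within each stratum when strata is given), call the metric callback once per iteration, and read the interval off the sorted distribution. The percentile interval with nearest-rank indexing is an assumption (the library may use a different quantile rule), and bootstrap_sketch returns a plain dict that only mirrors the BootstrapResult field names.

```python
import random


def bootstrap_sketch(image_ids, metric_fn, point_estimate, n_bootstrap=1000,
                     confidence=0.95, seed=None, strata=None):
    """Percentile bootstrap CI from image-level resampling with replacement."""
    rng = random.Random(seed)
    if strata is None:
        groups = [list(image_ids)]
    else:
        # Stratified: resample within each stratum so its size stays fixed.
        by_stratum = {}
        for img in image_ids:
            by_stratum.setdefault(strata[img], []).append(img)
        groups = list(by_stratum.values())

    distribution = []
    for _ in range(n_bootstrap):
        sample = [img for g in groups for img in rng.choices(g, k=len(g))]
        distribution.append(metric_fn(sample))

    ranked = sorted(distribution)
    alpha = (1 - confidence) / 2
    return {
        "point_estimate": point_estimate,
        "ci_lower": ranked[int(alpha * (n_bootstrap - 1))],
        "ci_upper": ranked[int((1 - alpha) * (n_bootstrap - 1))],
        "confidence": confidence,
        "distribution": distribution,
    }


# Toy metric: fraction of sampled images whose prediction was correct.
correct = {"a": 1, "b": 1, "c": 0, "d": 1}


def accuracy(ids):
    return sum(correct[i] for i in ids) / len(ids)


result = bootstrap_sketch(list(correct), accuracy, point_estimate=0.75,
                          n_bootstrap=200, seed=0)

# Stratified: every resample keeps two "pos" and two "neg" images.
strata = {"a": "pos", "b": "pos", "c": "neg", "d": "neg"}


def pos_fraction(ids):
    return sum(strata[i] == "pos" for i in ids) / len(ids)


stratified = bootstrap_sketch(list(correct), pos_fraction, point_estimate=0.5,
                              n_bootstrap=50, seed=1, strata=strata)
```

Stratification keeps the mix of strata (for example positive and negative images) constant across resamples, so the interval reflects variability within strata rather than random shifts in class balance.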
bootstrap_pr_auc
bootstrap_pr_auc(
table: DetectionTable,
*,
n_bootstrap: int = 1000,
confidence: float = 0.95,
seed: int | None = None,
class_id: str | None = None,
) -> BootstrapResult
Vectorized bootstrap for precision-recall AUC.
Generates all bootstrap samples as a single DataFrame, joins them with the detection table, and computes AP per bootstrap iteration using Polars window functions, all in one lazy plan.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DetectionTable | Canonical detection table. | required |
| n_bootstrap | int | Number of bootstrap iterations. | 1000 |
| confidence | float | Confidence level in | 0.95 |
| seed | int \| None | Optional RNG seed. | None |
| class_id | str \| None | Optional class filter. | None |

Returns:

| Type | Description |
|---|---|
| BootstrapResult | |