diff --git a/examples/compact_mask/README.md b/examples/compact_mask/README.md new file mode 100644 index 0000000000..a24b240dc4 --- /dev/null +++ b/examples/compact_mask/README.md @@ -0,0 +1,591 @@ +# CompactMask — Memory-Efficient Mask Storage + +This example benchmarks `CompactMask`, a new mask representation introduced in `supervision` that replaces dense `(N, H, W)` boolean arrays with a crop-scoped Run-Length Encoding (RLE). The benchmark demonstrates full API compatibility, massive memory savings, and order-of-magnitude annotation speedups — with no change to your existing `Detections` code. + +--- + +## The Problem + +Instance segmentation models return one boolean mask per detected object. `supervision` stores these as a stacked `(N, H, W)` numpy array. + +For a 4K image with 1 000 detected objects: + +``` +1 000 x 3840 x 2160 x 1 byte = 8.3 GB +``` + +At this scale, typical pipelines crash with `MemoryError` before a single frame is annotated. Aerial imagery, satellite tiles, and high-density crowd scenes all hit this wall. + +--- + +## The Solution — Crop-RLE Storage + +`CompactMask` stores each mask as a run-length encoding of its **bounding-box crop** rather than the full image canvas. + +``` +dense (N,H,W) mask → N x crop_RLE + N x (x1,y1) offset +8.3 GB → ~280 KB +``` + +The bounding boxes are already present in `Detections.xyxy`, so no extra metadata is required from the caller. 
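The encode/decode round trip can be sketched in a few lines. This is a hypothetical helper pair — not the library's API — assuming the convention used throughout this document: the RLE alternates False/True run lengths over the row-major flattened crop, starting with a (possibly zero-length) False run so that True-pixel runs always sit at odd indices.

```python
import numpy as np


def encode_crop_rle(mask, x1, y1, x2, y2):
    """Encode the bbox crop of a full-frame bool mask as row-major RLE.

    Returns (rle, crop_shape, offset). Runs alternate False/True over the
    flattened crop, starting with a (possibly empty) False run, so
    True-pixel runs always occupy odd indices.
    """
    flat = mask[y1:y2, x1:x2].ravel()
    # Indices where the flattened crop changes value, plus start and end.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    bounds = np.concatenate(([0], change, [flat.size]))
    runs = np.diff(bounds)
    if flat.size and flat[0]:  # force runs[0] to be a False run
        runs = np.concatenate(([0], runs))
    return runs.astype(np.int32), (y2 - y1, x2 - x1), (x1, y1)


def decode_crop_rle(rle, crop_shape):
    """Expand the RLE back into a (crop_h, crop_w) bool crop."""
    flat = np.zeros(int(np.sum(rle)), dtype=bool)
    ends = np.concatenate(([0], np.cumsum(rle)))
    for k in range(1, len(rle), 2):  # odd indices hold True runs
        flat[ends[k] : ends[k + 1]] = True
    return flat.reshape(crop_shape)
```

Starting every RLE with a False run is what makes `rle[1::2]` always address the True runs — the trick the `.area` discussion below relies on.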
### Theoretical analysis (4K scene, 80x80 px objects, ~65% fill per bbox)

Assumptions used throughout the PR design analysis:

| Parameter | Value |
| ---------------------- | ------------------------ |
| Image size | 4K — 3840x2160 = 8.29 MP |
| Avg bounding box | 80x80 px = 6 400 px² |
| Fill ratio within bbox | ~65% |
| Avg contour vertices | ~400 pts |
| Avg RLE runs / mask | ~240 (3 runs x 80 rows) |

#### Space comparison

| Format | Per object | N=100 | N=1 000 | vs Dense |
| ------------------- | -------------- | ------ | ---------- | --------- |
| **Dense** (current) | 8.29 MB | 829 MB | **8.3 GB** | 1x |
| Local Crop + Offset | 6.4 KB | 640 KB | 6.4 MB | 1 300x |
| **Crop-RLE** ✓ | ~2 KB | 200 KB | **2 MB** | 4 000x |
| Polygon ⚠ lossy | ~3.2 KB | 320 KB | 3.2 MB | 2 600x |
| memmap | 8.29 MB (disk) | 829 MB | 8.3 GB | 1x (disk) |

Crop-RLE beats Local Crop because it only encodes actual pixel runs, skipping the ~35% background pixels within each bounding box.

#### Encode time: dense array → format

| Format | Complexity | N=10 | N=100 | N=1 000 |
| ------------------- | --------------------------------- | ------- | ------- | --------- |
| Local Crop + Offset | O(A) — strided slice from xyxy | ~0.1 ms | ~1 ms | ~10 ms |
| **Crop RLE** | O(A) — scan crop rows for runs | ~0.2 ms | ~2 ms | ~20 ms |
| Polygon | O(P) — `cv2.findContours` on crop | ~2 ms | ~20 ms | ~200 ms |
| memmap | O(I) — write 8.29 MB to disk | ~80 ms | ~800 ms | ~8 000 ms |

#### Decode time: format → full (H, W) mask

Required by `MaskAnnotator`, `mask_iou_batch`, `merge()`, etc. Dominant cost at 4K is **allocating and zeroing an 8.29 MB array**, which is identical across all in-memory formats once full materialisation is needed.
+ +| Format | N=10 | N=100 | N=1 000 | +| --------------------- | ------ | ------- | --------- | +| Local Crop / Crop RLE | ~3 ms | ~30 ms | ~300 ms | +| Polygon | ~5 ms | ~50 ms | ~500 ms | +| memmap | ~80 ms | ~800 ms | ~8 000 ms | + +#### Decode time: crop-only path (optimised) + +When callers need only the bounding-box region — `MaskAnnotator` crop-paint path, `.area`, `contains_holes`, `filter_segments_by_distance`: + +| Format | Complexity | N=10 | N=100 | N=1 000 | +| ------------------- | -------------------------------- | -------- | ------- | --------- | +| Local Crop + Offset | O(1) — already stored | ~0 ms | ~0 ms | ~0 ms | +| **Crop RLE** ✓ | O(A) — expand ~240 runs | ~0.02 ms | ~0.2 ms | ~2 ms | +| Polygon | O(A) — `fillPoly` on crop canvas | ~2 ms | ~20 ms | ~200 ms | +| memmap | N/A — always full-size | ~80 ms | ~800 ms | ~8 000 ms | + +Crop RLE's `.crop()` method powers the `MaskAnnotator` optimisation — it never allocates the full image canvas, which is the entire source of the annotation speedup. + +#### IoU / NMS at 1 % bbox overlap rate (sparse aerial scene) + +| Format | Strategy | N=1 000 | +| ------------------- | ------------------------------------- | ---------- | +| Dense (current) | All pairs, 640² pixel AND | ~10 000 ms | +| Local Crop + Offset | Bbox pre-filter → pixel IoU | **~5 ms** | +| Crop RLE | Bbox pre-filter → expand intersection | **~15 ms** | + +At N=1 000 with 1 % overlap, bbox pre-filter reduces 499 500 candidate pairs to ~5 000 overlapping pairs — a ~2 000x reduction in pixel-level work. + +--- + +## Why Crop-RLE Was Chosen over Local Crop + +Both formats compress extremely well; the deciding factors for Crop-RLE are: + +1. **~3x smaller** for masks that are themselves sparse within their bounding box. +2. **COCO RLE interop path** — row-major crop RLE can be re-encoded to column-major full-image RLE for `pycocotools` if needed. +3. `.area` computed directly from run lengths — no materialisation, no allocation. 
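Point 3 is easy to sanity-check with a toy run-length layout (illustrative values only, following the convention that True-pixel runs sit at odd indices):

```python
import numpy as np

# Toy RLE for one crop: alternating False/True run lengths over a 16-px
# crop — 2 off, 3 on, 4 off, 5 on, 2 off. True runs are at odd indices.
rle = np.array([2, 3, 4, 5, 2], dtype=np.int32)

# Area = sum of the True runs only; no pixel grid is ever allocated.
area = int(np.sum(rle[1::2]))  # 3 + 5 = 8
```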
The main trade-off: crop-only decode is O(A) rather than O(1). For the common solid-fill segmentation mask this is negligible (\<0.1 ms per mask).

---

## Operation-by-Operation Speedup Analysis

This section walks through every `Detections` operation that touches masks and shows exactly why `CompactMask` is faster. All code snippets are taken from the actual implementation. Numbers use the **FHD-200-50%-v600** scenario unless noted (1920 x 1080 image, 200 detections, masks collectively filling ~50% of the frame, 600-vertex polygons — a realistic hard case with dense fill and complex object boundaries).

At 50% total fill on an FHD image, the 600-vertex polygon boundaries are highly jagged, producing many RLE runs per row.

---

### Memory

Dense stores one full-resolution bool array per mask:

```
N x H x W x 1 byte
200 x 1080 x 1920 x 1 = 414 MB
```

Compact stores three lightweight structures:

```python
self._rles: list[npt.NDArray[np.int32]] # N Python references to small int32 arrays
self._crop_shapes: npt.NDArray[np.int32] # (N, 2) — crop (h, w) per mask
self._offsets: npt.NDArray[np.int32] # (N, 2) — (x1, y1) origin per mask
```

Per-mask RLE size at 50% fill with 600-vertex polygons: ~4.7 KB (933 KB / 200). Per-mask dense size: 1920 x 1080 x 1 = 2.1 MB. Per-mask ratio: 2.1 MB / 4.7 KB = **~445x**.

Scaled to N=200: 200 x 4.7 KB = ~933 KB of RLE data, plus `_crop_shapes` (1.6 KB) and `_offsets` (1.6 KB). Python list + array object overhead roughly doubles the footprint for small N.

| Component | Dense | Compact | Ratio |
| --------------- | ---------- | ----------- | --------- |
| Mask data | 414 MB | ~933 KB | ~445x |
| Python overhead | negligible | ~933 KB | -- |
| **Total** | **414 MB** | **~1.9 MB** | **~392x** |

At 5% fill with 8-vertex polygons, the ratio reaches 10 000x–20 000x because crops are tiny and RLEs are extremely short. The benchmark's 4K-200-5%-v8 scenario measures 21 786x (theory) / ~6 000x (malloc).
The SAT-200-5%-v8 scenario reaches 62 968x theoretical. + +--- + +### `.area` + +Dense `Detections.area` reads every pixel of every mask: + +```python +# detection/core.py — dense path +return np.array([np.sum(mask) for mask in self.mask]) +# N masks x H x W boolean sums = 200 x 2.1 M = 420 million reads +``` + +Compact delegates to `_rle_area`, which sums only the odd-indexed run lengths (the True-pixel runs) in each RLE: + +```python +# detection/compact_mask.py — _rle_area +return int(np.sum(rle[1::2])) +``` + +```python +# detection/compact_mask.py — CompactMask.area +return np.array([_rle_area(r) for r in self._rles], dtype=np.int64) +``` + +At FHD-200-50%-v600, dense `.area` takes 84.66 ms; compact takes 0.48 ms — a **71x speedup**. At SAT-200-20%-v128 the measured speedup reaches **1 204x** because the dense array is 13.4 GB and each sum must scan the entire canvas. + +| Factor | Reduction | +| ---------------------------------- | ----------- | +| RLE sums vs full-frame pixel reads | ~4 600x | +| int32 arithmetic vs bool reduction | ~2x | +| No (H, W) allocation per mask | latency | +| **Combined** | **~1 000x** | + +--- + +### `filter` / `__getitem__` (boolean index) + +Dense: `masks[bool_array]` triggers NumPy fancy indexing, which allocates a new `(K, H, W)` bool array and copies K full frames: + +```python +# detection/core.py — Detections.__getitem__ +mask = (self.mask[index] if self.mask is not None else None,) +# For dense ndarray, numpy allocates (K, 2160, 3840) and memcpy's K frames +``` + +Compact `CompactMask.__getitem__` converts the boolean index to integer positions and builds a new `CompactMask` from Python list indexing and NumPy fancy indexing on small `(N, 2)` arrays: + +```python +# detection/compact_mask.py — CompactMask.__getitem__ +if isinstance(index, np.ndarray) and index.dtype == bool: + idx_arr = np.where(index)[0] +# ... 
+new_rles = [self._rles[int(i)] for i in idx_arr] +new_crop_shapes: npt.NDArray[np.int32] = self._crop_shapes[idx_arr] +new_offsets: npt.NDArray[np.int32] = self._offsets[idx_arr] +return CompactMask(new_rles, new_crop_shapes, new_offsets, self._image_shape) +``` + +At FHD-200-50%-v600, dense `filter` takes 14.56 ms; compact takes 0.03 ms — a **500x speedup**. At SAT-200-20%-v128 the speedup reaches **14 757x**. + +| | Dense | Compact | +| ----------- | ----------------------- | ----------------------------------- | +| Data copied | K x H x W (full frames) | K Python references + K x 8 bytes | +| Allocation | new `(K, H, W)` array | new `CompactMask` shell (~trivial) | +| **Speedup** | | **hundreds to tens of thousands x** | + +--- + +### `annotate` (`MaskAnnotator`) + +Dense: for each mask, `MaskAnnotator` indexes the full `(H, W)` array and applies a boolean mask across the entire scene: + +```python +# annotators/core.py — dense path +mask = np.asarray(detections.mask[detection_idx], dtype=bool) +colored_mask[mask] = color.as_bgr() +``` + +Each `detections.mask[detection_idx]` for a dense array yields a full `(H, W)` view, and the boolean indexing scans all pixels. + +Compact: the annotator detects `CompactMask` and paints only the crop region: + +```python +# annotators/core.py — compact path +x1 = int(compact_mask.offsets[detection_idx, 0]) +y1 = int(compact_mask.offsets[detection_idx, 1]) +crop_m = compact_mask.crop(detection_idx) +crop_h, crop_w = crop_m.shape +colored_mask[y1 : y1 + crop_h, x1 : x1 + crop_w][crop_m] = color.as_bgr() +``` + +`compact_mask.crop()` decodes the RLE into a `(crop_h, crop_w)` array. At FHD-200-50%-v600, dense `annotate` takes 848.95 ms; compact takes 32.67 ms — a **22x speedup**. At SAT-200-20%-v128 the speedup reaches **89x**. 
+ +| Factor | Reduction | +| -------------------------------------------------- | ------------------- | +| Crop decode vs full-frame boolean index (per mask) | crop-size dependent | +| No full `(H, W)` allocation per integer index | latency | +| x N masks | compounds | +| **Combined** | **~26 – 400x** | + +--- + +### IoU (`mask_iou_batch` / `compact_mask_iou_batch`) + +Dense `mask_iou_batch` on N=200, FHD: + +```python +# detection/utils/iou_and_nms.py — _mask_iou_batch_split +intersection_area = np.logical_and(masks_true[:, None], masks_detection).sum( + axis=(2, 3) +) +# shape (200, 200, 1080, 1920) — ~80 billion boolean ops +# .sum(axis=(2,3)) for intersection counts +# memory_limit splits this into chunks capped at 5 GB scratch +``` + +Compact `compact_mask_iou_batch` — three layered optimisations: + +**1. Vectorised bbox pre-filter — O(N²) array ops, zero decoding** + +```python +ix1: npt.NDArray[np.int32] = np.maximum(x1a[:, None], x1b[None, :]) +iy1: npt.NDArray[np.int32] = np.maximum(y1a[:, None], y1b[None, :]) +ix2: npt.NDArray[np.int32] = np.minimum(x2a[:, None], x2b[None, :]) +iy2: npt.NDArray[np.int32] = np.minimum(y2a[:, None], y2b[None, :]) +bbox_overlap: npt.NDArray[np.bool_] = (ix1 <= ix2) & (iy1 <= iy2) +``` + +At 5% fill, two random masks overlap with probability ~4%. ~96% of the N² pairs get IoU = 0 for free — no pixel work at all. + +**2. Sub-crop decode — compare only the intersection region** + +```python +ox_a, oy_a = int(x1a[i]), int(y1a[i]) +sub_a = crops_a[i][ly1 - oy_a : ly2 - oy_a + 1, lx1 - ox_a : lx2 - ox_a + 1] + +ox_b, oy_b = int(x1b[j]), int(y1b[j]) +sub_b = crops_b[j][ly1 - oy_b : ly2 - oy_b + 1, lx1 - ox_b : lx2 - ox_b + 1] + +inter = int(np.logical_and(sub_a, sub_b).sum()) +``` + +The intersection sub-region of two overlapping crops is typically far smaller than the full frame. + +**3. 
Crop caching — each mask decoded at most once** + +```python +if i not in crops_a: + crops_a[i] = masks_true.crop(i) +``` + +Area is obtained from `_rle_area` (sum odd-indexed runs), never touching the pixel grid: + +```python +areas_a: npt.NDArray[np.int64] = masks_true.area +``` + +At FHD-200-50%-v600, dense IoU takes 23 915 ms; compact takes 51.58 ms — a **446x speedup**. At 5% fill / sparse scenarios the speedup is even larger because fewer bbox pairs overlap. + +| Factor | Reduction | +| ------------------------------------ | --------------- | +| Bbox pre-filter at sparse fill | 25x | +| Sub-crop vs full frame per pair | ~200x | +| Area from RLE, not `sum(axis=(1,2))` | ~10x | +| No 5 GB scratch allocation | latency | +| **Combined** | **~100 – 500x** | + +At 20% fill the gaps close — more pairs overlap, larger crops — speedup drops toward the lower end of the range. + +--- + +### NMS (`mask_non_max_suppression`) + +Both dense and compact paths now call `mask_iou_batch(masks, masks)` directly, computing exact mask IoU on the original (unresized) masks. There is no intermediate resize step. + +```python +# detection/utils/iou_and_nms.py — NMS (both paths) +ious = mask_iou_batch(masks, masks, overlap_metric) +``` + +`mask_iou_batch` dispatches internally: when passed a `CompactMask` it calls `compact_mask_iou_batch`, applying all three IoU optimisations (bbox pre-filter, sub-crop decode, crop caching). When passed a dense ndarray it runs the chunked pixel-AND path. + +All three IoU optimisations apply to the compact path: + +| Factor | Reduction | +| ------------------------------------- | ---------------------------- | +| Bbox pre-filter eliminates most pairs | 25x at sparse fill | +| Sub-crop decode for remaining pairs | ~200x | +| Area from RLE, not pixel sum | ~10x | +| **Combined** | **same as IoU: ~100 – 500x** | + +At FHD-200-50%-v600, dense NMS takes 5 231 ms; compact takes 48.15 ms — a **481x speedup**. 
Dense IoU/NMS is skipped for scenarios above 1 GB (4K-200 and SAT-200 tiers); compact NMS still runs on those. + +--- + +### `merge` (`Detections.merge`) + +Dense: `np.vstack` allocates a new `(N1+N2, H, W)` array and copies both halves: + +```python +# detection/core.py — dense merge path +return np.vstack([np.asarray(m) for m in masks]) +# Merging two 100-mask sets at FHD: 2 x 100 x 2.1 MB = 414 MB copied +``` + +Compact: `CompactMask.merge` extends a Python list and concatenates two small int32 arrays: + +```python +# detection/compact_mask.py — CompactMask.merge +new_rles: list[npt.NDArray[np.int32]] = [] +for m in masks_list: + new_rles.extend(m._rles) + +new_crop_shapes: npt.NDArray[np.int32] = np.concatenate( + [m._crop_shapes for m in masks_list], axis=0 +) +new_offsets: npt.NDArray[np.int32] = np.concatenate( + [m._offsets for m in masks_list], axis=0 +) +``` + +`list.extend` copies N reference pointers. `np.concatenate` on `(N, 2)` int32 arrays copies N x 8 bytes per array. + +At FHD-200-50%-v600, dense merge takes 29.71 ms; compact takes 0.03 ms — a **929x speedup**. At SAT-200-20%-v128 the speedup reaches **89 046x**. + +| | Dense | Compact | +| ----------- | ----------------------- | -------------------------- | +| Data moved | N x H x W (full frames) | N references + N x 8 bytes | +| Allocation | new `(N, H, W)` array | new `CompactMask` shell | +| **Speedup** | | **effectively free** | + +**Note:** `Detections.merge` calls `is_empty()` on each input. Before the `len(xyxy) > 0` short-circuit was added, `is_empty()` invoked `__eq__` which called `np.array_equal(self.to_dense(), ...)` — materialising the entire `(N, H, W)` CompactMask to dense just to check emptiness. The fix: + +```python +# detection/core.py — Detections.is_empty (fixed) +if len(self.xyxy) > 0: + return False +``` + +This O(1) check avoids the O(N x H x W) dense materialisation that previously dominated compact merge time. 
+ +--- + +### `offset` / `with_offset` (`InferenceSlicer` tile stitching) + +Dense `move_masks`: allocates a new `(N, new_H, new_W)` array and copies each mask with shifted slice coordinates — O(N x H x W): + +```python +# detection/utils/masks.py — move_masks +mask_array = np.full((masks.shape[0], resolution_wh[1], resolution_wh[0]), False) +# ... source/destination slicing logic ... +mask_array[:, dst_y1:dst_y2, dst_x1:dst_x2] = masks[:, src_y1:src_y2, src_x1:src_x2] +``` + +Compact `with_offset(dx, dy)`: vectorised bounds check first. All new bounding-box positions are computed in a single numpy op. When none overflow the new canvas — the common case in `InferenceSlicer` — the RLE data is not touched at all: + +```python +# detection/compact_mask.py — CompactMask.with_offset (fast path) +new_offsets = self._offsets + np.array([dx, dy], dtype=np.int32) # O(N) numpy +needs_clip = (x1s < 0) | (y1s < 0) | (x2s >= new_w) | (y2s >= new_h) +if not needs_clip.any(): + return CompactMask( + list(self._rles), self._crop_shapes.copy(), new_offsets, new_image_shape + ) +``` + +When a crop does overflow (e.g. object at a tile edge), only that crop is decoded, sliced, and re-encoded. Masks fully outside bounds get a 1x1 all-False stub without any decoding. + +At FHD-200-50%-v600, dense offset takes 42.30 ms; compact takes 0.02 ms — a **2 016x speedup**. At SAT-200-20%-v128 the speedup reaches **290 779x**. 
+ +| | Dense | Compact (no-clip fast path) | +| ----------------- | -------------------------------------- | ------------------------------------ | +| Work per mask | allocate `(new_H, new_W)` + copy H x W | add scalar to offset row — O(1) | +| N=200 at FHD | 200 x 2.1 MB = **414 MB** alloc + copy | two numpy ops on `(N, 2)` int32 | +| Output allocation | new `(N, new_H, new_W)` | shared RLE list + new `(N, 2)` array | +| **Speedup** | | **effectively free (>1 000x)** | + +In the `InferenceSlicer` pipeline the canvas is always expanded by the tile offset, so no crop ever overflows — the fast path is always taken. Clipping only activates for objects that genuinely straddle the image boundary. + +--- + +### `centroids` (`calculate_masks_centroids`) + +Dense: `np.tensordot` reads every pixel of every mask to compute weighted coordinate sums: + +```python +# detection/utils/masks.py — dense centroid path +vertical_indices, horizontal_indices = np.indices((height, width)) + 0.5 +# np.tensordot(masks, indices, axes=([1, 2], [0, 1])) +# reads all N x H x W values +``` + +Compact: per-crop loop decodes only the bounding-box region and computes centroids within that crop: + +```python +# detection/utils/masks.py — compact centroid path +crop = masks.crop(i) +crop_h, crop_w = crop.shape +x1 = int(masks.offsets[i, 0]) +y1 = int(masks.offsets[i, 1]) +# ... +crop_rows, crop_cols = np.indices((crop_h, crop_w)) +cx = float(np.sum((crop_cols + 0.5)[crop])) / total + x1 +cy = float(np.sum((crop_rows + 0.5)[crop])) / total + y1 +``` + +At FHD-200-50%-v600, dense centroids takes 1 133.68 ms; compact takes 60.39 ms — a **13x speedup**. At SAT-200-20%-v128 the speedup reaches **857x** because the dense path must allocate and scan a 13.4 GB array. 
+ +| Factor | Reduction | +| ----------------------------------------- | ------------------- | +| Crop area vs full frame (per mask) | fill-dependent | +| No global `np.indices((H, W))` allocation | saves large float64 | +| **Combined (N=200)** | **~19 – 1 000x** | + +--- + +### Summary + +Measured speedups at the **FHD-200-50%-v600** operating point (dense fill, complex polygons — a realistic hard case). Dense baseline = 1x. + +| Operation | Dense cost | Compact cost | Speedup | +| ---------------- | ----------- | ------------ | ------- | +| Memory | 414 MB | ~1.9 MB | ~392x | +| `.area` | 84.66 ms | 0.48 ms | 71x | +| `filter` | 14.56 ms | 0.03 ms | 500x | +| `annotate` | 848.95 ms | 32.67 ms | 22x | +| `mask_iou_batch` | 23 915 ms | 51.58 ms | 446x | +| NMS | 5 231 ms | 48.15 ms | 481x | +| `merge` | 29.71 ms | 0.03 ms | 929x | +| `with_offset` | 42.30 ms | 0.02 ms | 2 016x | +| `centroids` | 1 133.68 ms | 60.39 ms | 13x | + +All speedups are larger at sparser fill fractions and larger resolutions. At SAT-200-20%-v128, `.area` reaches 1 204x and `merge` reaches 89 046x. At the sparsest scenarios (5% fill, 8-vertex polygons), memory ratios exceed 60 000x. 
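The bbox pre-filter behind the IoU and NMS rows can be reproduced in isolation. This is a standalone sketch using the same broadcasting pattern as the excerpt in the IoU section, not the library code:

```python
import numpy as np


def bbox_overlap_matrix(xyxy_a, xyxy_b):
    """(N, M) bbox-intersection test — pure array ops, no mask pixels.

    Pairs where this returns False are guaranteed mask IoU == 0, so
    pixel-level comparison only runs on the surviving pairs.
    """
    ix1 = np.maximum(xyxy_a[:, None, 0], xyxy_b[None, :, 0])
    iy1 = np.maximum(xyxy_a[:, None, 1], xyxy_b[None, :, 1])
    ix2 = np.minimum(xyxy_a[:, None, 2], xyxy_b[None, :, 2])
    iy2 = np.minimum(xyxy_a[:, None, 3], xyxy_b[None, :, 3])
    return (ix1 <= ix2) & (iy1 <= iy2)
```

At sparse fill this matrix is almost entirely False, which is why the speedups above grow as fill fraction drops.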
+ +--- + +## Drop-In Compatibility + +`CompactMask` implements the same duck-typed interface as `np.ndarray`: + +```python +import supervision as sv +from supervision.detection.compact_mask import CompactMask + +# Build from an existing dense (N, H, W) bool array: +compact = CompactMask.from_dense(masks_dense, xyxy, image_shape=(H, W)) + +# Use exactly like a dense mask — no other code changes needed: +detections = sv.Detections(xyxy=xyxy, mask=compact, class_id=class_ids) + +# Filtering, merging, area — all work transparently: +filtered = detections[confidence > 0.5] +areas = detections.area # RLE sum, no materialisation +merged = sv.Detections.merge([det_a, det_b]) + +# MaskAnnotator works without any change: +annotated = sv.MaskAnnotator().annotate(frame, detections) + +# Materialise back to dense when you need raw numpy: +dense_again = compact.to_dense() # (N, H, W) bool +``` + +Supported indexing patterns: + +| Expression | Returns | +| ------------------ | ---------------------------- | +| `mask[i]` (int) | Dense `(H, W)` bool array | +| `mask[bool_array]` | New `CompactMask` (filtered) | +| `mask[slice]` | New `CompactMask` | +| `np.asarray(mask)` | Dense `(N, H, W)` bool array | + +--- + +## Benchmark + +Run on any machine — no GPU or real model required: + +```bash +uv run python examples/compact_mask/benchmark.py +``` + +Six image tiers x three fill fractions (5 / 20 / 50 %) x three vertex counts (8 / 128 / 600): + +| Tier | Resolution | Objects | Dense array | Notes | +| ------- | ---------- | ------- | ----------- | ------------------------------------ | +| FHD-100 | 1920x1080 | 100 | 0.21 GB | Full operations including IoU+NMS | +| FHD-200 | 1920x1080 | 200 | 0.41 GB | Full operations including IoU+NMS | +| FHD-400 | 1920x1080 | 400 | 0.83 GB | Full operations including IoU+NMS | +| 4K-100 | 3840x2160 | 100 | 0.83 GB | Full operations including IoU+NMS | +| 4K-200 | 3840x2160 | 200 | 1.66 GB | Dense IoU+NMS skipped (array > 1 GB) | +| SAT-200 | 
8192x8192 | 200 | 13.4 GB | Dense IoU+NMS skipped (array > 1 GB) | + +Dense timing is skipped automatically when the dense IoU/NMS array would exceed 1 GB (`IOU_DENSE_SKIP_GB`), preventing swap thrashing. All dense ops are skipped above 16 GB (`DENSE_SKIP_GB`); no scenario in the current matrix reaches that threshold. Memory is always reported as theoretical `NxHxW` bytes. + +### Sample results (macOS, Apple M4 Max, REPS=4) + +| Scenario | Dense mem | Compact theor. | Mem x | Area x | Filter x | Annot x | IoU x | NMS x | Merge x | Offset x | Centroids x | +| ---------------- | --------- | -------------- | ------- | ------ | -------- | ------- | ----- | ----- | -------- | -------- | ----------- | +| FHD-100-5%-v8 | 207 MB | 28 KB | 7 418x | — | — | — | — | — | — | — | — | +| FHD-100-50%-v600 | 207 MB | 913 KB | 227x | — | — | — | — | — | — | — | — | +| FHD-200-50%-v600 | 415 MB | 933 KB | 445x | 71x | 500x | 22x | 446x | 481x | 929x | 2 016x | 13x | +| FHD-400-5%-v8 | 829 MB | 60 KB | 13 937x | — | — | — | — | — | — | — | — | +| 4K-100-5%-v8 | 829 MB | 53 KB | 15 554x | — | — | — | — | — | — | — | — | +| 4K-100-20%-v128 | 829 MB | 586 KB | 1 415x | — | — | — | — | — | — | — | — | +| 4K-200-5%-v8 | 1 659 MB | 76 KB | 21 786x | — | — | — | — | — | — | — | — | +| SAT-200-5%-v8 | 13 422 MB | 213 KB | 62 968x | 6 942x | 30 255x | 204x | † | † | 105 545x | 251 629x | 2 173x | +| SAT-200-20%-v128 | 13 422 MB | 2 596 KB | 5 171x | 1 204x | 14 757x | 89x | † | † | 89 046x | 290 779x | 857x | +| SAT-200-50%-v600 | 13 422 MB | 14 222 KB | 944x | — | — | — | † | † | — | — | — | + +- **Compact theor.** — sum of internal numpy buffer `nbytes` +- **Mem x** — dense / compact theoretical ratio +- **Area x / Filter x / Annot x / IoU x / NMS x / Merge x / Offset x / Centroids x** — compact speedup over dense for each operation +- **†** — dense IoU+NMS skipped (dense array > 1 GB); compact still runs and is timed +- **—** — not shown; full per-scenario tables are printed by the 
benchmark script + +All non-skipped scenarios pass: pixel-perfect annotation, exact area, lossless `to_dense()` roundtrip. + +--- + +## Use-Cases + +- **Aerial / satellite imagery** — thousands of small objects on large tiles; dense masks exhaust RAM before inference completes. +- **High-density crowd / cell segmentation** — N > 500 on FHD already requires several GB of mask storage per batch. +- **Real-time annotation pipelines** — crop-paint cuts annotation from seconds to milliseconds at 4K resolution. +- **Long-running tracking** — accumulated `Detections` across many frames stay in kilobytes rather than gigabytes. +- **`InferenceSlicer`** — `with_offset()` adjusts crop origins directly when stitching tile results; no dense materialisation needed. + +--- + +## Limitations + +- `CompactMask` is **not** a full `np.ndarray`. Call `.to_dense()` before passing to code that requires arbitrary ndarray methods (`astype`, `reshape`, `ravel`, `any`, `all`, …). +- RLE format is **row-major (C-order), crop-scoped** — incompatible with pycocotools / COCO API RLEs (column-major, full-image-scoped). Use `.to_dense()` first if you need pycocotools interop. +- `from_dense()` requires the input `(N, H, W)` array to fit in memory. For truly OOM-scale data, build `CompactMask` per-detection directly from model output crops rather than from a pre-allocated dense stack. + +--- + +## Files + +| File | Description | +| -------------- | ------------------------------------------------ | +| `benchmark.py` | Full benchmark across FHD / 4K / satellite tiers | +| `README.md` | This file | diff --git a/examples/compact_mask/benchmark.py b/examples/compact_mask/benchmark.py new file mode 100644 index 0000000000..7ef5b1372b --- /dev/null +++ b/examples/compact_mask/benchmark.py @@ -0,0 +1,1097 @@ +"""CompactMask demo & benchmark. 
+ +Demonstrates that ``CompactMask`` is a drop-in replacement for dense +``(N, H, W)`` bool arrays in ``supervision.Detections``, while using +significantly less memory and enabling faster annotation. + +Run with: + uv run python examples/compact_mask/benchmark.py + +No GPU or real model is required — everything is synthesized with NumPy. +Mask complexity is controlled by ``num_vertices``: random polygons with more +vertices produce jaggier boundaries and more RLE runs per row. +""" + +from __future__ import annotations + +import dataclasses +import gc +import json +import math +import time +import tracemalloc +from concurrent.futures import ThreadPoolExecutor +from dataclasses import dataclass, field +from datetime import datetime, timezone +from pathlib import Path +from typing import Callable + +import cv2 +import numpy as np +import pandas as pd +from rich import box +from rich.console import Console +from rich.progress import ( + BarColumn, + MofNCompleteColumn, + Progress, + TaskProgressColumn, + TextColumn, + TimeElapsedColumn, +) +from rich.table import Table + +import supervision as sv +from supervision.detection.compact_mask import CompactMask + +console = Console(width=240, force_terminal=True) + +REPETITIONS = 4 +# How many reps to run concurrently in time_reps. Each thread times itself +# independently; results are averaged. Numpy releases the GIL for its C-level +# work so threads can truly run in parallel on multi-core machines. +# Set to 1 to disable parallelism and revert to a sequential timing loop. +PARALLEL = 3 +# Dense timing is skipped when the dense (N,H,W) array would exceed this +# threshold — avoids OOM / swap thrashing on extreme scenarios while still +# reporting the theoretical memory footprint. +DENSE_SKIP_GB = 16.0 +# Dense IoU *and NMS* timing are skipped above this threshold: pairwise +# (N,H,W) AND is extremely expensive — NMS calls IoU internally so both are +# gated by the same threshold. 
+IOU_DENSE_SKIP_GB = 1.0 +# Reps for dense IoU/NMS — a single pass already takes several seconds. +IOU_NMS_REPS = 2 + + +# ══════════════════════════════════════════════════════════════════════════════ +# Result container +# ══════════════════════════════════════════════════════════════════════════════ + + +@dataclass +class ScenarioResult: + name: str + resolution: str # e.g. "1920x1080" + num_objects: int + fill_name: str # e.g. "5%" + num_vertices: int # polygon vertex count — complexity proxy + # memory (theoretical: raw numpy nbytes) + dense_bytes: int + compact_bytes_theoretical: int + # memory (actual: tracemalloc peak; dense_bytes_actual=0 when dense_skipped=True) + dense_bytes_actual: int + compact_bytes_actual: int + # compactness overhead — absolute times for conversion (always measured) + encode_s: float # CompactMask.from_dense() dense → compact + decode_s: float # compact_mask.to_dense() compact → dense + # timing (nan when dense_skipped=True) + dense_area_s: float + compact_area_s: float + dense_filter_s: float + compact_filter_s: float + dense_annot_s: float + compact_annot_s: float + # pipeline stages (nan when respective skip flag is True) + dense_iou_s: float # nan when iou_dense_skipped + compact_iou_s: float + dense_nms_s: float # nan when dense_skipped + compact_nms_s: float + dense_merge_s: float # nan when dense_skipped + compact_merge_s: float + dense_offset_s: float # nan when dense_skipped + compact_offset_s: float + dense_centroids_s: float # nan when dense_skipped + compact_centroids_s: float + # correctness (None when the stage was skipped) + pixel_perfect: bool | None + areas_match: bool | None + roundtrip_ok: bool | None + iou_ok: bool | None + nms_ok: bool | None + nms_mismatch_count: ( + int # detections with different NMS decisions (0 when dense_skipped) + ) + merge_ok: bool | None + offset_ok: bool | None + centroids_ok: bool | None + # skip flags + dense_skipped: bool = field(default=False) + iou_dense_skipped: bool = 
field(default=False) + + +# ══════════════════════════════════════════════════════════════════════════════ +# Synthetic data helpers +# ══════════════════════════════════════════════════════════════════════════════ + + +def make_scene(image_height: int, image_width: int) -> np.ndarray: + """Random BGR image.""" + return np.random.default_rng(42).integers( + 0, 255, (image_height, image_width, 3), dtype=np.uint8 + ) + + +def _make_polygon_mask( + image_height: int, + image_width: int, + center_x: int, + center_y: int, + axis_x: int, + axis_y: int, + rng: np.random.Generator, + num_vertices: int, +) -> np.ndarray: + """Random polygon mask. + + *num_vertices* is a direct complexity proxy: more vertices → more + independent radius samples → jaggier boundary → more RLE runs per row. + No smoothing is applied so the relationship is monotone. + """ + angles = np.sort(rng.uniform(0, 2 * np.pi, num_vertices)) + radii = rng.uniform(0.3, 1.0, num_vertices) + pts_x = np.clip( + (center_x + axis_x * radii * np.cos(angles)).astype(np.int32), + 0, + image_width - 1, + ) + pts_y = np.clip( + (center_y + axis_y * radii * np.sin(angles)).astype(np.int32), + 0, + image_height - 1, + ) + pts = np.column_stack([pts_x, pts_y]).reshape(-1, 1, 2) + canvas = np.zeros((image_height, image_width), dtype=np.uint8) + cv2.fillPoly(canvas, [pts], 1) + return canvas.astype(bool) + + +def make_detections( + num_objects: int, + image_height: int, + image_width: int, + fill_fraction: float, + num_vertices: int = 20, + seed: int = 0, +) -> tuple[np.ndarray, np.ndarray, np.ndarray]: + """Return ``(xyxy, masks_dense, class_ids)`` with random polygon masks. + + *num_vertices* controls mask complexity: more vertices → jaggier boundary. 
+ """ + rng = np.random.default_rng(seed) + half = max( + 2, + int( + (image_height * image_width * fill_fraction / (np.pi * num_objects)) ** 0.5 + ), + ) + xyxy_list = [] + masks = np.zeros((num_objects, image_height, image_width), dtype=bool) + for index in range(num_objects): + center_x = int(rng.integers(half + 1, image_width - half - 1)) + center_y = int(rng.integers(half + 1, image_height - half - 1)) + axis_x = int(rng.integers(max(2, half // 2), half * 2 + 1)) + axis_y = int(rng.integers(max(2, half // 2), half * 2 + 1)) + masks[index] = _make_polygon_mask( + image_height, + image_width, + center_x, + center_y, + axis_x, + axis_y, + rng, + num_vertices, + ) + xyxy_list.append( + [ + max(0, center_x - axis_x), + max(0, center_y - axis_y), + min(image_width - 1, center_x + axis_x), + min(image_height - 1, center_y + axis_y), + ] + ) + xyxy = np.array(xyxy_list, dtype=np.float32) + class_ids = rng.integers(0, 10, num_objects, dtype=int) + return xyxy, masks, class_ids + + +# ══════════════════════════════════════════════════════════════════════════════ +# Memory helpers +# ══════════════════════════════════════════════════════════════════════════════ + + +def dense_memory_bytes(masks: np.ndarray) -> int: + """Theoretical dense footprint: raw numpy buffer size.""" + return int(masks.nbytes) + + +def compact_memory_bytes_theoretical(compact_mask: CompactMask) -> int: + """Theoretical compact footprint: sum of all internal numpy buffer sizes.""" + return int( + compact_mask._crop_shapes.nbytes + + compact_mask._offsets.nbytes + + sum(rle.nbytes for rle in compact_mask._rles), + ) + + +def measure_peak_bytes(func: Callable[[], object]) -> int: + """Wrapper that runs *func* under tracemalloc and returns peak allocation. + + tracemalloc captures every Python-level allocation — numpy buffers, list + nodes, object headers — giving the true heap cost of anything *func* + builds. The return value of *func* is discarded so the object does not + stay alive. 
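+
+    For example, allocating a 1 MB buffer inside *func* reports a peak of at
+    least that many bytes:
+
+    ```pycon
+    >>> measure_peak_bytes(lambda: bytearray(10**6)) >= 10**6
+    True
+
+    ```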
+ """ + tracemalloc.start() + tracemalloc.clear_traces() + func() + _, peak = tracemalloc.get_traced_memory() + tracemalloc.stop() + return int(peak) + + +def dense_memory_bytes_actual( + num_objects: int, image_height: int, image_width: int +) -> int: + """Actual dense footprint: peak bytes during (N, H, W) bool array alloc.""" + return measure_peak_bytes( + lambda: np.zeros((num_objects, image_height, image_width), dtype=bool), + ) + + +def compact_memory_bytes_actual( + masks_dense: np.ndarray, + xyxy: np.ndarray, + image_shape: tuple[int, int], +) -> int: + """Actual compact footprint: peak bytes during CompactMask.from_dense().""" + return measure_peak_bytes( + lambda: CompactMask.from_dense(masks_dense, xyxy, image_shape=image_shape), + ) + + +def time_reps( + func: Callable[[], object], + repeats: int = REPETITIONS, + parallel: int = PARALLEL, +) -> float: + """Run *func* *reps* times and return mean wall-clock seconds per call. + + When ``parallel > 1``, up to ``parallel`` calls run simultaneously in + threads. Numpy and OpenCV release the GIL for their C-level work, so + threads can execute in parallel on multi-core machines. Each thread + records its own elapsed time; the mean across all *reps* is returned. + + When ``parallel == 1`` the original sequential loop is used, avoiding + any thread-scheduling overhead and improving accuracy for cheap functions. + + A full GC cycle is run before timing so accumulated garbage from earlier + stages does not trigger collection mid-measurement and inflate results. 
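+
+    A minimal sanity check (absolute timings vary by machine, so only the
+    sign is asserted):
+
+    ```pycon
+    >>> time_reps(lambda: sum(range(10_000)), repeats=4, parallel=2) > 0.0
+    True
+
+    ```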
+ """ + gc.collect() + if parallel <= 1: + t0 = time.perf_counter() + for _ in range(repeats): + func() + return (time.perf_counter() - t0) / repeats + + def _timed() -> float: + t0 = time.perf_counter() + func() + return time.perf_counter() - t0 + + with ThreadPoolExecutor(max_workers=min(parallel, repeats)) as pool: + timings = list(pool.map(lambda _: _timed(), range(repeats))) + return sum(timings) / repeats + + +# ══════════════════════════════════════════════════════════════════════════════ +# Benchmark stages +# ══════════════════════════════════════════════════════════════════════════════ + + +def stage_build( + num_objects: int, + image_height: int, + image_width: int, + fill_fraction: float, + num_vertices: int = 20, +) -> tuple[np.ndarray, np.ndarray, np.ndarray, CompactMask]: + """Synthesize polygon masks and build the CompactMask.""" + xyxy, masks_dense, class_ids = make_detections( + num_objects, image_height, image_width, fill_fraction, num_vertices + ) + compact_mask = CompactMask.from_dense( + masks_dense, xyxy, image_shape=(image_height, image_width) + ) + return xyxy, masks_dense, class_ids, compact_mask + + +def stage_encode( + masks_dense: np.ndarray, + xyxy: np.ndarray, + image_height: int, + image_width: int, +) -> float: + """Per-mask encode time: encode each mask individually and average over N. + + Calling from_dense one mask at a time (rather than batching all N) isolates + the per-shape cost — each polygon has a different RLE run count, so the + average reflects true shape variance. + """ + num_masks = len(masks_dense) + image_shape = (image_height, image_width) + + def _encode_each() -> None: + for i in range(num_masks): + CompactMask.from_dense( + masks_dense[i : i + 1], xyxy[i : i + 1], image_shape=image_shape + ) + + return time_reps(_encode_each) / max(num_masks, 1) + + +def stage_decode(compact_mask: CompactMask) -> float: + """Per-mask decode time: decode each mask individually and average over N. 
+ + Building a list via compact_mask[i] decodes each crop separately, giving + the per-mask cost of materialising a single RLE back to a dense array. + """ + num_masks = len(compact_mask) + return time_reps(lambda: [compact_mask[i] for i in range(num_masks)]) / max( + num_masks, 1 + ) + + +def stage_area( + det_dense: sv.Detections, det_compact: sv.Detections +) -> tuple[float, float]: + """Time .area on both representations.""" + return ( + time_reps(lambda: det_dense.area), + time_reps(lambda: det_compact.area), + ) + + +def stage_filter( + det_dense: sv.Detections, det_compact: sv.Detections +) -> tuple[float, float]: + """Time boolean filtering (keep every other detection).""" + keep = np.arange(len(det_dense)) % 2 == 0 + return ( + time_reps(lambda: det_dense[keep]), + time_reps(lambda: det_compact[keep]), + ) + + +def stage_annotate( + scene: np.ndarray, det_dense: sv.Detections, det_compact: sv.Detections +) -> tuple[float, float]: + """Time MaskAnnotator on both representations.""" + annotator = sv.MaskAnnotator(opacity=0.5) + return ( + time_reps(lambda: annotator.annotate(scene.copy(), det_dense)), + time_reps(lambda: annotator.annotate(scene.copy(), det_compact)), + ) + + +def stage_correctness( + scene: np.ndarray, + masks_dense: np.ndarray, + compact_mask: CompactMask, + det_dense: sv.Detections, + det_compact: sv.Detections, +) -> tuple[bool, bool, bool]: + """Return (pixel_perfect, areas_match, roundtrip_ok).""" + annotator = sv.MaskAnnotator(opacity=0.5) + out_dense = annotator.annotate(scene.copy(), det_dense) + out_compact = annotator.annotate(scene.copy(), det_compact) + pixel_perfect = bool(np.array_equal(out_dense, out_compact)) + areas_match = bool(np.allclose(det_dense.area, det_compact.area)) + roundtrip_ok = bool(np.array_equal(compact_mask.to_dense(), masks_dense)) + return pixel_perfect, areas_match, roundtrip_ok + + +def stage_iou( + masks_dense: np.ndarray, + compact_mask: CompactMask, + iou_dense_skipped: bool, +) -> tuple[float, 
float, bool | None]: + """Time pairwise self-IoU using dense (N,H,W) AND and compact crop filter. + + Correctness is checked on the first 10 masks only to keep it fast, + regardless of whether full dense IoU timing is skipped. + """ + correct_n = min(len(compact_mask), 10) + iou_compact_small = sv.mask_iou_batch( + compact_mask[:correct_n], compact_mask[:correct_n] + ) + iou_dense_small = sv.mask_iou_batch( + masks_dense[:correct_n], masks_dense[:correct_n] + ) + iou_ok = bool(np.allclose(iou_dense_small, iou_compact_small, atol=1e-4)) + + compact_iou_s = time_reps(lambda: sv.mask_iou_batch(compact_mask, compact_mask)) + if iou_dense_skipped: + dense_iou_s = math.nan + else: + dense_iou_s = time_reps( + lambda: sv.mask_iou_batch(masks_dense, masks_dense), + repeats=IOU_NMS_REPS, + ) + return dense_iou_s, compact_iou_s, iou_ok + + +def stage_nms( + xyxy: np.ndarray, + confidence: np.ndarray, + class_ids: np.ndarray, + masks_dense: np.ndarray, + compact_mask: CompactMask, + dense_skipped: bool, + iou_dense_skipped: bool, +) -> tuple[float, float, bool | None, int]: + """Time mask NMS. Dense resizes to 640 before IoU; compact uses exact crop IoU. + + Compact NMS is strictly more accurate than dense: it computes pixel-level IoU + directly on the full-resolution RLE crops instead of a lossy 640px-downsampled + approximation. For pairs whose true IoU is very close to the 0.5 threshold, + the resize step in the dense path can flip a keep/suppress decision. + + ``n_diff`` counts detections whose decision differs between the two paths. + ``nms_ok`` is True when ``n_diff == 0``. + + Dense NMS is skipped when ``dense_skipped`` *or* ``iou_dense_skipped`` is True: + NMS calls mask_iou_batch internally so the cost is the same as IoU. + + Returns: + Tuple of ``(dense_nms_s, compact_nms_s, nms_ok, n_diff)``. 
+ """ + predictions = np.c_[xyxy, confidence, class_ids.astype(float)] + + compact_nms_s = time_reps( + lambda: sv.mask_non_max_suppression(predictions, compact_mask) + ) + if dense_skipped or iou_dense_skipped: + return math.nan, compact_nms_s, None, 0 + + keep_dense = sv.mask_non_max_suppression(predictions, masks_dense) + keep_compact = sv.mask_non_max_suppression(predictions, compact_mask) + n_diff = int(np.sum(keep_dense != keep_compact)) + nms_ok = n_diff == 0 + dense_nms_s = time_reps( + lambda: sv.mask_non_max_suppression(predictions, masks_dense), + repeats=IOU_NMS_REPS, + ) + return dense_nms_s, compact_nms_s, nms_ok, n_diff + + +def stage_merge( + det_dense: sv.Detections | None, + det_compact: sv.Detections, + dense_skipped: bool, +) -> tuple[float, float, bool | None]: + """Time Detections.merge on two half-splits. + + Dense: np.vstack; compact: RLE concat. + Splits are pre-computed so the timed lambda measures only the merge. + """ + half = len(det_compact) // 2 + compact_a, compact_b = det_compact[:half], det_compact[half:] + + compact_merge_s = time_reps(lambda: sv.Detections.merge([compact_a, compact_b])) + if dense_skipped or det_dense is None: + return math.nan, compact_merge_s, None + + dense_a, dense_b = det_dense[:half], det_dense[half:] + merged_d = sv.Detections.merge([dense_a, dense_b]) + merged_c = sv.Detections.merge([compact_a, compact_b]) + merge_ok = bool(np.allclose(merged_d.area, merged_c.area)) + dense_merge_s = time_reps(lambda: sv.Detections.merge([dense_a, dense_b])) + return dense_merge_s, compact_merge_s, merge_ok + + +def stage_offset( + masks_dense: np.ndarray, + compact_mask: CompactMask, + image_height: int, + image_width: int, + dense_skipped: bool, +) -> tuple[float, float, bool | None]: + """Time mask offset: move_masks (N,H,W) copy vs O(N) offset update.""" + dx, dy = 10, 10 + # Expand the canvas by the offset so no shifted crop overflows boundary. 
+ # Both move_masks and with_offset.to_dense() operate on identical space. + new_h, new_w = image_height + dy, image_width + dx + new_shape = (new_h, new_w) + + compact_offset_s = time_reps( + lambda: compact_mask.with_offset(dx, dy, new_image_shape=new_shape) + ) + if dense_skipped: + return math.nan, compact_offset_s, None + + moved_dense = sv.move_masks( + masks_dense, np.array([dx, dy]), resolution_wh=(new_w, new_h) + ) + moved_compact = compact_mask.with_offset( + dx, dy, new_image_shape=new_shape + ).to_dense() + offset_ok = bool(np.array_equal(moved_dense, moved_compact)) + dense_offset_s = time_reps( + lambda: sv.move_masks( + masks_dense, np.array([dx, dy]), resolution_wh=(new_w, new_h) + ) + ) + return dense_offset_s, compact_offset_s, offset_ok + + +def stage_centroids( + masks_dense: np.ndarray, + compact_mask: CompactMask, + dense_skipped: bool, +) -> tuple[float, float, bool | None]: + """Time centroid: np.tensordot on full stack (dense) vs per-crop (compact).""" + compact_centroids_s = time_reps(lambda: sv.calculate_masks_centroids(compact_mask)) + if dense_skipped: + return math.nan, compact_centroids_s, None + + c_dense = sv.calculate_masks_centroids(masks_dense) + c_compact = sv.calculate_masks_centroids(compact_mask) + centroids_ok = bool(np.allclose(c_dense, c_compact, atol=1.0)) # 1-pixel tolerance + dense_centroids_s = time_reps(lambda: sv.calculate_masks_centroids(masks_dense)) + return dense_centroids_s, compact_centroids_s, centroids_ok + + +# ══════════════════════════════════════════════════════════════════════════════ +# Scenario runner — orchestrates stages +# ══════════════════════════════════════════════════════════════════════════════ + + +def run_scenario( + name: str, + num_objects: int, + image_height: int, + image_width: int, + fill_fraction: float = 0.10, + num_vertices: int = 20, +) -> ScenarioResult: + resolution = f"{image_width}x{image_height}" + fill_name = f"{fill_fraction:.0%}" + console.rule( + f"[bold]{name}[/bold] | 
{num_objects} objects · {resolution} " + f"· fill≈{fill_name} · polygon/{num_vertices} vertices" + ) + + xyxy, masks_dense, class_ids, compact_mask = stage_build( + num_objects, image_height, image_width, fill_fraction, num_vertices + ) + scene = make_scene(image_height, image_width) + + # ── memory ────────────────────────────────────────────────────────────── + dense_bytes = dense_memory_bytes(masks_dense) + dense_skipped = dense_bytes > DENSE_SKIP_GB * 1e9 + compact_theoretical = compact_memory_bytes_theoretical(compact_mask) + + # Only measure dense tracemalloc when it's safe to allocate the full array. + dense_actual = ( + 0 + if dense_skipped + else dense_memory_bytes_actual(num_objects, image_height, image_width) + ) + compact_actual = compact_memory_bytes_actual( + masks_dense, xyxy, (image_height, image_width) + ) + + encode_s = stage_encode(masks_dense, xyxy, image_height, image_width) + decode_s = stage_decode(compact_mask) + + theory_ratio = dense_bytes / max(compact_theoretical, 1) + if dense_skipped: + malloc_ratio_str = "[dim]—[/dim]" + dense_actual_str = "[dim]skipped[/dim]" + else: + malloc_ratio = dense_actual / max(compact_actual, 1) + malloc_ratio_str = _fmt_ratio(malloc_ratio) + dense_actual_str = f"{dense_actual / 1e6:.1f} MB" + console.print( + f"\tmemory >>\n" + f"\t\ttheory :: dense={dense_bytes / 1e6:.1f} MB " + f"| compact={compact_theoretical / 1e3:.0f} KB " + f"\t{_fmt_ratio(theory_ratio)}\n" + f"\t\tmalloc :: dense={dense_actual_str} " + f"| compact={compact_actual / 1e3:.0f} KB " + f"\t{malloc_ratio_str}" + ) + console.print(f"\t encode (from_dense)\t={encode_s * 1e3:.3f} ms/mask") + console.print(f"\t decode (to_dense)\t={decode_s * 1e3:.3f} ms/mask") + + # ── skip flags ────────────────────────────────────────────────────────── + iou_dense_skipped = dense_bytes > IOU_DENSE_SKIP_GB * 1e9 + if dense_skipped: + console.print( + f"\t[yellow]dense array is {dense_bytes / 1e9:.1f} GB " + f"(>{DENSE_SKIP_GB:.0f} GB threshold) — skipping 
dense timing" + f"[/yellow]" + ) + elif iou_dense_skipped: + console.print( + f"\t[yellow]dense IoU skipped (>{IOU_DENSE_SKIP_GB:.0f}GB thr.)[/yellow]" + ) + + confidence = ( + np.random.default_rng(1).uniform(0.3, 0.99, num_objects).astype(np.float32) + ) + det_compact = sv.Detections(xyxy=xyxy, mask=compact_mask, class_id=class_ids) + + if dense_skipped: + dense_area_s = dense_filter_s = dense_annot_s = math.nan + compact_area_s = _time_compact_area(det_compact) + compact_filter_s = _time_compact_filter(det_compact) + compact_annot_s = _time_compact_annotate(scene, det_compact) + pixel_perfect = areas_match = roundtrip_ok = None + det_dense = None + else: + det_dense = sv.Detections(xyxy=xyxy, mask=masks_dense, class_id=class_ids) + dense_area_s, compact_area_s = stage_area(det_dense, det_compact) + dense_filter_s, compact_filter_s = stage_filter(det_dense, det_compact) + dense_annot_s, compact_annot_s = stage_annotate(scene, det_dense, det_compact) + pixel_perfect, areas_match, roundtrip_ok = stage_correctness( + scene, masks_dense, compact_mask, det_dense, det_compact + ) + + dense_iou_s, compact_iou_s, iou_ok = stage_iou( + masks_dense, compact_mask, iou_dense_skipped + ) + dense_nms_s, compact_nms_s, nms_ok, nms_diff = stage_nms( + xyxy, + confidence, + class_ids, + masks_dense, + compact_mask, + dense_skipped, + iou_dense_skipped, + ) + dense_merge_s, compact_merge_s, merge_ok = stage_merge( + det_dense, det_compact, dense_skipped + ) + dense_offset_s, compact_offset_s, offset_ok = stage_offset( + masks_dense, compact_mask, image_height, image_width, dense_skipped + ) + dense_centroids_s, compact_centroids_s, centroids_ok = stage_centroids( + masks_dense, compact_mask, dense_skipped + ) + + def _timing_line(label: str, dense_s: float, compact_s: float) -> str: + compact_ms = f"{compact_s * 1e3:.2f} ms" + if math.isnan(dense_s): + return ( + f"\t{label}\t -> dense=[dim]—[/dim]" + f"\t\t | compact={compact_ms}\t | speedup=[dim]—[/dim]" + ) + dense_ms = 
f"{dense_s * 1e3:.2f} ms" + speedup = _fmt_ratio(dense_s / max(compact_s, 1e-9)) + return ( + f"\t{label}\t -> dense={dense_ms}\t | " + f"compact={compact_ms}\t | speedup={speedup}" + ) + + console.print(_timing_line(".area ", dense_area_s, compact_area_s)) + console.print(_timing_line("annotate ", dense_annot_s, compact_annot_s)) + console.print(_timing_line("centroids", dense_centroids_s, compact_centroids_s)) + console.print(_timing_line("filter ", dense_filter_s, compact_filter_s)) + console.print(_timing_line("iou ", dense_iou_s, compact_iou_s)) + console.print(_timing_line("merge ", dense_merge_s, compact_merge_s)) + console.print(_timing_line("nms ", dense_nms_s, compact_nms_s)) + console.print(_timing_line("offset ", dense_offset_s, compact_offset_s)) + + checks = { + "pixel-perfect": pixel_perfect, + "areas": areas_match, + "roundtrip": roundtrip_ok, + "iou": iou_ok, + "nms": nms_ok, + "merge": merge_ok, + "offset": offset_ok, + "centroids": centroids_ok, + } + parts = [] + for k, v in checks.items(): + if k == "nms" and v is False: + parts.append(f"nms=[red]✗({nms_diff})[/red]") + else: + parts.append( + f"{k}=" + + ( + "[dim]—[/dim]" + if v is None + else "[green]✓[/green]" + if v + else "[red]✗[/red]" + ) + ) + all_checked = [v for v in checks.values() if v is not None] + overall = ( + "[green]✓ all correct[/green]" + if all_checked and all(all_checked) + else "[red]✗ MISMATCH[/red]" + if any(v is False for v in checks.values()) + else "[dim]—[/dim]" + ) + console.print(" correctness >> " + " | ".join(parts) + f" | {overall}") + + return ScenarioResult( + name=name, + resolution=resolution, + num_objects=num_objects, + fill_name=fill_name, + num_vertices=num_vertices, + dense_bytes=dense_bytes, + compact_bytes_theoretical=compact_theoretical, + dense_bytes_actual=dense_actual, + compact_bytes_actual=compact_actual, + encode_s=encode_s, + decode_s=decode_s, + dense_area_s=dense_area_s, + compact_area_s=compact_area_s, + dense_filter_s=dense_filter_s, + 
compact_filter_s=compact_filter_s, + dense_annot_s=dense_annot_s, + compact_annot_s=compact_annot_s, + dense_iou_s=dense_iou_s, + compact_iou_s=compact_iou_s, + dense_nms_s=dense_nms_s, + compact_nms_s=compact_nms_s, + dense_merge_s=dense_merge_s, + compact_merge_s=compact_merge_s, + dense_offset_s=dense_offset_s, + compact_offset_s=compact_offset_s, + dense_centroids_s=dense_centroids_s, + compact_centroids_s=compact_centroids_s, + pixel_perfect=pixel_perfect, + areas_match=areas_match, + roundtrip_ok=roundtrip_ok, + iou_ok=iou_ok, + nms_ok=nms_ok, + nms_mismatch_count=nms_diff, + merge_ok=merge_ok, + offset_ok=offset_ok, + centroids_ok=centroids_ok, + dense_skipped=dense_skipped, + iou_dense_skipped=iou_dense_skipped, + ) + + +def _time_compact_area(det_compact: sv.Detections) -> float: + """Time .area on the compact detections (used when dense timing is skipped).""" + return time_reps(lambda: det_compact.area) + + +def _time_compact_filter(det_compact: sv.Detections) -> float: + """Time boolean-index filtering on the compact detections (dense-skip path).""" + keep = np.arange(len(det_compact)) % 2 == 0 + return time_reps(lambda: det_compact[keep]) + + +def _time_compact_annotate(scene: np.ndarray, det_compact: sv.Detections) -> float: + """Time MaskAnnotator on the compact detections (dense-skip path).""" + annotator = sv.MaskAnnotator(opacity=0.5) + return time_reps(lambda: annotator.annotate(scene.copy(), det_compact)) + + +# ══════════════════════════════════════════════════════════════════════════════ +# Rich summary table +# ══════════════════════════════════════════════════════════════════════════════ + +_OPS = ("area", "filter", "annot", "iou", "nms", "merge", "offset", "centroids") + + +def _build_summary_df(results: list[ScenarioResult]) -> pd.DataFrame: + """Compute derived summary columns from scenario results. + + Returns a DataFrame with all ScenarioResult fields plus derived columns + (ratios, speedups, ok) as raw floats. 
Consumers apply their own formatting. + """ + df = pd.DataFrame([dataclasses.asdict(r) for r in results]) + df["ratio_theory"] = df["dense_bytes"] / df["compact_bytes_theoretical"].clip( + lower=1 + ) + df["ratio_malloc"] = df["dense_bytes_actual"] / df["compact_bytes_actual"].clip( + lower=1 + ) + # dense_bytes_actual == 0 (not measured) when dense_skipped — clear those cells + df.loc[df["dense_skipped"], "ratio_malloc"] = None + for op in _OPS: + df[f"{op}_speedup"] = df[f"dense_{op}_s"] / df[f"compact_{op}_s"].clip( + lower=1e-9 + ) + + check_cols = [ + "pixel_perfect", + "areas_match", + "roundtrip_ok", + "iou_ok", + "nms_ok", + "merge_ok", + "offset_ok", + "centroids_ok", + ] + df["ok"] = df.apply( + lambda row: ( + False + if any(row[c] is False for c in check_cols) + else True + if any(row[c] is True for c in check_cols) + else None + ), + axis=1, + ) + return df + + +def _fmt_ratio(ratio: float) -> str: + """Format a speedup/compression ratio with colour coding. + + ≥10 → green (large win), 1-10 → yellow (modest win), <1 → red (regression). + Integer for ≥10, two decimals otherwise. + """ + fmt = f"{ratio:.0f}x" if ratio >= 10 else f"{ratio:.2f}x" + if ratio >= 10: + return f"[green]{fmt}[/green]" + elif ratio >= 1: + return f"[yellow]{fmt}[/yellow]" + else: + return f"[red]{fmt}[/red]" + + +def _fmt_speedup(dense_s: float, compact_s: float) -> str: + if math.isnan(dense_s): + # Dense was skipped — show compact absolute time so the column isn't empty. 
+ return f"[dim]{compact_s * 1e3:.1f} ms[/dim]" + return _fmt_ratio(dense_s / max(compact_s, 1e-9)) + + +def print_summary(results: list[ScenarioResult]) -> None: + table = Table( + title="CompactMask — benchmark summary", + box=box.ROUNDED, + show_lines=True, + header_style="bold cyan", + min_width=console.width, + ) + table.add_column("Scenario", style="bold", min_width=22) + table.add_column("Objects", justify="right", min_width=7) + table.add_column("Resolution", min_width=12, no_wrap=True) + table.add_column("Fill", justify="right", min_width=5, no_wrap=True) + table.add_column("Vertices", justify="right", min_width=8, no_wrap=True) + table.add_column("Dense\ntheory", justify="right", min_width=10) + table.add_column("Compact\ntheory", justify="right", style="green", min_width=9) + table.add_column("Ratio\ntheory", justify="right", min_width=7) + table.add_column("Dense\nmalloc", justify="right", style="cyan", min_width=9) + table.add_column("Compact\nmalloc", justify="right", style="cyan", min_width=9) + table.add_column("Ratio\nmalloc", justify="right", min_width=7) + table.add_column("Encode\n(ms/mask)", justify="right", style="yellow", min_width=7) + table.add_column("Decode\n(ms/mask)", justify="right", style="yellow", min_width=7) + table.add_column("Area\natt.", justify="right", min_width=6) + table.add_column("Filter\nop.", justify="right", min_width=6) + table.add_column("Annot\nop.", justify="right", min_width=6) + table.add_column("IoU\nop.", justify="right", min_width=6) + table.add_column("NMS\nop.", justify="right", min_width=6) + table.add_column("Merge\nop.", justify="right", min_width=6) + table.add_column("Offset\nop.", justify="right", min_width=6) + table.add_column("Centr\nop.", justify="right", min_width=6) + table.add_column("OK?", justify="center", min_width=4) + + for _, row in _build_summary_df(results).iterrows(): + ok = row["ok"] + ok_cell = ( + "[red]✗[/red]" + if ok is False + else "[green]✓[/green]" + if ok is True + else 
"[dim]—[/dim]" + ) + dense_malloc_cell = ( + "[dim]—[/dim]" + if row["dense_skipped"] + else f"{row['dense_bytes_actual'] / 1e6:.1f} MB" + ) + malloc_ratio_cell = ( + "[dim]—[/dim]" if row["dense_skipped"] else _fmt_ratio(row["ratio_malloc"]) + ) + table.add_row( + row["name"], + str(row["num_objects"]), + row["resolution"], + row["fill_name"], + str(row["num_vertices"]), + f"{row['dense_bytes'] / 1e6:.1f} MB", + f"{row['compact_bytes_theoretical'] / 1e3:.0f} KB", + _fmt_ratio(row["ratio_theory"]), + dense_malloc_cell, + f"{row['compact_bytes_actual'] / 1e3:.0f} KB", + malloc_ratio_cell, + f"{row['encode_s'] * 1e3:.1f}", + f"{row['decode_s'] * 1e3:.1f}", + _fmt_speedup(row["dense_area_s"], row["compact_area_s"]), + _fmt_speedup(row["dense_filter_s"], row["compact_filter_s"]), + _fmt_speedup(row["dense_annot_s"], row["compact_annot_s"]), + _fmt_speedup(row["dense_iou_s"], row["compact_iou_s"]), + _fmt_speedup(row["dense_nms_s"], row["compact_nms_s"]), + _fmt_speedup(row["dense_merge_s"], row["compact_merge_s"]), + _fmt_speedup(row["dense_offset_s"], row["compact_offset_s"]), + _fmt_speedup(row["dense_centroids_s"], row["compact_centroids_s"]), + ok_cell, + ) + + console.print(table) + console.print( + "[dim]" + + " · ".join( + [ + "Vertices — polygon vertex count " + "(complexity proxy: more = jaggier boundary)", + "Dense theory — NxHxW bytes (raw numpy buffer)", + "Compact theory — sum of internal numpy buffer sizes", + "Ratio (theory) — dense / compact theoretical ratio", + "Dense malloc — tracemalloc peak during np.zeros allocation", + "Compact malloc — tracemalloc peak during .from_dense()", + "Ratio (malloc) — dense / compact tracemalloc peak ratio", + "Encode ms/mask — from_dense() / N (dense→compact overhead per mask)", + "Decode ms/mask — to_dense() / N (compact→dense overhead per mask)", + "Area x — .area speedup (RLE sum, no materialisation)", + "Filter x — boolean-index speedup", + "Annot x — MaskAnnotator speedup (crop-paint vs full-frame alloc)", + 
f"IoU x — pairwise self-IoU speedup " + f"(dense skipped >{IOU_DENSE_SKIP_GB:.0f} GB)", + "NMS x — mask_non_max_suppression speedup", + "Merge x — Detections.merge speedup", + "Offset x — move_masks vs with_offset speedup", + "Centroids x — calculate_masks_centroids speedup", + "dim ms — dense skipped, compact absolute time shown", + ] + ) + + "[/dim]" + ) + + +# ══════════════════════════════════════════════════════════════════════════════ +# Results persistence +# ══════════════════════════════════════════════════════════════════════════════ + + +def _append_result(result: ScenarioResult, path: Path) -> None: + """Append one scenario result as a JSON line to *path*. + + ``math.nan`` (used for skipped dense timings) is serialised as ``null`` + so the file is valid JSON-Lines and can be read back with any JSON parser. + """ + row = { + k: (None if isinstance(v, float) and math.isnan(v) else v) + for k, v in dataclasses.asdict(result).items() + } + with path.open("a", encoding="utf-8") as fh: + fh.write(json.dumps(row) + "\n") + + +def save_results_csv(results: list[ScenarioResult], path: Path) -> None: + """Write the summary table to *path* as a CSV file. + + Each row mirrors the Rich summary table: scenario metadata, memory ratios, + encode/decode overhead, and per-operation speedups. Columns whose dense + timing was skipped are written as empty cells. 
+ """ + df = _build_summary_df(results) + pd.DataFrame( + { + "scenario": df["name"], + "objects": df["num_objects"], + "resolution": df["resolution"], + "fill": df["fill_name"], + "vertices": df["num_vertices"], + "dense_theory_mb": (df["dense_bytes"] / 1e6).round(1), + "compact_theory_kb": (df["compact_bytes_theoretical"] / 1e3).round(1), + "ratio_theory": df["ratio_theory"].round(0), + "dense_malloc_mb": (df["dense_bytes_actual"] / 1e6) + .where(~df["dense_skipped"]) + .round(1), + "compact_malloc_kb": (df["compact_bytes_actual"] / 1e3).round(1), + "ratio_malloc": df["ratio_malloc"].round(0), + "encode_ms_per_mask": (df["encode_s"] * 1e3).round(4), + "decode_ms_per_mask": (df["decode_s"] * 1e3).round(4), + **{f"{op}_speedup": df[f"{op}_speedup"].round(2) for op in _OPS}, + "ok": df["ok"], + } + ).to_csv(path, index=False) + + +# ══════════════════════════════════════════════════════════════════════════════ +# Entry point +# ══════════════════════════════════════════════════════════════════════════════ + + +def main() -> None: + # ── parameter matrix ────────────────────────────────────────────────────── + # (tier_label, (image_width, image_height), num_objects) + TIERS: list[tuple[str, tuple[int, int], int]] = [ + ("FHD", (1920, 1080), 100), # full comparison (0.21 GB < 1 GB IoU thr.) + ("FHD", (1920, 1080), 200), # full comparison (0.41 GB < 1 GB IoU thr.) + ("FHD", (1920, 1080), 400), # full comparison (0.83 GB < 1 GB IoU thr.) + ("4K", (3840, 2160), 100), # full comparison (0.83 GB < 1 GB IoU thr.) + ("4K", (3840, 2160), 200), # dense excl. IoU/NMS (1.66 GB > 1 GB thr.) + ("SAT", (8192, 8192), 200), # dense excl. IoU/NMS (13.4 GB > 1 GB thr.) 
+ ] + FILL_FRACTIONS = [0.05, 0.20, 0.50] # sparse / moderate / SAM-everything + VERTEX_COUNTS = [8, 128, 600] # low / realistic / YOLOv8-seg default + + scenarios = [ + { + "name": f"{tier}-{num_objects}-{fill_fraction:.0%}-v{num_vertices}", + "num_objects": num_objects, + "image_height": img_h, + "image_width": img_w, + "fill_fraction": fill_fraction, + "num_vertices": num_vertices, + } + for tier, (img_w, img_h), num_objects in TIERS + for fill_fraction in FILL_FRACTIONS + for num_vertices in VERTEX_COUNTS + ] + + timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S") + results_path = Path(__file__).parent / f"results_{timestamp}.jsonl" + + console.print( + f"[bold]supervision[/bold]" + f" {sv.__version__} · numpy {np.__version__} · {len(scenarios)} scenarios" + f" · saving to [dim]{results_path.name}[/dim]" + ) + + results = [] + progress = Progress( + TextColumn("[progress.description]{task.description}"), + BarColumn(), + MofNCompleteColumn(), + TaskProgressColumn(), + TimeElapsedColumn(), + console=console, + ) + with progress: + task = progress.add_task("benchmarking…", total=len(scenarios)) + for params in scenarios: + progress.update(task, description=f"[bold]{params['name']}[/bold]") + result = run_scenario(**params) + results.append(result) + _append_result(result, results_path) + gc.collect() # flush scenario temporaries before next run + progress.advance(task) + + print_summary(results) + + csv_path = results_path.with_suffix(".csv") + save_results_csv(results, csv_path) + console.print(f"[dim]results saved → {results_path.name} · {csv_path.name}[/dim]") + + +if __name__ == "__main__": + main() diff --git a/examples/time_in_zone/README.md b/examples/time_in_zone/README.md index cb24e6969f..54cc44bd69 100644 --- a/examples/time_in_zone/README.md +++ b/examples/time_in_zone/README.md @@ -222,7 +222,7 @@ Script to run object detection on an RTSP stream using the RF-DETR model. 
- `--model_size`: RF-DETR backbone size to load — choose from 'nano', 'small', 'medium', 'base', or 'large' (default 'medium'). - `--device`: Compute device to run the model on ('cpu', 'mps', or 'cuda'; default 'cpu'). - `--classes`: Space-separated list of class IDs to track. Leave empty to track all classes. -- `--confidence_threshold`: Minimum confidence score for a detection to be kept, range 0–1 (default 0.3). +- `--confidence_threshold`: Minimum confidence score for a detection to be kept, range 0-1 (default 0.3). - `--iou_threshold`: IOU threshold applied during non-max suppression (default 0.7). - `--resolution`: Shortest-side input resolution supplied to the model. The script will round it to the nearest valid multiple (default 640). diff --git a/src/supervision/__init__.py b/src/supervision/__init__.py index 1bda28164d..8b56f597fd 100644 --- a/src/supervision/__init__.py +++ b/src/supervision/__init__.py @@ -45,6 +45,7 @@ ) from supervision.dataset.formats.coco import get_coco_class_index_mapping from supervision.dataset.utils import mask_to_rle, rle_to_mask +from supervision.detection.compact_mask import CompactMask from supervision.detection.core import Detections from supervision.detection.line_zone import ( LineZone, @@ -161,6 +162,7 @@ "ColorAnnotator", "ColorLookup", "ColorPalette", + "CompactMask", "ComparisonAnnotator", "ConfusionMatrix", "CropAnnotator", diff --git a/src/supervision/annotators/core.py b/src/supervision/annotators/core.py index a2f729b31b..a579551415 100644 --- a/src/supervision/annotators/core.py +++ b/src/supervision/annotators/core.py @@ -434,6 +434,11 @@ def annotate( colored_mask = np.array(scene, copy=True, dtype=np.uint8) + from supervision.detection.compact_mask import CompactMask + + compact_mask = ( + detections.mask if isinstance(detections.mask, CompactMask) else None + ) for detection_idx in np.flip(np.argsort(detections.area)): color = resolve_color( color=self.color, @@ -443,8 +448,21 @@ def annotate( if 
custom_color_lookup is None else custom_color_lookup, ) - mask = np.asarray(detections.mask[detection_idx], dtype=bool) - colored_mask[mask] = color.as_bgr() + if compact_mask is not None: + # Paint only the bounding-box crop — avoids a full (H, W) alloc. + x1 = int(compact_mask.offsets[detection_idx, 0]) + y1 = int(compact_mask.offsets[detection_idx, 1]) + crop_m = compact_mask.crop(detection_idx) + crop_h, crop_w = crop_m.shape + colored_mask[y1 : y1 + crop_h, x1 : x1 + crop_w][crop_m] = ( + color.as_bgr() + ) + else: + mask = np.asarray( + detections.mask[detection_idx], + dtype=bool, + ) + colored_mask[mask] = color.as_bgr() cv2.addWeighted( colored_mask, self.opacity, scene, 1 - self.opacity, 0, dst=scene @@ -2900,8 +2918,8 @@ def annotate(self, scene: ImageType, detections: Detections) -> ImageType: colored_mask[y1:y2, x1:x2] = scene[y1:y2, x1:x2] else: for mask in detections.mask: - mask = np.asarray(mask, dtype=bool) - colored_mask[mask] = scene[mask] + mask_bool = np.asarray(mask, dtype=bool) + colored_mask[mask_bool] = scene[mask_bool] np.copyto(scene, colored_mask) return scene diff --git a/src/supervision/detection/compact_mask.py b/src/supervision/detection/compact_mask.py new file mode 100644 index 0000000000..32135212d7 --- /dev/null +++ b/src/supervision/detection/compact_mask.py @@ -0,0 +1,905 @@ +"""Crop-RLE compact mask storage for memory-efficient instance segmentation. + +Dense ``(N, H, W)`` boolean masks use O(N·H·W) memory, which becomes +prohibitive for aerial imagery (e.g. 1000 objects x 4K image ~ 8.3 GB). +:class:`CompactMask` stores each mask as a run-length encoding of its +bounding-box crop, reducing typical usage to tens of MB. + +The bounding boxes (``xyxy``) already present in ``Detections`` serve as the +crop boundaries, so no extra metadata is required from the caller. 
+""" + +from __future__ import annotations + +from collections.abc import Iterator +from typing import Any, cast + +import numpy as np +import numpy.typing as npt + + +def _rle_encode(mask_2d: npt.NDArray[Any]) -> npt.NDArray[np.int32]: + """Run-length encode a 2D boolean mask in row-major order. + + The encoding starts with the count of leading ``False`` values (may be 0 + if the mask begins with ``True``). Subsequent values alternate between + ``True`` and ``False`` run counts. + + Args: + mask_2d: 2D boolean array of shape ``(H, W)``. + + Returns: + int32 array of run lengths, starting with the False count. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import _rle_encode + >>> mask = np.array([[False, True, True], [True, False, False]]) + >>> _rle_encode(mask).tolist() + [1, 3, 2] + + ``` + """ + flat = mask_2d.ravel() # C-order (row-major) + if len(flat) == 0: + return np.array([0], dtype=np.int32) + + # Locate positions where the boolean value changes. + changes = np.diff(flat.view(np.uint8)) + boundaries = np.where(changes != 0)[0] + 1 + + positions = np.concatenate(([0], boundaries, [len(flat)])) + run_lengths = np.diff(positions).astype(np.int32) + + # Guarantee the encoding always starts with a False count. + if flat[0]: + run_lengths = np.concatenate(([np.int32(0)], run_lengths)) + + return run_lengths + + +def _rle_decode( + rle: npt.NDArray[np.int32], height: int, width: int +) -> npt.NDArray[np.bool_]: + """Decode a run-length encoded mask back to a 2D boolean array. + + Args: + rle: int32 array of run lengths as produced by :func:`_rle_encode`. + height: Height of the output array. + width: Width of the output array. + + Returns: + 2D boolean array of shape ``(height, width)``. 
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import _rle_decode + >>> rle = np.array([1, 3, 2], dtype=np.int32) + >>> _rle_decode(rle, 2, 3) + array([[False, True, True], + [ True, False, False]]) + + ``` + """ + # Even-indexed entries → False runs; odd-indexed entries → True runs. + is_true = np.arange(len(rle)) % 2 == 1 + flat: npt.NDArray[np.bool_] = np.repeat(is_true, rle) + num_pixels = height * width + if len(flat) < num_pixels: + # Pad with False if the RLE is shorter than expected (e.g. all-False + # tails are often omitted during encoding). + flat = np.pad(flat, (0, num_pixels - len(flat))) + return cast(npt.NDArray[np.bool_], flat[:num_pixels].reshape(height, width)) + + +def _rle_area(rle: npt.NDArray[np.int32]) -> int: + """Return the number of ``True`` pixels in a run-length encoded mask. + + Args: + rle: int32 array of run lengths as produced by :func:`_rle_encode`. + + Returns: + Total number of ``True`` pixels. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import _rle_area + >>> rle = np.array([1, 3, 2], dtype=np.int32) # 1 F, 3 T, 2 F + >>> _rle_area(rle) + 3 + + ``` + """ + return int(np.sum(rle[1::2])) + + +class CompactMask: + """Memory-efficient crop-RLE mask storage for instance segmentation. + + Instead of storing N full ``(H, W)`` boolean arrays, :class:`CompactMask` + encodes each mask as a run-length sequence of its bounding-box crop. This + reduces memory from O(N·H·W) to roughly O(N·bbox_area), which is orders of + magnitude smaller for sparse masks on high-resolution images. + + The class exposes a duck-typed interface compatible with ``np.ndarray`` + masks used elsewhere in ``supervision``: + + * ``mask[int]`` → dense ``(H, W)`` bool array (annotators, converters). + * ``mask[slice | list | ndarray]`` → new :class:`CompactMask` (filtering). + * ``np.asarray(mask)`` → dense ``(N, H, W)`` bool array (numpy interop). 
+ * ``mask.shape``, ``mask.dtype``, ``mask.area`` — match the dense API. + + :class:`CompactMask` is **not** a drop-in ``np.ndarray`` replacement. + When you need to call arbitrary ndarray methods (``astype``, ``reshape``, + ``ravel``, ``any``, ``all``, …) call :meth:`to_dense` first: + ``cm.to_dense().astype(np.uint8)``. :meth:`to_dense` is the single + explicit materialisation boundary. + + .. note:: **RLE encoding incompatibility with pycocotools / COCO API** + + :class:`CompactMask` uses **row-major (C-order)** run-lengths scoped + to each mask's bounding-box crop. The COCO API (pycocotools) uses + **column-major (Fortran-order)** run-lengths scoped to the **full + image**. The two formats are not interchangeable: you cannot pass a + :class:`CompactMask` RLE directly to ``maskUtils.iou()`` or + ``maskUtils.decode()``, and you cannot load a COCO RLE dict into a + :class:`CompactMask` without re-encoding. Use + :meth:`to_dense` to obtain a standard boolean array, then pass it to + pycocotools if needed. + + Args: + rles: List of N int32 run-length arrays. + crop_shapes: Array of shape ``(N, 2)`` — ``(crop_h, crop_w)`` per mask. + offsets: Array of shape ``(N, 2)`` — ``(x1, y1)`` bounding-box origins. + image_shape: ``(H, W)`` of the full image. 
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((2, 100, 100), dtype=bool) + >>> masks[0, 10:20, 10:20] = True + >>> masks[1, 50:70, 50:80] = True + >>> xyxy = np.array([[10, 10, 19, 19], [50, 50, 79, 69]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(100, 100)) + >>> len(cm) + 2 + >>> cm.shape + (2, 100, 100) + + ``` + """ + + __slots__ = ("_crop_shapes", "_image_shape", "_offsets", "_rles") + + def __init__( + self, + rles: list[npt.NDArray[np.int32]], + crop_shapes: npt.NDArray[np.int32], + offsets: npt.NDArray[np.int32], + image_shape: tuple[int, int], + ) -> None: + self._rles: list[npt.NDArray[np.int32]] = rles + self._crop_shapes: npt.NDArray[np.int32] = crop_shapes # (N,2): (h,w) + self._offsets: npt.NDArray[np.int32] = offsets # (N,2): (x1,y1) + self._image_shape: tuple[int, int] = image_shape # (H, W) + + # ------------------------------------------------------------------ + # Construction + # ------------------------------------------------------------------ + + @classmethod + def from_dense( + cls, + masks: npt.NDArray[np.bool_], + xyxy: npt.NDArray[Any], + image_shape: tuple[int, int], + ) -> CompactMask: + """Create a :class:`CompactMask` from a dense ``(N, H, W)`` bool array. + + Bounding boxes are clipped to image bounds and interpreted in the + supervision ``xyxy`` convention (inclusive max coordinates). A + box with invalid ordering (``x2 < x1`` or ``y2 < y1``) is replaced by + a ``1x1`` all-False crop to avoid degenerate RLE. + + Args: + masks: Dense boolean mask array of shape ``(N, H, W)``. + xyxy: Bounding boxes of shape ``(N, 4)`` in ``[x1, y1, x2, y2]`` + format. + image_shape: ``(H, W)`` of the full image. + + Returns: + A new :class:`CompactMask` instance. 
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 100, 100), dtype=bool) + >>> masks[0, 10:20, 10:20] = True + >>> xyxy = np.array([[10, 10, 19, 19]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(100, 100)) + >>> cm.shape + (1, 100, 100) + + ``` + """ + img_h, img_w = image_shape + num_masks = len(masks) + + if num_masks == 0: + return cls( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + image_shape, + ) + + rles: list[npt.NDArray[np.int32]] = [] + crop_shapes_list: list[tuple[int, int]] = [] + offsets_list: list[tuple[int, int]] = [] + + for mask_idx in range(num_masks): + x1, y1, x2, y2 = xyxy[mask_idx] + x1c = int(max(0, min(int(x1), img_w - 1))) + y1c = int(max(0, min(int(y1), img_h - 1))) + x2c = int(max(0, min(int(x2), img_w - 1))) + y2c = int(max(0, min(int(y2), img_h - 1))) + crop: npt.NDArray[np.bool_] + + # supervision xyxy uses inclusive max coords, so slicing must add +1. + if x2c < x1c or y2c < y1c: + crop = np.zeros((1, 1), dtype=bool) + x2c, y2c = x1c, y1c + else: + crop = masks[mask_idx, y1c : y2c + 1, x1c : x2c + 1] + + crop_h = y2c - y1c + 1 + crop_w = x2c - x1c + 1 + rles.append(_rle_encode(crop)) + crop_shapes_list.append((crop_h, crop_w)) + offsets_list.append((x1c, y1c)) + + crop_shapes = np.array(crop_shapes_list, dtype=np.int32) + offsets = np.array(offsets_list, dtype=np.int32) + return cls(rles, crop_shapes, offsets, image_shape) + + # ------------------------------------------------------------------ + # Materialisation + # ------------------------------------------------------------------ + + def to_dense(self) -> npt.NDArray[np.bool_]: + """Materialise all masks as a dense ``(N, H, W)`` boolean array. + + Returns: + Boolean array of shape ``(N, H, W)``. 
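`from_dense` treats the `xyxy` max coordinates as inclusive, so the slice end needs a `+1`. A minimal standalone check of that clip-and-slice arithmetic, mirroring the code above with plain numpy (all values illustrative):

```python
import numpy as np

img_h, img_w = 100, 100
mask = np.zeros((img_h, img_w), dtype=bool)
mask[10:20, 10:20] = True  # a 10x10 blob

x1, y1, x2, y2 = 10, 10, 19, 19  # inclusive max coords
# Clip to image bounds, as from_dense does.
x1c, x2c = max(0, min(x1, img_w - 1)), max(0, min(x2, img_w - 1))
y1c, y2c = max(0, min(y1, img_h - 1)), max(0, min(y2, img_h - 1))
# Inclusive coords -> slice end needs +1.
crop = mask[y1c : y2c + 1, x1c : x2c + 1]
assert crop.shape == (10, 10)
assert crop.all()  # the box tightly covers the blob
```

A box `[10, 10, 19, 19]` therefore spans exactly 10 pixels per axis; forgetting the `+1` would silently drop the last row and column of every crop.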
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 50, 50), dtype=bool) + >>> masks[0, 10:20, 10:30] = True + >>> xyxy = np.array([[10, 10, 29, 19]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(50, 50)) + >>> cm.to_dense().shape + (1, 50, 50) + + ``` + """ + num_masks = len(self._rles) + img_h, img_w = self._image_shape + result: npt.NDArray[np.bool_] = np.zeros((num_masks, img_h, img_w), dtype=bool) + for mask_idx in range(num_masks): + crop_h, crop_w = ( + int(self._crop_shapes[mask_idx, 0]), + int(self._crop_shapes[mask_idx, 1]), + ) + x1, y1 = int(self._offsets[mask_idx, 0]), int(self._offsets[mask_idx, 1]) + crop = _rle_decode(self._rles[mask_idx], crop_h, crop_w) + result[mask_idx, y1 : y1 + crop_h, x1 : x1 + crop_w] = crop + return result + + def crop(self, index: int) -> npt.NDArray[np.bool_]: + """Decode a single mask crop without allocating the full image array. + + This is an O(crop_area) operation — ideal for annotators that only + need the cropped region. + + Args: + index: Index of the mask to decode. + + Returns: + Boolean array of shape ``(crop_h, crop_w)``. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 100, 100), dtype=bool) + >>> masks[0, 20:30, 10:40] = True + >>> xyxy = np.array([[10, 20, 39, 29]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(100, 100)) + >>> cm.crop(0).shape + (10, 30) + + ``` + """ + crop_h = int(self._crop_shapes[index, 0]) + crop_w = int(self._crop_shapes[index, 1]) + return _rle_decode(self._rles[index], crop_h, crop_w) + + # ------------------------------------------------------------------ + # Sequence / array protocol + # ------------------------------------------------------------------ + + def __len__(self) -> int: + """Return the number of masks. 
+ + Returns: + Number of masks N. + + Examples: + ```pycon + >>> from supervision.detection.compact_mask import CompactMask + >>> import numpy as np + >>> cm = CompactMask( + ... [], np.empty((0, 2), dtype=np.int32), + ... np.empty((0, 2), dtype=np.int32), (100, 100)) + >>> len(cm) + 0 + + ``` + """ + return len(self._rles) + + def __iter__(self) -> Iterator[npt.NDArray[np.bool_]]: + """Iterate over masks as dense ``(H, W)`` boolean arrays.""" + for mask_idx in range(len(self)): + yield self[mask_idx] + + @property + def shape(self) -> tuple[int, int, int]: + """Return ``(N, H, W)`` matching the dense mask convention. + + Returns: + Tuple ``(N, H, W)``. + + Examples: + ```pycon + >>> from supervision.detection.compact_mask import CompactMask + >>> import numpy as np + >>> cm = CompactMask( + ... [], np.empty((0, 2), dtype=np.int32), + ... np.empty((0, 2), dtype=np.int32), (480, 640)) + >>> cm.shape + (0, 480, 640) + + ``` + """ + img_h, img_w = self._image_shape + return (len(self), img_h, img_w) + + @property + def offsets(self) -> npt.NDArray[np.int32]: + """Return per-mask crop origins as ``(x1, y1)`` integer offsets. + + Returns: + Array of shape ``(N, 2)`` with ``int32`` offsets. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 10, 10), dtype=bool) + >>> masks[0, 2:4, 3:5] = True + >>> xyxy = np.array([[3, 2, 4, 3]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10)) + >>> cm.offsets.tolist() + [[3, 2]] + + ``` + """ + return self._offsets + + @property + def bbox_xyxy(self) -> npt.NDArray[np.int32]: + """Return per-mask inclusive bounding boxes in ``xyxy`` format. + + Boxes are derived from crop metadata: + ``x2 = x1 + crop_w - 1``, ``y2 = y1 + crop_h - 1``. + + Returns: + Array of shape ``(N, 4)`` with ``int32`` boxes + ``[x1, y1, x2, y2]``. 
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 10, 10), dtype=bool) + >>> masks[0, 2:5, 3:7] = True + >>> xyxy = np.array([[3, 2, 6, 4]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10)) + >>> cm.bbox_xyxy.tolist() + [[3, 2, 6, 4]] + + ``` + """ + if len(self) == 0: + return np.empty((0, 4), dtype=np.int32) + + x1: npt.NDArray[np.int32] = self._offsets[:, 0] + y1: npt.NDArray[np.int32] = self._offsets[:, 1] + x2: npt.NDArray[np.int32] = x1 + self._crop_shapes[:, 1] - 1 + y2: npt.NDArray[np.int32] = y1 + self._crop_shapes[:, 0] - 1 + return np.column_stack((x1, y1, x2, y2)).astype(np.int32, copy=False) + + @property + def dtype(self) -> np.dtype[Any]: + """Return ``np.dtype(bool)`` — always. + + Returns: + ``np.dtype(bool)``. + + Examples: + ```pycon + >>> from supervision.detection.compact_mask import CompactMask + >>> import numpy as np + >>> cm = CompactMask( + ... [], np.empty((0, 2), dtype=np.int32), + ... np.empty((0, 2), dtype=np.int32), (100, 100)) + >>> cm.dtype + dtype('bool') + + ``` + """ + return np.dtype(bool) + + @property + def area(self) -> npt.NDArray[np.int64]: + """Compute the area (``True`` pixel count) of each mask. + + Returns: + int64 array of shape ``(N,)`` with per-mask pixel counts. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((2, 100, 100), dtype=bool) + >>> masks[0, 0:10, 0:10] = True # 100 pixels + >>> masks[1, 0:5, 0:5] = True # 25 pixels + >>> xyxy = np.array([[0, 0, 9, 9], [0, 0, 4, 4]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(100, 100)) + >>> cm.area.tolist() + [100, 25] + + ``` + """ + return np.array([_rle_area(rle) for rle in self._rles], dtype=np.int64) + + def sum(self, axis: int | tuple[int, ...] 
| None = None) -> npt.NDArray[Any] | int: + """NumPy-compatible sum with a fast path for per-mask area. + + When ``axis=(1, 2)``, returns the per-mask True-pixel count via + :attr:`area` without materialising the full dense array. + + Args: + axis: Axis or axes to sum over. + + Returns: + Sum result matching NumPy semantics. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 10, 10), dtype=bool) + >>> masks[0, 0:3, 0:3] = True + >>> xyxy = np.array([[0, 0, 2, 2]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10)) + >>> cm.sum(axis=(1, 2)).tolist() + [9] + + ``` + """ + if axis == (1, 2): + return self.area + return self.to_dense().sum(axis=axis) + + def __getitem__( + self, + index: int | slice | list[Any] | npt.NDArray[Any], + ) -> npt.NDArray[np.bool_] | CompactMask: + """Index into the mask collection. + + * ``int`` → dense ``(H, W)`` bool array (for annotators, iterators). + * ``slice | list | ndarray`` → new :class:`CompactMask` (for filtering). + + Args: + index: An integer returns a dense ``(H, W)`` mask. Any other + supported index type returns a new :class:`CompactMask`. + + Returns: + Dense ``(H, W)`` ``np.ndarray`` for integer index, or a new + :class:`CompactMask` for all other index types. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((3, 20, 20), dtype=bool) + >>> xyxy = np.array( + ... 
[[0,0,5,5],[5,5,10,10],[10,10,15,15]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(20, 20)) + >>> cm[0].shape # int → dense (H, W) + (20, 20) + >>> len(cm[[0, 2]]) # list → CompactMask + 2 + + ``` + """ + if isinstance(index, (int, np.integer)): + idx = int(index) + img_h, img_w = self._image_shape + result: npt.NDArray[np.bool_] = np.zeros((img_h, img_w), dtype=bool) + crop_h = int(self._crop_shapes[idx, 0]) + crop_w = int(self._crop_shapes[idx, 1]) + x1 = int(self._offsets[idx, 0]) + y1 = int(self._offsets[idx, 1]) + crop = _rle_decode(self._rles[idx], crop_h, crop_w) + result[y1 : y1 + crop_h, x1 : x1 + crop_w] = crop + return result + + # Slice: use direct Python list slice and numpy view — O(k), no arange. + if isinstance(index, slice): + return CompactMask( + self._rles[index], + self._crop_shapes[index], + self._offsets[index], + self._image_shape, + ) + + # Boolean selectors and fancy index → convert to integer positions first. + if isinstance(index, np.ndarray) and index.dtype == bool: + idx_arr = np.where(index)[0] + elif isinstance(index, list) and all( + isinstance(item, (bool, np.bool_)) for item in index + ): + idx_arr = np.flatnonzero(np.asarray(index, dtype=bool)) + else: + idx_arr = np.asarray(list(index), dtype=np.intp) + + new_rles = [self._rles[int(mask_idx)] for mask_idx in idx_arr] + new_crop_shapes: npt.NDArray[np.int32] = self._crop_shapes[idx_arr] + new_offsets: npt.NDArray[np.int32] = self._offsets[idx_arr] + return CompactMask(new_rles, new_crop_shapes, new_offsets, self._image_shape) + + def __array__(self, dtype: np.dtype[Any] | None = None) -> npt.NDArray[Any]: + """NumPy interop: materialise as a dense ``(N, H, W)`` array. + + Called by ``np.asarray(compact_mask)`` and similar NumPy functions. + + Args: + dtype: Optional dtype to cast the result to. + + Returns: + Dense boolean array of shape ``(N, H, W)``. 
+
+        Examples:
+            ```pycon
+            >>> import numpy as np
+            >>> from supervision.detection.compact_mask import CompactMask
+            >>> masks = np.zeros((1, 10, 10), dtype=bool)
+            >>> xyxy = np.array([[0, 0, 5, 5]], dtype=np.float32)
+            >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10))
+            >>> np.asarray(cm).shape
+            (1, 10, 10)
+
+            ```
+        """
+        result = self.to_dense()
+        if dtype is not None:
+            return result.astype(dtype)
+        return result
+
+    def __eq__(self, other: object) -> bool:
+        """Whole-collection equality with another :class:`CompactMask` or ndarray.
+
+        Args:
+            other: Another :class:`CompactMask` or ``np.ndarray``.
+
+        Returns:
+            A single ``bool``: ``True`` if all masks are pixel-identical.
+
+        Examples:
+            ```pycon
+            >>> import numpy as np
+            >>> from supervision.detection.compact_mask import CompactMask
+            >>> masks = np.zeros((1, 10, 10), dtype=bool)
+            >>> xyxy = np.array([[0, 0, 5, 5]], dtype=np.float32)
+            >>> cm1 = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10))
+            >>> cm2 = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10))
+            >>> cm1 == cm2
+            True
+
+            ```
+        """
+        if isinstance(other, CompactMask):
+            return bool(np.array_equal(self.to_dense(), other.to_dense()))
+        if isinstance(other, np.ndarray):
+            return bool(np.array_equal(self.to_dense(), other))
+        return NotImplemented
+
+    # ------------------------------------------------------------------
+    # Collection utilities
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def merge(masks_list: list[CompactMask]) -> CompactMask:
+        """Concatenate multiple :class:`CompactMask` objects into one.
+
+        All inputs must have the same ``image_shape``.
+
+        Args:
+            masks_list: Non-empty list of :class:`CompactMask` objects.
+
+        Returns:
+            A new :class:`CompactMask` containing every mask from the inputs,
+            in order.
+
+        Raises:
+            ValueError: If ``masks_list`` is empty or image shapes differ.
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks1 = np.zeros((2, 50, 50), dtype=bool) + >>> masks2 = np.zeros((3, 50, 50), dtype=bool) + >>> xyxy1 = np.array([[0,0,10,10],[10,10,20,20]], dtype=np.float32) + >>> xyxy2 = np.array( + ... [[0,0,5,5],[5,5,10,10],[10,10,15,15]], dtype=np.float32) + >>> cm1 = CompactMask.from_dense(masks1, xyxy1, image_shape=(50, 50)) + >>> cm2 = CompactMask.from_dense(masks2, xyxy2, image_shape=(50, 50)) + >>> len(CompactMask.merge([cm1, cm2])) + 5 + + ``` + """ + if not masks_list: + raise ValueError("Cannot merge an empty list of CompactMask objects.") + + image_shape = masks_list[0]._image_shape + for cm in masks_list[1:]: + if cm._image_shape != image_shape: + raise ValueError( + f"Cannot merge CompactMask objects with different image shapes: " + f"{image_shape} vs {cm._image_shape}" + ) + + # list.extend is a C-level call and avoids the per-element Python + # bytecode overhead of a flat list comprehension. This matters under + # GIL contention when multiple threads call merge concurrently. + new_rles: list[npt.NDArray[np.int32]] = [] + for cm in masks_list: + new_rles.extend(cm._rles) + + # np.concatenate handles (0, 2) arrays correctly. + # No .astype() needed — _crop_shapes and _offsets are already int32. + new_crop_shapes: npt.NDArray[np.int32] = np.concatenate( + [cm._crop_shapes for cm in masks_list], axis=0 + ) + new_offsets: npt.NDArray[np.int32] = np.concatenate( + [cm._offsets for cm in masks_list], axis=0 + ) + + return CompactMask(new_rles, new_crop_shapes, new_offsets, image_shape) + + def repack(self) -> CompactMask: + """Re-encode all masks using tight bounding boxes. + + When the original ``xyxy`` boxes are padded or loose — common with + object-detector outputs and full-image boxes used in tests — each RLE + crop encodes more background (``False``) pixels than necessary. 
This + method decodes every crop, trims it to the minimal rectangle that + contains all ``True`` pixels, and re-encodes. All-``False`` masks are + normalised to a ``1x1`` all-``False`` crop. + + The call is O(sum of crop areas) — suitable as a one-time cleanup + after accumulating many merges (e.g. after + :class:`~supervision.detection.tools.inference_slicer.InferenceSlicer` + tiles are merged). + + Returns: + A new :class:`CompactMask` with minimal-area crops and updated + offsets. + + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 10, 10), dtype=bool) + >>> masks[0, 3:7, 3:7] = True + >>> # Deliberately loose bbox: covers the full image. + >>> xyxy = np.array([[0, 0, 9, 9]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10)) + >>> repacked = cm.repack() + >>> repacked.offsets.tolist() # tight origin: x1=3, y1=3 + [[3, 3]] + + ``` + """ + num_masks = len(self._rles) + if num_masks == 0: + return CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + self._image_shape, + ) + + new_rles: list[npt.NDArray[np.int32]] = [] + new_crop_shapes_list: list[tuple[int, int]] = [] + new_offsets_list: list[tuple[int, int]] = [] + + for mask_idx in range(num_masks): + crop = self.crop(mask_idx) + x1_off = int(self._offsets[mask_idx, 0]) + y1_off = int(self._offsets[mask_idx, 1]) + + rows_any = np.any(crop, axis=1) + cols_any = np.any(crop, axis=0) + + if not rows_any.any(): + # All-False: normalise to 1x1 to avoid zero-sized arrays. 
+ new_rles.append(_rle_encode(np.zeros((1, 1), dtype=bool))) + new_crop_shapes_list.append((1, 1)) + new_offsets_list.append((x1_off, y1_off)) + continue + + y_indices = np.where(rows_any)[0] + x_indices = np.where(cols_any)[0] + y_min, y_max = int(y_indices[0]), int(y_indices[-1]) + x_min, x_max = int(x_indices[0]), int(x_indices[-1]) + + tight = crop[y_min : y_max + 1, x_min : x_max + 1] + new_rles.append(_rle_encode(tight)) + new_crop_shapes_list.append((y_max - y_min + 1, x_max - x_min + 1)) + new_offsets_list.append((x1_off + x_min, y1_off + y_min)) + + return CompactMask( + new_rles, + np.array(new_crop_shapes_list, dtype=np.int32), + np.array(new_offsets_list, dtype=np.int32), + self._image_shape, + ) + + # ------------------------------------------------------------------ + # Slicer support + # ------------------------------------------------------------------ + + def with_offset( + self, + dx: int, + dy: int, + new_image_shape: tuple[int, int], + ) -> CompactMask: + """Return a new :class:`CompactMask` with adjusted offsets and image shape. + + Used by :class:`~supervision.detection.tools.inference_slicer.InferenceSlicer` + to relocate tile-local masks into full-image coordinates without + materialising the dense ``(N, H, W)`` array. + + Args: + dx: Pixels to add to every mask's ``x1`` offset. + dy: Pixels to add to every mask's ``y1`` offset. + new_image_shape: ``(H, W)`` of the full (destination) image. + + Returns: + New :class:`CompactMask` with updated offsets and image shape. + Crops are clipped to stay inside ``new_image_shape``; masks fully + outside are represented as ``1x1`` all-False crops. 
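`with_offset` chooses between its fast and slow paths with a single vectorised bounds test. A standalone sketch of that test under assumed tile values (all numbers hypothetical):

```python
import numpy as np

new_h, new_w = 400, 400          # destination canvas (H, W)
dx, dy = 100, 200                # tile origin within the canvas
offsets = np.array([[5, 5], [390, 5]], dtype=np.int32)        # per-mask (x1, y1)
crop_shapes = np.array([[10, 10], [10, 50]], dtype=np.int32)  # per-mask (h, w)

# Shift every crop origin, then compute inclusive max corners.
new_offsets = offsets + np.array([dx, dy], dtype=np.int32)
x1s, y1s = new_offsets[:, 0], new_offsets[:, 1]
x2s = x1s + crop_shapes[:, 1] - 1
y2s = y1s + crop_shapes[:, 0] - 1

# A mask needs decode+clip+re-encode only if it crosses a canvas edge.
needs_clip = (x1s < 0) | (y1s < 0) | (x2s >= new_w) | (y2s >= new_h)
assert needs_clip.tolist() == [False, True]
```

Only masks where `needs_clip` is `True` pay the O(crop_area) decode/re-encode cost; the rest are handled with pure offset arithmetic, never touching the RLE data.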
+ + Examples: + ```pycon + >>> import numpy as np + >>> from supervision.detection.compact_mask import CompactMask + >>> masks = np.zeros((1, 20, 20), dtype=bool) + >>> xyxy = np.array([[5, 5, 15, 15]], dtype=np.float32) + >>> cm = CompactMask.from_dense(masks, xyxy, image_shape=(20, 20)) + >>> cm2 = cm.with_offset(100, 200, new_image_shape=(400, 400)) + >>> cm2.offsets[0].tolist() + [105, 205] + + ``` + """ + new_h, new_w = new_image_shape + if new_h <= 0 or new_w <= 0: + raise ValueError("new_image_shape must contain positive dimensions") + + num_masks = len(self) + if num_masks == 0: + return CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + new_image_shape, + ) + + # Vectorised bounds check: compute every new [x1,y1,x2,y2] at once. + # For the common case (InferenceSlicer tiles that fit fully inside the + # new canvas) this catches the "no clipping needed" path in O(N) numpy + # without touching any RLE data. + new_offsets: npt.NDArray[np.int32] = self._offsets + np.array( + [dx, dy], dtype=np.int32 + ) + x1s = new_offsets[:, 0] + y1s = new_offsets[:, 1] + x2s = x1s + self._crop_shapes[:, 1] - 1 + y2s = y1s + self._crop_shapes[:, 0] - 1 + + needs_clip: npt.NDArray[np.bool_] = ( + (x1s < 0) | (y1s < 0) | (x2s >= new_w) | (y2s >= new_h) + ) + + if not needs_clip.any(): + # Fast path: pure offset arithmetic, no decode/re-encode needed. + return CompactMask( + list(self._rles), + self._crop_shapes.copy(), + new_offsets, + new_image_shape, + ) + + # Slow path: only decode+clip+re-encode the masks that actually overflow. 
+ out_rles: list[npt.NDArray[np.int32]] = [] + out_crop_shapes: list[tuple[int, int]] = [] + out_offsets_list: list[tuple[int, int]] = [] + + for mask_idx in range(num_masks): + x1 = int(x1s[mask_idx]) + y1 = int(y1s[mask_idx]) + x2 = int(x2s[mask_idx]) + y2 = int(y2s[mask_idx]) + + if not needs_clip[mask_idx]: + out_rles.append(self._rles[mask_idx]) + out_crop_shapes.append( + ( + int(self._crop_shapes[mask_idx, 0]), + int(self._crop_shapes[mask_idx, 1]), + ) + ) + out_offsets_list.append((x1, y1)) + continue + + ix1 = max(0, x1) + iy1 = max(0, y1) + ix2 = min(new_w - 1, x2) + iy2 = min(new_h - 1, y2) + + if ix1 > ix2 or iy1 > iy2: + anchor_x = min(max(x1, 0), new_w - 1) + anchor_y = min(max(y1, 0), new_h - 1) + out_rles.append(_rle_encode(np.zeros((1, 1), dtype=bool))) + out_crop_shapes.append((1, 1)) + out_offsets_list.append((anchor_x, anchor_y)) + continue + + crop = self.crop(mask_idx) + clipped = crop[iy1 - y1 : iy2 - y1 + 1, ix1 - x1 : ix2 - x1 + 1] + out_rles.append(_rle_encode(clipped)) + out_crop_shapes.append((iy2 - iy1 + 1, ix2 - ix1 + 1)) + out_offsets_list.append((ix1, iy1)) + + return CompactMask( + out_rles, + np.array(out_crop_shapes, dtype=np.int32), + np.array(out_offsets_list, dtype=np.int32), + new_image_shape, + ) diff --git a/src/supervision/detection/core.py b/src/supervision/detection/core.py index baabc56854..3798ee547b 100644 --- a/src/supervision/detection/core.py +++ b/src/supervision/detection/core.py @@ -4,7 +4,7 @@ from dataclasses import dataclass, field from enum import Enum from functools import reduce -from typing import Any, cast +from typing import TYPE_CHECKING, Any, cast import numpy as np import numpy.typing as npt @@ -59,6 +59,9 @@ from supervision.utils.internal import deprecated, get_instance_variables from supervision.validators import validate_detections_fields, validate_resolution +if TYPE_CHECKING: + from supervision.detection.compact_mask import CompactMask + @dataclass class Detections: @@ -133,7 +136,8 @@ class 
simplifies data manipulation and filtering, providing a uniform API for xyxy: An array of shape `(n, 4)` containing the bounding boxes coordinates in format `[x1, y1, x2, y2]` mask: An array of shape `(n, H, W)` containing the segmentation masks - (`bool` data type), or `None` when masks are not available. + (`bool` data type), or `None` when masks are not available, or as + :class:`~supervision.detection.compact_mask.CompactMask`. confidence: An array of shape `(n,)` containing the confidence scores of the detections, or `None` when confidence values are not available. class_id: An array of shape `(n,)` containing the class ids of the @@ -149,7 +153,7 @@ class simplifies data manipulation and filtering, providing a uniform API for """ # noqa: E501 // docs xyxy: npt.NDArray[np.generic] - mask: npt.NDArray[np.generic] | None = None + mask: npt.NDArray[np.generic] | CompactMask | None = None confidence: npt.NDArray[np.generic] | None = None class_id: npt.NDArray[np.generic] | None = None tracker_id: npt.NDArray[np.generic] | None = None @@ -2073,6 +2077,11 @@ def is_empty(self) -> bool: """ Returns `True` if the `Detections` object is considered empty. """ + # Fast path: avoids __eq__ which calls np.array_equal(to_dense(), ...) + # and would materialise the entire (N, H, W) CompactMask to a dense + # array just to check emptiness — O(N·H·W) for an O(1) check. 
+ if len(self.xyxy) > 0: + return False empty_detections = Detections.empty() empty_detections.data = self.data empty_detections.metadata = self.metadata @@ -2150,16 +2159,22 @@ def merge(cls, detections_list: list[Detections]) -> Detections: xyxy = np.vstack([d.xyxy for d in detections_list]) - def stack_or_none(name: str) -> npt.NDArray[np.generic] | None: + def stack_or_none( + name: str, + ) -> npt.NDArray[np.generic] | CompactMask | None: if all(d.__getattribute__(name) is None for d in detections_list): return None if any(d.__getattribute__(name) is None for d in detections_list): raise ValueError(f"All or none of the '{name}' fields must be None") - return ( - np.vstack([d.__getattribute__(name) for d in detections_list]) - if name == "mask" - else np.hstack([d.__getattribute__(name) for d in detections_list]) - ) + if name == "mask": + from supervision.detection.compact_mask import CompactMask + + masks = [d.__getattribute__(name) for d in detections_list] + if all(isinstance(m, CompactMask) for m in masks): + return CompactMask.merge(masks) + # Mixed or all-ndarray: __array__ auto-converts any CompactMask. + return np.vstack([np.asarray(m) for m in masks]) + return np.hstack([d.__getattribute__(name) for d in detections_list]) mask = stack_or_none("mask") confidence = stack_or_none("confidence") @@ -2281,7 +2296,7 @@ def __getitem__( """ if isinstance(index, str): return self.data.get(index) - if self.is_empty(): + if len(self) == 0: return self if isinstance(index, int): index = [index] @@ -2343,6 +2358,10 @@ def area(self) -> npt.NDArray[np.generic]: where n is the number of detections. 
""" if self.mask is not None: + from supervision.detection.compact_mask import CompactMask + + if isinstance(self.mask, CompactMask): + return self.mask.area return np.array([np.sum(mask) for mask in self.mask]) else: return self.box_area diff --git a/src/supervision/detection/tools/inference_slicer.py b/src/supervision/detection/tools/inference_slicer.py index 4e0fcbf87e..79927641fd 100644 --- a/src/supervision/detection/tools/inference_slicer.py +++ b/src/supervision/detection/tools/inference_slicer.py @@ -45,9 +45,19 @@ def move_detections( "Resolution width and height are required for moving segmentation " "detections. This should be the same as (width, height) of image shape." ) - detections.mask = move_masks( - masks=detections.mask, offset=offset, resolution_wh=resolution_wh - ) + from supervision.detection.compact_mask import CompactMask + + if isinstance(detections.mask, CompactMask): + # Preserve move_masks clipping semantics without dense materialisation. + detections.mask = detections.mask.with_offset( + dx=int(offset[0]), + dy=int(offset[1]), + new_image_shape=(resolution_wh[1], resolution_wh[0]), + ) + else: + detections.mask = move_masks( + masks=detections.mask, offset=offset, resolution_wh=resolution_wh + ) return detections @@ -74,6 +84,15 @@ class InferenceSlicer: iou_threshold: IOU threshold used in merging overlap filtering. overlap_metric: Metric to compute overlap (`IOU` or `IOS`). thread_workers: Number of threads for concurrent slice inference. + compact_masks: If ``True``, dense ``(N, H, W)`` boolean mask + arrays returned by the callback are immediately converted to a + :class:`~supervision.detection.compact_mask.CompactMask`. This + keeps masks in run-length-encoded form for the entire pipeline — + merge, NMS, and annotation — avoiding the large ``(N, H, W)`` + allocations that cause OOM on high-resolution images with many + objects. 
IoU and NMS are computed directly on the RLE crops + without ever materialising a full ``(N, H, W)`` array. + Defaults to ``False`` for backward compatibility. Raises: ValueError: If `slice_wh` or `overlap_wh` are invalid or inconsistent. @@ -122,6 +141,7 @@ def __init__( iou_threshold: float = 0.5, overlap_metric: OverlapMetric | str = OverlapMetric.IOU, thread_workers: int = 1, + compact_masks: bool = False, ): slice_wh_norm = self._normalize_slice_wh(slice_wh) overlap_wh_norm = self._normalize_overlap_wh(overlap_wh) @@ -135,6 +155,7 @@ def __init__( self.overlap_filter = OverlapFilter.from_value(overlap_filter) self.callback: Callable[[ImageType], Detections] = callback self.thread_workers = thread_workers + self.compact_masks = compact_masks def __call__(self, image: ImageType) -> Detections: """ @@ -196,8 +217,22 @@ def _run_callback(self, image: ImageType, offset: npt.NDArray[Any]) -> Detection """ image_slice = crop_image(image=image, xyxy=offset) detections = self.callback(image_slice) - resolution_wh = get_image_resolution_wh(image) + if ( + self.compact_masks + and detections.mask is not None + and isinstance(detections.mask, np.ndarray) + ): + from supervision.detection.compact_mask import CompactMask + + slice_w, slice_h = get_image_resolution_wh(image_slice) + detections.mask = CompactMask.from_dense( + detections.mask, + detections.xyxy, + image_shape=(slice_h, slice_w), + ) + + resolution_wh = get_image_resolution_wh(image) detections = move_detections( detections=detections, offset=offset[:2], diff --git a/src/supervision/detection/utils/iou_and_nms.py b/src/supervision/detection/utils/iou_and_nms.py index d7c04f5f1d..8b37108320 100644 --- a/src/supervision/detection/utils/iou_and_nms.py +++ b/src/supervision/detection/utils/iou_and_nms.py @@ -30,7 +30,7 @@ class OverlapFilter(Enum): @classmethod def list(cls) -> list[str]: - return list(map(lambda c: c.value, cls)) + return list(map(lambda member: member.value, cls)) @classmethod def 
from_value(cls, value: OverlapFilter | str) -> OverlapFilter: @@ -66,7 +66,7 @@ class OverlapMetric(Enum): @classmethod def list(cls) -> list[str]: - return list(map(lambda c: c.value, cls)) + return list(map(lambda member: member.value, cls)) @classmethod def from_value(cls, value: OverlapMetric | str) -> OverlapMetric: @@ -351,9 +351,9 @@ def box_iou_batch_with_jaccard( ious: npt.NDArray[np.float64] = np.zeros( (len(boxes_detection), len(boxes_true)), dtype=np.float64 ) - for g_idx, g in enumerate(boxes_true): - for d_idx, d in enumerate(boxes_detection): - ious[d_idx, g_idx] = _jaccard(d, g, is_crowd[g_idx]) + for gt_idx, gt_box in enumerate(boxes_true): + for det_idx, det_box in enumerate(boxes_detection): + ious[det_idx, gt_idx] = _jaccard(det_box, gt_box, is_crowd[gt_idx]) return ious @@ -385,19 +385,124 @@ def oriented_box_iou_batch( max_width = int(max(boxes_true[:, :, 1].max(), boxes_detection[:, :, 1].max()) + 1) mask_true = np.zeros((boxes_true.shape[0], max_height, max_width), dtype=np.uint8) - for i, box_true in enumerate(boxes_true): - mask_true[i] = polygon_to_mask(box_true, (max_width, max_height)) + for box_idx, box_true in enumerate(boxes_true): + mask_true[box_idx] = polygon_to_mask(box_true, (max_width, max_height)) mask_detection = np.zeros( (boxes_detection.shape[0], max_height, max_width), dtype=np.uint8 ) - for i, box_detection in enumerate(boxes_detection): - mask_detection[i] = polygon_to_mask(box_detection, (max_width, max_height)) + for box_idx, box_detection in enumerate(boxes_detection): + mask_detection[box_idx] = polygon_to_mask( + box_detection, (max_width, max_height) + ) ious = mask_iou_batch(mask_true, mask_detection) return ious +def compact_mask_iou_batch( + masks_true: Any, + masks_detection: Any, + overlap_metric: OverlapMetric = OverlapMetric.IOU, +) -> npt.NDArray[np.floating]: + """Compute pairwise overlap between two :class:`CompactMask` collections. + + Avoids materialising full ``(N, H, W)`` arrays by: + + 1. 
Vectorised bounding-box pre-filter — pairs whose boxes do not overlap + get IoU = 0 without any mask decoding. + 2. Sub-crop decoding — for overlapping pairs, only the intersection region + of each crop is decoded and compared. + 3. Crop caching — each individual crop is decoded at most once even when it + participates in many pairs. + + The result is numerically identical to running the dense + :func:`mask_iou_batch` on ``np.asarray(masks_true)`` / + ``np.asarray(masks_detection)``. + + Args: + masks_true: :class:`~supervision.detection.compact_mask.CompactMask` + holding the ground-truth masks. + masks_detection: :class:`~supervision.detection.compact_mask.CompactMask` + holding the detection masks. + overlap_metric: :class:`OverlapMetric` — ``IOU`` or ``IOS``. + + Returns: + Float array of shape ``(N1, N2)`` with pairwise overlap values. + """ + n1: int = len(masks_true) + n2: int = len(masks_detection) + result: npt.NDArray[np.floating] = np.zeros((n1, n2), dtype=float) + + if n1 == 0 or n2 == 0: + return result + + areas_a: npt.NDArray[np.int64] = masks_true.area + areas_b: npt.NDArray[np.int64] = masks_detection.area + + # Inclusive per-mask bounding boxes obtained from public accessors. + # bbox_xyxy: (N, 4) → (x1, y1, x2, y2) + bboxes_a: npt.NDArray[np.int32] = masks_true.bbox_xyxy.astype(np.int32) + x1a: npt.NDArray[np.int32] = bboxes_a[:, 0] + y1a: npt.NDArray[np.int32] = bboxes_a[:, 1] + x2a: npt.NDArray[np.int32] = bboxes_a[:, 2] + y2a: npt.NDArray[np.int32] = bboxes_a[:, 3] + + bboxes_b: npt.NDArray[np.int32] = masks_detection.bbox_xyxy.astype(np.int32) + x1b: npt.NDArray[np.int32] = bboxes_b[:, 0] + y1b: npt.NDArray[np.int32] = bboxes_b[:, 1] + x2b: npt.NDArray[np.int32] = bboxes_b[:, 2] + y2b: npt.NDArray[np.int32] = bboxes_b[:, 3] + + # Pairwise intersection bounding box — shape (N1, N2). 
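That pairwise pre-filter reduces to broadcasting max/min over inclusive box corners; a standalone sketch of the same pattern on toy boxes:

```python
import numpy as np

def bbox_overlap_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """(N1, N2) boolean matrix: True where inclusive (x1, y1, x2, y2) boxes intersect."""
    ix1 = np.maximum(boxes_a[:, 0][:, None], boxes_b[:, 0][None, :])
    iy1 = np.maximum(boxes_a[:, 1][:, None], boxes_b[:, 1][None, :])
    ix2 = np.minimum(boxes_a[:, 2][:, None], boxes_b[:, 2][None, :])
    iy2 = np.minimum(boxes_a[:, 3][:, None], boxes_b[:, 3][None, :])
    return (ix1 <= ix2) & (iy1 <= iy2)

boxes_a = np.array([[0, 0, 4, 4], [10, 10, 12, 12]])
boxes_b = np.array([[3, 3, 6, 6], [20, 20, 22, 22]])
```

Only the first pair intersects; every other pair is rejected without decoding a single mask crop.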
+ ix1: npt.NDArray[np.int32] = np.maximum(x1a[:, None], x1b[None, :]) + iy1: npt.NDArray[np.int32] = np.maximum(y1a[:, None], y1b[None, :]) + ix2: npt.NDArray[np.int32] = np.minimum(x2a[:, None], x2b[None, :]) + iy2: npt.NDArray[np.int32] = np.minimum(y2a[:, None], y2b[None, :]) + bbox_overlap: npt.NDArray[np.bool_] = (ix1 <= ix2) & (iy1 <= iy2) + + # Decode each crop at most once, even if it participates in many pairs. + crops_a: dict[int, npt.NDArray[np.bool_]] = {} + crops_b: dict[int, npt.NDArray[np.bool_]] = {} + + for idx_pair in np.argwhere(bbox_overlap): + idx_a, idx_b = int(idx_pair[0]), int(idx_pair[1]) + + if idx_a not in crops_a: + crops_a[idx_a] = masks_true.crop(idx_a) + if idx_b not in crops_b: + crops_b[idx_b] = masks_detection.crop(idx_b) + + lx1 = int(ix1[idx_a, idx_b]) + ly1 = int(iy1[idx_a, idx_b]) + lx2 = int(ix2[idx_a, idx_b]) + ly2 = int(iy2[idx_a, idx_b]) + + ox_a, oy_a = int(x1a[idx_a]), int(y1a[idx_a]) + sub_a = crops_a[idx_a][ly1 - oy_a : ly2 - oy_a + 1, lx1 - ox_a : lx2 - ox_a + 1] + + ox_b, oy_b = int(x1b[idx_b]), int(y1b[idx_b]) + sub_b = crops_b[idx_b][ly1 - oy_b : ly2 - oy_b + 1, lx1 - ox_b : lx2 - ox_b + 1] + + inter = int(np.logical_and(sub_a, sub_b).sum()) + area_a_i = int(areas_a[idx_a]) + area_b_j = int(areas_b[idx_b]) + + if overlap_metric == OverlapMetric.IOU: + union = area_a_i + area_b_j - inter + result[idx_a, idx_b] = inter / union if union > 0 else 0.0 + elif overlap_metric == OverlapMetric.IOS: + small = min(area_a_i, area_b_j) + result[idx_a, idx_b] = inter / small if small > 0 else 0.0 + else: + raise ValueError( + f"overlap_metric {overlap_metric} is not supported, " + "only 'IOU' and 'IOS' are supported" + ) + + return result + + def _mask_iou_batch_split( masks_true: npt.NDArray[Any], masks_detection: npt.NDArray[Any], @@ -461,16 +566,34 @@ def mask_iou_batch( Compute Intersection over Union (IoU) of two sets of masks - `masks_true` and `masks_detection`. 
+ Accepts both dense ``(N, H, W)`` boolean arrays and + :class:`~supervision.detection.compact_mask.CompactMask` objects. + When both inputs are :class:`~supervision.detection.compact_mask.CompactMask`, + the computation uses :func:`compact_mask_iou_batch` to avoid materialising + full ``(N, H, W)`` arrays. + Args: masks_true: 3D `np.ndarray` representing ground-truth masks. masks_detection: 3D `np.ndarray` representing detection masks. overlap_metric: Metric used to compute the degree of overlap between pairs of masks (e.g., IoU, IoS). memory_limit: Memory limit in MB, default is 1024 * 5 MB (5GB). + Ignored when both inputs are CompactMask. Returns: Pairwise IoU of masks from `masks_true` and `masks_detection`. """ + from supervision.detection.compact_mask import CompactMask + + if isinstance(masks_true, CompactMask) and isinstance(masks_detection, CompactMask): + return compact_mask_iou_batch(masks_true, masks_detection, overlap_metric) + + # Materialise any CompactMask that was passed alongside a dense array. + if isinstance(masks_true, CompactMask): + masks_true = np.asarray(masks_true) + if isinstance(masks_detection, CompactMask): + masks_detection = np.asarray(masks_detection) + memory = ( masks_true.shape[0] * masks_true.shape[1] @@ -494,10 +617,12 @@ def mask_iou_batch( ), 1, ) - for i in range(0, masks_true.shape[0], step): + for chunk_start in range(0, masks_true.shape[0], step): ious.append( _mask_iou_batch_split( - masks_true[i : i + step], masks_detection, overlap_metric + masks_true[chunk_start : chunk_start + step], + masks_detection, + overlap_metric, ) ) @@ -514,6 +639,11 @@ def mask_non_max_suppression( """ Perform Non-Maximum Suppression (NMS) on segmentation predictions. + IoU is computed exactly on the full-resolution masks for both dense and + :class:`~supervision.detection.compact_mask.CompactMask` inputs. 
The + ``mask_dimension`` parameter is kept for backward compatibility but is no + longer used — dense masks are **not** resized before IoU computation. + Args: predictions: A 2D array of object detection predictions in the format of `(x_min, y_min, x_max, y_max, score)` @@ -526,8 +656,8 @@ def mask_non_max_suppression( to use for non-maximum suppression. overlap_metric: Metric used to compute the degree of overlap between pairs of masks (e.g., IoU, IoS). - mask_dimension: The dimension to which the masks should be - resized before computing IOU values. Defaults to 640. + mask_dimension: Deprecated, no longer used. Kept for backward + compatibility. Returns: A boolean array indicating which predictions to keep after @@ -549,15 +679,19 @@ def mask_non_max_suppression( sort_index = predictions[:, 4].argsort()[::-1] predictions = predictions[sort_index] masks = masks[sort_index] - masks_resized = resize_masks(masks, mask_dimension) - ious = mask_iou_batch(masks_resized, masks_resized, overlap_metric) + + ious = mask_iou_batch(masks, masks, overlap_metric) categories = predictions[:, 5] keep = np.ones(rows, dtype=bool) - for i in range(rows): - if keep[i]: - condition = (ious[i] > iou_threshold) & (categories[i] == categories) - keep[i + 1 :] = np.where(condition[i + 1 :], False, keep[i + 1 :]) + for row_idx in range(rows): + if keep[row_idx]: + condition = (ious[row_idx] > iou_threshold) & ( + categories[row_idx] == categories + ) + keep[row_idx + 1 :] = np.where( + condition[row_idx + 1 :], False, keep[row_idx + 1 :] + ) return cast(npt.NDArray[np.bool_], keep[sort_index.argsort()]) @@ -712,7 +846,20 @@ def mask_non_max_merge( AssertionError: If `iou_threshold` is not within the closed range from `0` to `1`. """ - masks_resized = resize_masks(masks, mask_dimension) + from supervision.detection.compact_mask import CompactMask + + if isinstance(masks, CompactMask): + # _group_overlapping_masks needs dense arrays for logical_or union merging. 
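The greedy, class-aware suppression loop in `mask_non_max_suppression` can be exercised on a precomputed IoU matrix; a minimal standalone sketch of the same keep/suppress logic:

```python
import numpy as np

def greedy_nms(ious, categories, scores, iou_threshold=0.5):
    """Greedy NMS on a pairwise IoU matrix; suppression only applies within a class."""
    order = scores.argsort()[::-1]          # process highest score first
    ious = ious[order][:, order]
    categories = categories[order]
    keep = np.ones(len(categories), dtype=bool)
    for i in range(len(categories)):
        if keep[i]:
            condition = (ious[i] > iou_threshold) & (categories[i] == categories)
            keep[i + 1:] = np.where(condition[i + 1:], False, keep[i + 1:])
    return keep[order.argsort()]            # restore the caller's ordering

# Detections 0 and 1 overlap heavily within the same class; 2 is another class.
ious = np.array([[1.0, 0.8, 0.0],
                 [0.8, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
categories = np.array([0, 0, 1])
scores = np.array([0.9, 0.8, 0.7])
```

The lower-scored duplicate is suppressed; the other-class detection survives regardless of overlap.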
+ # Note: np.asarray(masks) first materialises a full-resolution (N, H, W) + # dense array before downscaling with resize_masks. This reduces the size + # of the array used for overlap computation but does not avoid the initial + # full-frame materialisation, which may still be memory-intensive for very + # large images or object counts. + masks = resize_masks(np.asarray(masks), mask_dimension) + else: + masks = resize_masks(masks, mask_dimension) + masks_resized = masks + if predictions.shape[1] == 5: return _group_overlapping_masks( predictions, masks_resized, iou_threshold, overlap_metric diff --git a/src/supervision/detection/utils/masks.py b/src/supervision/detection/utils/masks.py index a618556ed0..018cbd4948 100644 --- a/src/supervision/detection/utils/masks.py +++ b/src/supervision/detection/utils/masks.py @@ -1,11 +1,14 @@ from __future__ import annotations -from typing import Any, Literal, cast +from typing import TYPE_CHECKING, Any, Literal, cast import cv2 import numpy as np import numpy.typing as npt +if TYPE_CHECKING: + from supervision.detection.compact_mask import CompactMask + def move_masks( masks: npt.NDArray[np.bool_], @@ -86,7 +89,7 @@ def move_masks( def calculate_masks_centroids( - masks: npt.NDArray[Any], + masks: npt.NDArray[Any] | CompactMask, ) -> npt.NDArray[np.int_]: """ Calculate the centroids of binary masks in a tensor. @@ -94,11 +97,38 @@ def calculate_masks_centroids( Args: masks: A 3D NumPy array of shape (num_masks, height, width). Each 2D array in the tensor represents a binary mask. + Also accepts a :class:`~supervision.detection.compact_mask.CompactMask`. Returns: A 2D NumPy array of shape (num_masks, 2), where each row contains the x and y coordinates (in that order) of the centroid of the corresponding mask. """ + from supervision.detection.compact_mask import CompactMask + + if isinstance(masks, CompactMask): + # Compute centroids per-crop to avoid materialising the full (N, H, W) array. 
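The per-crop centroid branch that follows works because the centroid is translation-equivariant: computing it in crop coordinates and adding the `(x1, y1)` offset gives the same result as computing it on the full canvas. A quick check of that identity with the same `+0.5` pixel-centre convention:

```python
import numpy as np

full = np.zeros((12, 16), dtype=bool)
full[3:7, 5:11] = True                     # 4x6 object block
x1, y1 = 5, 3                              # bbox top-left
crop = full[3:7, 5:11]

# Full-canvas centroid with pixel centres at +0.5.
rows, cols = np.indices(full.shape)
total = full.sum()
cx_full = ((cols + 0.5)[full]).sum() / total
cy_full = ((rows + 0.5)[full]).sum() / total

# Crop-local centroid plus the offset.
crop_rows, crop_cols = np.indices(crop.shape)
cx_crop = ((crop_cols + 0.5)[crop]).sum() / crop.sum() + x1
cy_crop = ((crop_rows + 0.5)[crop]).sum() / crop.sum() + y1
```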
+ n = len(masks) + if n == 0: + return cast(npt.NDArray[np.int_], np.empty((0, 2), dtype=int)) + + centroids: npt.NDArray[np.float64] = np.zeros((n, 2), dtype=np.float64) + for i in range(n): + crop = masks.crop(i) + crop_h, crop_w = crop.shape + x1 = int(masks.offsets[i, 0]) + y1 = int(masks.offsets[i, 1]) + total = int(crop.sum()) + if total == 0: + centroids[i] = [0.0, 0.0] + continue + # Match the +0.5 offset used by the dense implementation. + crop_rows, crop_cols = np.indices((crop_h, crop_w)) + cx = float(np.sum((crop_cols + 0.5)[crop])) / total + x1 + cy = float(np.sum((crop_rows + 0.5)[crop])) / total + y1 + centroids[i] = [cx, cy] + + return cast(npt.NDArray[np.int_], centroids.astype(int)) + _num_masks, height, width = masks.shape total_pixels = masks.sum(axis=(1, 2)) @@ -339,7 +369,7 @@ def filter_segments_by_distance( ``` - The nearby 2×2 block at columns 6–7 is kept because its edge distance + The nearby 2x2 block at columns 6-7 is kept because its edge distance is within 3 pixels. The distant block at columns 9-10 is removed. """ # noqa E501 // docs if mask.dtype != bool: diff --git a/src/supervision/metrics/utils/object_size.py b/src/supervision/metrics/utils/object_size.py index ad9f37b56f..84482580a0 100644 --- a/src/supervision/metrics/utils/object_size.py +++ b/src/supervision/metrics/utils/object_size.py @@ -10,6 +10,7 @@ from supervision.metrics.core import MetricTarget if TYPE_CHECKING: + from supervision.detection.compact_mask import CompactMask from supervision.detection.core import Detections SIZE_THRESHOLDS = (32**2, 96**2) @@ -122,12 +123,15 @@ def get_bbox_size_category(xyxy: npt.NDArray[np.float32]) -> npt.NDArray[np.int_ return result -def get_mask_size_category(mask: npt.NDArray[np.bool_]) -> npt.NDArray[np.int_]: +def get_mask_size_category( + mask: npt.NDArray[np.bool_] | CompactMask, +) -> npt.NDArray[np.int_]: """ Get the size category of detection masks. Args: - mask: The mask array shaped (N, H, W). 
+ mask: The mask array shaped (N, H, W), or a + :class:`~supervision.detection.compact_mask.CompactMask`. Returns: The size category of each mask, matching @@ -146,10 +150,14 @@ def get_mask_size_category(mask: npt.NDArray[np.bool_]) -> npt.NDArray[np.int_]: ``` """ - if len(mask.shape) != 3: - raise ValueError("Masks must be shaped (N, H, W)") - - areas = np.sum(mask, axis=(1, 2)) + from supervision.detection.compact_mask import CompactMask + + if isinstance(mask, CompactMask): + areas = mask.area + else: + if len(mask.shape) != 3: + raise ValueError("Masks must be shaped (N, H, W)") + areas = np.sum(mask, axis=(1, 2)) result = np.full(areas.shape, ObjectSizeCategory.ANY.value) SM, LG = SIZE_THRESHOLDS diff --git a/src/supervision/validators/__init__.py b/src/supervision/validators/__init__.py index 1ab5449d11..75e200e72b 100644 --- a/src/supervision/validators/__init__.py +++ b/src/supervision/validators/__init__.py @@ -27,6 +27,14 @@ def validate_mask(mask: Any, n: int) -> None: if mask is None: return + # Fast path: CompactMask only needs a length check. 
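For reference, the size bucketing in `get_mask_size_category` compares pixel areas against `SIZE_THRESHOLDS = (32**2, 96**2)`; with a `CompactMask`, the areas come straight from `.area`. A standalone sketch of the bucketing — the 1/2/3 encoding and the `<=` boundary handling here are illustrative assumptions, not the library's `ObjectSizeCategory` values:

```python
import numpy as np

SMALL_MAX, MEDIUM_MAX = 32**2, 96**2       # COCO-style area thresholds

def size_category(areas: np.ndarray) -> np.ndarray:
    """1 = small, 2 = medium, 3 = large (illustrative encoding)."""
    result = np.full(areas.shape, 3)
    result[areas <= MEDIUM_MAX] = 2
    result[areas <= SMALL_MAX] = 1
    return result

areas = np.array([100, 5000, 20000])       # small, medium, large
```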
+ from supervision.detection.compact_mask import CompactMask + + if isinstance(mask, CompactMask): + if len(mask) != n: + raise ValueError(f"mask must contain {n} masks, but got {len(mask)}") + return + expected_shape = f"({n}, H, W)" actual_shape = str(getattr(mask, "shape", None)) actual_dtype = getattr(mask, "dtype", None) diff --git a/tests/detection/test_compact_mask.py b/tests/detection/test_compact_mask.py new file mode 100644 index 0000000000..cb4e96730c --- /dev/null +++ b/tests/detection/test_compact_mask.py @@ -0,0 +1,936 @@ +"""Unit tests for CompactMask and its private RLE helpers.""" + +from __future__ import annotations + +from contextlib import ExitStack as DoesNotRaise + +import numpy as np +import pytest + +from supervision.detection.compact_mask import ( + CompactMask, + _rle_area, + _rle_decode, + _rle_encode, +) +from supervision.detection.utils.converters import mask_to_xyxy +from supervision.detection.utils.masks import ( + calculate_masks_centroids, + contains_holes, + contains_multiple_segments, + move_masks, +) + + +def _make_cm(masks: np.ndarray, image_shape: tuple[int, int]) -> CompactMask: + """Build a CompactMask whose crops equal the full bounding-box extents.""" + num_masks = len(masks) + img_h, img_w = image_shape + xyxy = np.tile(np.array([0, 0, img_w, img_h], dtype=np.float32), (num_masks, 1)) + return CompactMask.from_dense(masks, xyxy, image_shape=image_shape) + + +class TestRleHelpers: + """Tests for _rle_encode, _rle_decode, and _rle_area. + + Verifies that the private RLE encoding round-trips correctly for a range + of mask shapes (all-False, all-True, diagonal, L-shape, checkerboard, + single-pixel, and empty), and that _rle_area matches np.sum on the + original boolean array. 
+ """ + + @pytest.mark.parametrize( + ("mask_2d", "description"), + [ + (np.zeros((5, 5), dtype=bool), "all-False"), + (np.ones((5, 5), dtype=bool), "all-True"), + (np.eye(4, dtype=bool), "diagonal"), + ( + np.array([[True, True, False], [True, False, False]], dtype=bool), + "L-shape", + ), + ( + np.indices((4, 4)).sum(axis=0) % 2 == 0, + "checkerboard", + ), + (np.zeros((1, 1), dtype=bool), "single-pixel-False"), + (np.ones((1, 1), dtype=bool), "single-pixel-True"), + (np.zeros((0, 0), dtype=bool), "empty"), + ], + ) + def test_encode_decode_round_trip( + self, mask_2d: np.ndarray, description: str + ) -> None: + if mask_2d.size == 0: + rle = _rle_encode(mask_2d) + assert _rle_area(rle) == 0 + return + + rle = _rle_encode(mask_2d) + assert rle.dtype == np.int32, "RLE must be int32" + reconstructed = _rle_decode(rle, mask_2d.shape[0], mask_2d.shape[1]) + np.testing.assert_array_equal( + reconstructed, mask_2d, err_msg=f"Round-trip failed for: {description}" + ) + + @pytest.mark.parametrize( + "mask_2d", + [ + np.zeros((6, 6), dtype=bool), + np.ones((6, 6), dtype=bool), + np.eye(6, dtype=bool), + np.array([[True, False, True], [False, True, False]], dtype=bool), + ], + ) + def test_area_matches_numpy_sum(self, mask_2d: np.ndarray) -> None: + rle = _rle_encode(mask_2d) + assert _rle_area(rle) == int(np.sum(mask_2d)) + + +class TestFromDenseToDense: + """Tests for CompactMask.from_dense and to_dense. + + Verifies that the from_dense → to_dense round-trip is lossless when the + bounding boxes span the full image (no True pixels fall outside the crop). + Covers N=0 (empty), N=1 (single mask), and N=5 (several random masks). 
+ """ + + @pytest.mark.parametrize( + ("num_masks", "image_shape"), + [ + (0, (50, 50)), + (1, (50, 50)), + (5, (50, 50)), + ], + ) + def test_round_trip(self, num_masks: int, image_shape: tuple[int, int]) -> None: + rng = np.random.default_rng(42) + img_h, img_w = image_shape + masks = rng.integers(0, 2, size=(num_masks, img_h, img_w)).astype(bool) + cm = _make_cm(masks, image_shape) + np.testing.assert_array_equal(cm.to_dense(), masks) + + def test_round_trip_with_mask_to_xyxy(self) -> None: + """Round-trip must be lossless with inclusive xyxy from mask_to_xyxy.""" + img_h, img_w = 12, 14 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 3:7, 4:9] = True # non-full-image object + + xyxy = mask_to_xyxy(masks).astype(np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + np.testing.assert_array_equal(cm.to_dense(), masks) + + +class TestGetItem: + """Tests for CompactMask.__getitem__. + + Covers four indexing modes: + - Integer index → dense (H, W) np.ndarray with correct shape and dtype. + - List of indices → new CompactMask with the selected detections. + - Slice → new CompactMask with the sliced detections. + - Boolean ndarray → new CompactMask filtered by the boolean selector. 
+ """ + + def test_int_returns_2d_dense(self) -> None: + img_h, img_w = 30, 40 + rng = np.random.default_rng(0) + masks = rng.integers(0, 2, size=(3, img_h, img_w)).astype(bool) + cm = _make_cm(masks, (img_h, img_w)) + + result = cm[1] + assert isinstance(result, np.ndarray) + assert result.shape == (img_h, img_w) + assert result.dtype == bool + np.testing.assert_array_equal(result, masks[1]) + + def test_list_returns_compact_mask(self) -> None: + img_h, img_w = 20, 20 + masks = np.zeros((4, img_h, img_w), dtype=bool) + for mask_idx in range(4): + masks[ + mask_idx, + mask_idx * 2 : mask_idx * 2 + 2, + mask_idx * 2 : mask_idx * 2 + 2, + ] = True + cm = _make_cm(masks, (img_h, img_w)) + + subset = cm[[0, 2]] + assert isinstance(subset, CompactMask) + assert len(subset) == 2 + np.testing.assert_array_equal(subset[0], masks[0]) + np.testing.assert_array_equal(subset[1], masks[2]) + + def test_slice_returns_compact_mask(self) -> None: + img_h, img_w = 20, 20 + masks = np.zeros((5, img_h, img_w), dtype=bool) + cm = _make_cm(masks, (img_h, img_w)) + + subset = cm[1:4] + assert isinstance(subset, CompactMask) + assert len(subset) == 3 + + def test_bool_ndarray(self) -> None: + img_h, img_w = 15, 15 + rng = np.random.default_rng(7) + masks = rng.integers(0, 2, size=(4, img_h, img_w)).astype(bool) + cm = _make_cm(masks, (img_h, img_w)) + + selector = np.array([True, False, True, False]) + subset = cm[selector] + assert isinstance(subset, CompactMask) + assert len(subset) == 2 + np.testing.assert_array_equal(subset[0], masks[0]) + np.testing.assert_array_equal(subset[1], masks[2]) + + def test_bool_list(self) -> None: + """Python list[bool] should behave like boolean masking.""" + img_h, img_w = 15, 15 + rng = np.random.default_rng(8) + masks = rng.integers(0, 2, size=(4, img_h, img_w)).astype(bool) + cm = _make_cm(masks, (img_h, img_w)) + + subset = cm[[True, False, True, False]] + assert isinstance(subset, CompactMask) + assert len(subset) == 2 + 
np.testing.assert_array_equal(subset[0], masks[0]) + np.testing.assert_array_equal(subset[1], masks[2]) + + +class TestProperties: + """Tests for len, shape, dtype, and area properties. + + Verifies that the shape tuple follows the (N, H, W) dense convention, + dtype is always bool, and area returns per-mask True-pixel counts that + match np.sum on the corresponding dense masks. + """ + + def test_len(self) -> None: + masks = np.zeros((3, 10, 10), dtype=bool) + cm = _make_cm(masks, (10, 10)) + assert len(cm) == 3 + + def test_shape(self) -> None: + masks = np.zeros((3, 10, 10), dtype=bool) + cm = _make_cm(masks, (10, 10)) + assert cm.shape == (3, 10, 10) + + def test_shape_empty(self) -> None: + cm = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (480, 640), + ) + assert cm.shape == (0, 480, 640) + + def test_dtype(self) -> None: + cm = _make_cm(np.zeros((1, 5, 5), dtype=bool), (5, 5)) + assert cm.dtype == np.dtype(bool) + + def test_area_matches_dense(self) -> None: + img_h, img_w = 20, 20 + rng = np.random.default_rng(3) + masks = rng.integers(0, 2, size=(4, img_h, img_w)).astype(bool) + cm = _make_cm(masks, (img_h, img_w)) + + expected = np.array([mask.sum() for mask in masks]) + np.testing.assert_array_equal(cm.area, expected) + + def test_area_empty(self) -> None: + cm = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (10, 10), + ) + assert cm.area.shape == (0,) + + +class TestCrop: + """Tests for CompactMask.crop. + + Verifies that crop(index) returns an array shaped (crop_h, crop_w) + containing only the pixels within the bounding box, without allocating + the full (H, W) image. 
+ """ + + def test_returns_crop_shape(self) -> None: + img_h, img_w = 50, 60 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 10:30, 5:25] = True # 20 x 20 region + xyxy = np.array([[5, 10, 24, 29]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + crop = cm.crop(0) + assert crop.shape == (20, 20) + assert crop.all() # the entire crop should be True + + +class TestArrayProtocol: + """Tests for the __array__ protocol. + + Verifies that np.asarray(cm) materialises the full (N, H, W) dense array + and that optional dtype casting (e.g. to uint8) is correctly applied. + """ + + def test_array_protocol(self) -> None: + img_h, img_w = 10, 10 + rng = np.random.default_rng(9) + masks = rng.integers(0, 2, size=(2, img_h, img_w)).astype(bool) + cm = _make_cm(masks, (img_h, img_w)) + + arr = np.asarray(cm) + assert arr.shape == (2, img_h, img_w) + np.testing.assert_array_equal(arr, masks) + + def test_dtype_cast(self) -> None: + masks = np.ones((1, 5, 5), dtype=bool) + cm = _make_cm(masks, (5, 5)) + arr = np.asarray(cm, dtype=np.uint8) + assert arr.dtype == np.uint8 + assert arr.sum() == 25 + + +class TestMerge: + """Tests for CompactMask.merge. + + Verifies that multiple CompactMask instances with the same image_shape + can be concatenated into a single CompactMask, that merging with an empty + instance works, that an empty input list raises ValueError, and that + mismatched image shapes raise ValueError. 
+ """ + + def test_merge(self) -> None: + img_h, img_w = 20, 20 + masks1 = np.zeros((2, img_h, img_w), dtype=bool) + masks2 = np.zeros((3, img_h, img_w), dtype=bool) + cm1 = _make_cm(masks1, (img_h, img_w)) + cm2 = _make_cm(masks2, (img_h, img_w)) + + merged = CompactMask.merge([cm1, cm2]) + assert len(merged) == 5 + assert merged.shape == (5, img_h, img_w) + np.testing.assert_array_equal( + merged.to_dense(), np.concatenate([masks1, masks2], axis=0) + ) + + def test_merge_with_empty(self) -> None: + img_h, img_w = 10, 10 + empty_cm = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (img_h, img_w), + ) + masks = np.zeros((2, img_h, img_w), dtype=bool) + cm = _make_cm(masks, (img_h, img_w)) + + merged = CompactMask.merge([empty_cm, cm]) + assert len(merged) == 2 + + def test_merge_empty_list_raises(self) -> None: + with pytest.raises(ValueError, match="empty list"): + CompactMask.merge([]) + + def test_merge_mismatched_image_shape_raises(self) -> None: + cm1 = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (10, 10), + ) + cm2 = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (20, 20), + ) + with pytest.raises(ValueError, match="image shapes"): + CompactMask.merge([cm1, cm2]) + + +class TestEquality: + """Tests for CompactMask.__eq__. + + Verifies element-wise equality between two CompactMask instances and + between a CompactMask and an equivalent dense (N, H, W) boolean array. 
+ """ + + def test_eq_identical(self) -> None: + masks = np.zeros((2, 10, 10), dtype=bool) + masks[0, 2:5, 2:5] = True + cm1 = _make_cm(masks, (10, 10)) + cm2 = _make_cm(masks, (10, 10)) + assert cm1 == cm2 + + def test_eq_different(self) -> None: + masks_a = np.zeros((2, 10, 10), dtype=bool) + masks_a[0, 2:5, 2:5] = True + masks_b = np.zeros((2, 10, 10), dtype=bool) + masks_b[1, 6:9, 6:9] = True + cm1 = _make_cm(masks_a, (10, 10)) + cm2 = _make_cm(masks_b, (10, 10)) + assert not (cm1 == cm2) + + def test_eq_with_dense_array(self) -> None: + masks = np.zeros((1, 8, 8), dtype=bool) + masks[0, 1:4, 1:4] = True + cm = _make_cm(masks, (8, 8)) + assert cm == masks + + +class TestEdgeCases: + """Tests for boundary conditions and unusual inputs. + + Covers: zero-area bounding box (x1 == x2), masks that reach the image + edge, xyxy values beyond image dimensions (clamped silently), empty + CompactMask (N=0), sum axis compatibility with area, and with_offset for + use by InferenceSlicer. + """ + + def test_zero_area_mask_clipped_to_1x1(self) -> None: + """An invalid bounding box should not crash from_dense.""" + masks = np.zeros((1, 10, 10), dtype=bool) + xyxy = np.array([[6, 5, 5, 8]], dtype=np.float32) + with DoesNotRaise(): + cm = CompactMask.from_dense(masks, xyxy, image_shape=(10, 10)) + assert len(cm) == 1 + + def test_mask_at_image_boundary(self) -> None: + img_h, img_w = 20, 20 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 15:20, 15:20] = True + xyxy = np.array([[15, 15, 19, 19]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + np.testing.assert_array_equal(cm.to_dense(), masks) + + def test_xyxy_beyond_image_clipped(self) -> None: + """xyxy values beyond the image boundary should be clipped silently.""" + img_h, img_w = 10, 10 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 5:10, 5:10] = True + xyxy = np.array([[5, 5, 999, 999]], dtype=np.float32) + with DoesNotRaise(): + cm = 
CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + np.testing.assert_array_equal(cm.to_dense(), masks) + + def test_empty_compact_mask_to_dense(self) -> None: + cm = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (50, 60), + ) + dense = cm.to_dense() + assert dense.shape == (0, 50, 60) + assert dense.dtype == bool + + def test_sum_axis_1_2_equals_area(self) -> None: + rng = np.random.default_rng(11) + masks = rng.integers(0, 2, size=(4, 15, 15)).astype(bool) + cm = _make_cm(masks, (15, 15)) + np.testing.assert_array_equal(cm.sum(axis=(1, 2)), cm.area) + + def test_with_offset(self) -> None: + img_h, img_w = 20, 20 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 5:10, 5:10] = True + xyxy = np.array([[5, 5, 9, 9]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + cm2 = cm.with_offset(100, 200, new_image_shape=(400, 400)) + assert cm2.offsets[0].tolist() == [105, 205] + assert cm2._image_shape == (400, 400) + np.testing.assert_array_equal(cm2.crop(0), cm.crop(0)) + + def test_with_offset_clips_partial_overlap_like_move_masks(self) -> None: + """with_offset must clip partial out-of-frame translations like move_masks.""" + img_h, img_w = 10, 10 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 2:6, 3:8] = True + xyxy = np.array([[3, 2, 7, 5]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + dx, dy = -4, 3 + cm_shifted = cm.with_offset(dx=dx, dy=dy, new_image_shape=(img_h, img_w)) + expected = move_masks( + masks=masks, + offset=np.array([dx, dy], dtype=np.int32), + resolution_wh=(img_w, img_h), + ) + + np.testing.assert_array_equal(cm_shifted.to_dense(), expected) + + def test_with_offset_clips_full_outside_like_move_masks(self) -> None: + """Masks shifted fully outside should remain valid and decode to all-False.""" + img_h, img_w = 10, 10 + masks = np.zeros((1, img_h, img_w), dtype=bool) + 
masks[0, 2:6, 2:6] = True + xyxy = np.array([[2, 2, 5, 5]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + dx, dy = 100, 100 + cm_shifted = cm.with_offset(dx=dx, dy=dy, new_image_shape=(img_h, img_w)) + expected = move_masks( + masks=masks, + offset=np.array([dx, dy], dtype=np.int32), + resolution_wh=(img_w, img_h), + ) + + np.testing.assert_array_equal(cm_shifted.to_dense(), expected) + + def test_repack_tightens_loose_bbox(self) -> None: + """repack() shrinks the crop to the minimal True-pixel rectangle.""" + img_h, img_w = 20, 20 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 5:10, 6:12] = True # True block at (5,6)-(9,11) + + # Deliberately loose bbox covers full image. + xyxy = np.array([[0, 0, img_w - 1, img_h - 1]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + # Before repack: crop is the full 20x20 image. + assert cm._crop_shapes[0].tolist() == [20, 20] + + repacked = cm.repack() + + # After repack: crop is exactly the True block. + assert repacked.offsets[0].tolist() == [6, 5] # (x1, y1) + assert repacked._crop_shapes[0].tolist() == [5, 6] # (h, w) + # Pixel content must be identical to the original. 
+ np.testing.assert_array_equal(repacked.to_dense(), masks) + + def test_repack_preserves_all_false_mask(self) -> None: + """repack() normalises an all-False mask to a 1x1 crop.""" + img_h, img_w = 10, 10 + masks = np.zeros((2, img_h, img_w), dtype=bool) + masks[1, 3:6, 3:6] = True # only mask 1 is non-empty + + xyxy = np.array([[0, 0, 9, 9], [0, 0, 9, 9]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + repacked = cm.repack() + + assert repacked._crop_shapes[0].tolist() == [1, 1] # normalised + assert repacked._crop_shapes[1].tolist() == [3, 3] # tight True block + np.testing.assert_array_equal(repacked.to_dense(), masks) + + def test_repack_empty_collection(self) -> None: + """repack() on an empty CompactMask returns another empty CompactMask.""" + cm = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (10, 10), + ) + repacked = cm.repack() + assert len(repacked) == 0 + assert repacked._image_shape == (10, 10) + + def test_repack_already_tight(self) -> None: + """repack() is a no-op when bboxes are already tight.""" + img_h, img_w = 15, 15 + masks = np.zeros((1, img_h, img_w), dtype=bool) + masks[0, 4:9, 3:8] = True + + # Tight bbox. + xyxy = np.array([[3, 4, 7, 8]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + repacked = cm.repack() + + np.testing.assert_array_equal(repacked.offsets, cm.offsets) + np.testing.assert_array_equal(repacked._crop_shapes, cm._crop_shapes) + np.testing.assert_array_equal(repacked.to_dense(), masks) + + +class TestCalculateMasksCentroidsCompact: + """Verify calculate_masks_centroids gives identical results for CompactMask. + + The function has a dedicated CompactMask branch that computes centroids + per-crop. Results must match the dense path to within integer rounding. 
+ """ + + def test_centroids_compact_matches_dense(self) -> None: + """Centroid coordinates must be numerically identical for dense and compact.""" + rng = np.random.default_rng(42) + img_h, img_w = 30, 30 + masks = rng.integers(0, 2, size=(5, img_h, img_w)).astype(bool) + # Ensure each mask has at least one True pixel. + for mask_idx in range(5): + masks[mask_idx, mask_idx * 5, mask_idx * 5] = True + + cm = _make_cm(masks, (img_h, img_w)) + + centroids_dense = calculate_masks_centroids(masks) + centroids_compact = calculate_masks_centroids(cm) + + np.testing.assert_array_equal(centroids_compact, centroids_dense) + + def test_centroids_empty_mask(self) -> None: + """All-zero masks should return centroid (0, 0) — same as dense.""" + img_h, img_w = 10, 10 + masks = np.zeros((3, img_h, img_w), dtype=bool) + cm = _make_cm(masks, (img_h, img_w)) + + centroids_dense = calculate_masks_centroids(masks) + centroids_compact = calculate_masks_centroids(cm) + + np.testing.assert_array_equal(centroids_compact, centroids_dense) + + def test_centroids_empty_mask_with_tight_bbox(self) -> None: + """All-zero tight crops must still return centroid (0, 0).""" + img_h, img_w = 10, 10 + masks = np.zeros((1, img_h, img_w), dtype=bool) + xyxy = np.array([[3, 4, 7, 8]], dtype=np.float32) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + centroids_dense = calculate_masks_centroids(masks) + centroids_compact = calculate_masks_centroids(cm) + + np.testing.assert_array_equal(centroids_compact, centroids_dense) + + def test_centroids_zero_masks_returns_empty(self) -> None: + """Empty CompactMask (0 objects) must return shape (0, 2).""" + empty_cm = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (10, 10), + ) + result = calculate_masks_centroids(empty_cm) + assert result.shape == (0, 2) + + +class TestContainsHolesCompact: + """Verify contains_holes result is unchanged after CompactMask roundtrip. 
+ + contains_holes works on a 2D boolean mask. Encoding then decoding via + CompactMask must preserve pixel topology so that the function returns + the same result as on the original array. + """ + + @pytest.mark.parametrize( + ("mask_2d", "expected"), + [ + # simple foreground blob — no holes + ( + np.array( + [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0]], + dtype=bool, + ), + False, + ), + # ring shape — has one hole + ( + np.array( + [[1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]], + dtype=bool, + ), + True, + ), + # all-False — no holes + (np.zeros((6, 6), dtype=bool), False), + # all-True — no holes + (np.ones((6, 6), dtype=bool), False), + ], + ) + def test_contains_holes_compact_roundtrip( + self, mask_2d: np.ndarray, expected: bool + ) -> None: + """contains_holes must agree after CompactMask encode→decode.""" + img_h, img_w = mask_2d.shape + masks = mask_2d[np.newaxis] # (1, H, W) + cm = _make_cm(masks, (img_h, img_w)) + + decoded = cm.to_dense()[0] + assert contains_holes(decoded) == expected + assert contains_holes(decoded) == contains_holes(mask_2d) + + +class TestContainsMultipleSegmentsCompact: + """Verify contains_multiple_segments result survives CompactMask roundtrip. + + Encoding and decoding must preserve connected-component topology so + that the multi-segment predicate returns the same value. 
+ """ + + @pytest.mark.parametrize( + ("mask_2d", "connectivity", "expected"), + [ + # single contiguous blob — not multi-segment + ( + np.array( + [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0]], + dtype=bool, + ), + 4, + False, + ), + # two separate blobs — multi-segment + ( + np.array( + [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], + dtype=bool, + ), + 4, + True, + ), + # diagonal touch — single segment under 8-connectivity + ( + np.array( + [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 1]], + dtype=bool, + ), + 8, + False, + ), + # all-False — not multi-segment + (np.zeros((6, 6), dtype=bool), 4, False), + ], + ) + def test_contains_multiple_segments_compact_roundtrip( + self, mask_2d: np.ndarray, connectivity: int, expected: bool + ) -> None: + """contains_multiple_segments must agree after CompactMask encode→decode.""" + img_h, img_w = mask_2d.shape + masks = mask_2d[np.newaxis] # (1, H, W) + cm = _make_cm(masks, (img_h, img_w)) + + decoded = cm.to_dense()[0] + result = contains_multiple_segments(decoded, connectivity=connectivity) + assert result == expected + assert result == contains_multiple_segments(mask_2d, connectivity=connectivity) + + +# --------------------------------------------------------------------------- +# Random scenario helpers +# --------------------------------------------------------------------------- + +# Varying (N, image_h, image_w) combinations for random tests. +_RANDOM_CONFIGS = [ + (1, 50, 50), + (5, 50, 50), + (5, 200, 300), + (20, 100, 150), + (20, 200, 300), + (50, 50, 50), + (5, 1080, 1920), + (1, 1080, 1920), + (20, 480, 640), + (50, 100, 100), +] + + +def _random_masks_and_xyxy( + rng: np.random.Generator, + num_masks: int, + img_h: int, + img_w: int, + fill_prob: float = 0.3, +) -> tuple[np.ndarray, np.ndarray]: + """Generate *num_masks* random boolean masks with matching tight xyxy boxes. 
+ + Each mask is built by filling a random sub-rectangle with Bernoulli noise at + ``fill_prob``, then computing tight bounding boxes via ``mask_to_xyxy``. + This guarantees every mask has at least one True pixel (for non-degenerate + bounding boxes). + """ + masks = np.zeros((num_masks, img_h, img_w), dtype=bool) + for mask_idx in range(num_masks): + y1 = rng.integers(0, img_h) + y2 = rng.integers(y1, img_h) + x1 = rng.integers(0, img_w) + x2 = rng.integers(x1, img_w) + region = rng.random((y2 - y1 + 1, x2 - x1 + 1)) < fill_prob + # Ensure at least one True pixel. + if not region.any(): + region[0, 0] = True + masks[mask_idx, y1 : y2 + 1, x1 : x2 + 1] = region + + xyxy = mask_to_xyxy(masks).astype(np.float32) + return masks, xyxy + + +class TestCompactMaskRoundtripRandom: + """from_dense -> to_dense pixel equality across 10 random seeds. + + Uses tight bounding boxes so the round-trip must be lossless (all True + pixels lie strictly within the crop). + """ + + @pytest.mark.parametrize("seed", list(range(10))) + def test_parity_seed(self, seed: int) -> None: + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + np.testing.assert_array_equal( + cm.to_dense(), + masks, + err_msg=( + f"Round-trip failed for seed={seed}, " + f"N={num_masks}, shape=({img_h},{img_w})" + ), + ) + + @pytest.mark.parametrize("seed", list(range(10))) + def test_shape_and_len(self, seed: int) -> None: + """len() and .shape must agree with the dense array.""" + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + assert len(cm) == num_masks + assert cm.shape == (num_masks, img_h, img_w) + + @pytest.mark.parametrize("seed", list(range(10))) + def 
test_individual_mask_access(self, seed: int) -> None: + """cm[i] must equal masks[i] for every index.""" + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + for mask_idx in range(num_masks): + np.testing.assert_array_equal( + cm[mask_idx], + masks[mask_idx], + err_msg=f"cm[{mask_idx}] mismatch for seed={seed}", + ) + + +class TestCompactMaskAreaRandom: + """area from CompactMask equals dense .sum(axis=(1,2)) across 10 seeds.""" + + @pytest.mark.parametrize("seed", list(range(10))) + def test_parity_seed(self, seed: int) -> None: + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + expected_area = masks.sum(axis=(1, 2)) + np.testing.assert_array_equal( + cm.area, + expected_area, + err_msg=( + f"Area mismatch for seed={seed}, N={num_masks}, shape=({img_h},{img_w})" + ), + ) + + @pytest.mark.parametrize("seed", list(range(10))) + def test_sum_axis_matches_area(self, seed: int) -> None: + """cm.sum(axis=(1,2)) must equal cm.area (the fast path).""" + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + np.testing.assert_array_equal(cm.sum(axis=(1, 2)), cm.area) + + +class TestCompactMaskFilterRandom: + """Boolean filter on CompactMask matches dense fancy indexing across 10 seeds.""" + + @pytest.mark.parametrize("seed", list(range(10))) + def test_parity_seed(self, seed: int) -> None: + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = 
CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + selector = rng.random(num_masks) > 0.5 + # Guarantee at least one True in the selector so we test non-empty subsets. + if not selector.any(): + selector[0] = True + + subset_cm = cm[selector] + subset_dense = masks[selector] + + assert isinstance(subset_cm, CompactMask) + assert len(subset_cm) == int(selector.sum()) + np.testing.assert_array_equal( + subset_cm.to_dense(), + subset_dense, + err_msg=f"Boolean filter mismatch for seed={seed}", + ) + + @pytest.mark.parametrize("seed", list(range(10))) + def test_list_index(self, seed: int) -> None: + """Integer list indexing must match dense fancy indexing.""" + rng = np.random.default_rng(seed) + num_masks, img_h, img_w = _RANDOM_CONFIGS[seed] + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + num_selected = min(num_masks, max(1, rng.integers(1, num_masks + 1))) + indices = sorted( + rng.choice(num_masks, size=num_selected, replace=False).tolist() + ) + + subset_cm = cm[indices] + subset_dense = masks[indices] + np.testing.assert_array_equal( + subset_cm.to_dense(), + subset_dense, + err_msg=f"List index mismatch for seed={seed}, indices={indices}", + ) + + +class TestCompactMaskWithOffsetRandom: + """with_offset roundtrip matches move_masks across 10 random seeds.""" + + @pytest.mark.parametrize("seed", list(range(10))) + def test_parity_seed(self, seed: int) -> None: + rng = np.random.default_rng(seed) + # Use smaller images to keep move_masks fast. + num_masks = rng.integers(1, 10) + img_h, img_w = int(rng.integers(30, 80)), int(rng.integers(30, 80)) + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + # Random offset that may push some masks partially or fully off-frame. 
+ dx = int(rng.integers(-img_w, img_w)) + dy = int(rng.integers(-img_h, img_h)) + + cm_shifted = cm.with_offset(dx=dx, dy=dy, new_image_shape=(img_h, img_w)) + expected = move_masks( + masks=masks, + offset=np.array([dx, dy], dtype=np.int32), + resolution_wh=(img_w, img_h), + ) + + np.testing.assert_array_equal( + cm_shifted.to_dense(), + expected, + err_msg=( + f"with_offset mismatch for seed={seed}, " + f"dx={dx}, dy={dy}, shape=({img_h},{img_w})" + ), + ) + + @pytest.mark.parametrize("seed", list(range(10))) + def test_offset_into_larger_canvas(self, seed: int) -> None: + """Offset into a larger destination image must preserve pixels.""" + rng = np.random.default_rng(seed + 100) + num_masks = rng.integers(1, 8) + img_h, img_w = int(rng.integers(20, 50)), int(rng.integers(20, 50)) + masks, xyxy = _random_masks_and_xyxy(rng, num_masks, img_h, img_w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(img_h, img_w)) + + new_h, new_w = img_h * 2, img_w * 2 + dx = int(rng.integers(0, img_w)) + dy = int(rng.integers(0, img_h)) + + cm_shifted = cm.with_offset(dx=dx, dy=dy, new_image_shape=(new_h, new_w)) + dense_shifted = cm_shifted.to_dense() + + assert dense_shifted.shape == (num_masks, new_h, new_w) + # Manually place each original mask into the larger canvas. 
+ expected = np.zeros((num_masks, new_h, new_w), dtype=bool) + for mask_idx in range(num_masks): + expected[mask_idx, dy : dy + img_h, dx : dx + img_w] |= masks[mask_idx] + + np.testing.assert_array_equal( + dense_shifted, + expected, + err_msg=f"Larger canvas offset mismatch for seed={seed}", + ) diff --git a/tests/detection/test_compact_mask_integration.py b/tests/detection/test_compact_mask_integration.py new file mode 100644 index 0000000000..210ec8182c --- /dev/null +++ b/tests/detection/test_compact_mask_integration.py @@ -0,0 +1,274 @@ +"""Integration tests: CompactMask <-> Detections, annotators, merge.""" + +from __future__ import annotations + +from contextlib import ExitStack as DoesNotRaise + +import numpy as np +import pytest + +import supervision as sv +from supervision.detection.compact_mask import CompactMask +from supervision.detection.core import Detections + + +def _full_xyxy(n: int, h: int, w: int) -> np.ndarray: + """N boxes covering the whole image (ensures crop == full mask).""" + return np.tile(np.array([0, 0, w, h], dtype=np.float32), (n, 1)) + + +def _make_compact_detections( + n: int, h: int = 40, w: int = 40 +) -> tuple[Detections, np.ndarray]: + """Detections with a CompactMask backed by full-image bounding boxes. + + Using full-image xyxy means all True pixels are within the crop region, + so from_dense -> to_dense is lossless. + """ + rng = np.random.default_rng(42) + masks = rng.integers(0, 2, size=(n, h, w)).astype(bool) + xyxy = _full_xyxy(n, h, w) + cm = CompactMask.from_dense(masks, xyxy, image_shape=(h, w)) + det = Detections( + xyxy=xyxy, + mask=cm, + confidence=np.ones(n, dtype=np.float32) * 0.9, + class_id=np.arange(n), + ) + return det, masks + + +class TestConstruction: + """Tests for building Detections with a CompactMask. + + Verifies that a CompactMask is accepted as a valid mask argument and that + the validator raises ValueError when the mask length does not match the + number of bounding boxes. 
+ """ + + def test_detections_construction_with_compact_mask(self) -> None: + with DoesNotRaise(): + det, _ = _make_compact_detections(3) + assert isinstance(det.mask, CompactMask) + assert len(det) == 3 + + def test_detections_compact_mask_validation_mismatch(self) -> None: + n, h, w = 3, 20, 20 + xyxy = _full_xyxy(n, h, w) + masks_wrong_n = np.zeros((n + 1, h, w), dtype=bool) + cm = CompactMask.from_dense(masks_wrong_n, _full_xyxy(n + 1, h, w), (h, w)) + with pytest.raises(ValueError, match="mask must contain"): + Detections(xyxy=xyxy, mask=cm) + + +class TestFiltering: + """Tests for Detections.__getitem__ with a CompactMask. + + Verifies that integer, slice, and boolean-array indexing all preserve the + CompactMask type and return the correct subset of masks. + """ + + def test_int_wraps_to_compact_mask(self) -> None: + det, _ = _make_compact_detections(3) + # Detections converts int to [int] internally -> subset has 1 element + subset = det[1] + assert isinstance(subset.mask, CompactMask) + assert len(subset) == 1 + + def test_slice_preserves_compact_mask(self) -> None: + det, masks = _make_compact_detections(4) + subset = det[1:3] + assert isinstance(subset.mask, CompactMask) + assert len(subset) == 2 + np.testing.assert_array_equal(subset.mask.to_dense(), masks[1:3]) + + def test_bool_array_preserves_compact_mask(self) -> None: + det, masks = _make_compact_detections(4) + selector = np.array([True, False, True, False]) + subset = det[selector] + assert isinstance(subset.mask, CompactMask) + assert len(subset) == 2 + np.testing.assert_array_equal(subset.mask.to_dense(), masks[[0, 2]]) + + +class TestIteration: + """Tests for iterating over Detections with a CompactMask. + + Verifies that each iteration step yields a 2-D boolean (H, W) array + identical to the corresponding dense mask, so downstream code that + iterates over detections needs no changes. 
+ """ + + def test_iter_yields_2d_dense(self) -> None: + h, w = 20, 20 + det, masks = _make_compact_detections(3, h, w) + for i, (_, mask_2d, *_) in enumerate(det): + assert mask_2d is not None + assert isinstance(mask_2d, np.ndarray) + assert mask_2d.shape == (h, w) + assert mask_2d.dtype == bool + np.testing.assert_array_equal(mask_2d, masks[i]) + + +class TestEquality: + """Tests for Detections.__eq__ mixing CompactMask and dense arrays. + + Verifies that a Detections object backed by a CompactMask compares equal + to an otherwise identical Detections object backed by a dense ndarray. + """ + + def test_compact_vs_dense(self) -> None: + h, w = 20, 20 + det_compact, masks = _make_compact_detections(2, h, w) + xyxy = det_compact.xyxy.copy() + det_dense = Detections( + xyxy=xyxy, + mask=masks, + confidence=np.ones(2, dtype=np.float32) * 0.9, + class_id=np.arange(2), + ) + assert det_compact == det_dense + + +class TestArea: + """Tests for the Detections.area property with a CompactMask. + + Verifies that the fast CompactMask path in Detections.area returns the + same per-detection pixel counts as summing the equivalent dense array. + """ + + def test_compact_matches_dense(self) -> None: + det_compact, masks = _make_compact_detections(3) + expected_area = np.array([m.sum() for m in masks]) + np.testing.assert_array_equal(det_compact.area, expected_area) + + +class TestMerge: + """Tests for merging Detections objects that contain CompactMask instances. + + Covers three scenarios: + - All-compact merge: result is a CompactMask. + - Mixed compact + dense: result falls back to a dense ndarray. + - Inner pair merge (merge_inner_detection_object_pair): used during NMS-like + operations, each input must contain exactly one detection. 
+ """ + + def test_all_compact(self) -> None: + h, w = 30, 30 + det1, masks1 = _make_compact_detections(2, h, w) + + rng = np.random.default_rng(7) + masks2 = rng.integers(0, 2, size=(3, h, w)).astype(bool) + xyxy2 = _full_xyxy(3, h, w) + cm2 = CompactMask.from_dense(masks2, xyxy2, (h, w)) + det2 = Detections( + xyxy=xyxy2, + mask=cm2, + confidence=np.ones(3, dtype=np.float32) * 0.8, + class_id=np.arange(3), + ) + + merged = Detections.merge([det1, det2]) + assert isinstance(merged.mask, CompactMask) + assert len(merged) == 5 + expected = np.concatenate([masks1, masks2], axis=0) + np.testing.assert_array_equal(merged.mask.to_dense(), expected) + + def test_mixed_compact_and_dense(self) -> None: + """Merging a CompactMask with a dense ndarray falls back to dense.""" + h, w = 20, 20 + det_compact, _ = _make_compact_detections(2, h, w) + masks_dense = np.zeros((1, h, w), dtype=bool) + xyxy_dense = _full_xyxy(1, h, w) + det_dense = Detections( + xyxy=xyxy_dense, + mask=masks_dense, + confidence=np.array([0.5], dtype=np.float32), + class_id=np.array([0]), + ) + + merged = Detections.merge([det_compact, det_dense]) + assert isinstance(merged.mask, np.ndarray) + assert merged.mask.shape == (3, h, w) + + def test_inner_pair_with_compact(self) -> None: + from supervision.detection.core import merge_inner_detection_object_pair + + h, w = 20, 20 + masks_a = np.zeros((1, h, w), dtype=bool) + masks_a[0, 0:5, 0:5] = True + xyxy_a = _full_xyxy(1, h, w) + cm_a = CompactMask.from_dense(masks_a, xyxy_a, (h, w)) + det_a = Detections( + xyxy=xyxy_a, + mask=cm_a, + confidence=np.array([0.9], dtype=np.float32), + class_id=np.array([1]), + ) + + masks_b = np.zeros((1, h, w), dtype=bool) + masks_b[0, 5:10, 5:10] = True + xyxy_b = _full_xyxy(1, h, w) + cm_b = CompactMask.from_dense(masks_b, xyxy_b, (h, w)) + det_b = Detections( + xyxy=xyxy_b, + mask=cm_b, + confidence=np.array([0.7], dtype=np.float32), + class_id=np.array([1]), + ) + + with DoesNotRaise(): + result = 
merge_inner_detection_object_pair(det_a, det_b) + assert len(result) == 1 + + +class TestAnnotators: + """Tests for annotators that consume CompactMask via Detections. + + Verifies that MaskAnnotator and PolygonAnnotator produce pixel-identical + output when given Detections backed by a CompactMask versus the equivalent + dense ndarray, confirming that the annotators are transparent to the mask + representation. + """ + + def test_mask_annotator(self) -> None: + h, w = 40, 40 + det_compact, masks = _make_compact_detections(2, h, w) + det_dense = Detections( + xyxy=det_compact.xyxy.copy(), + mask=masks, + confidence=det_compact.confidence.copy(), + class_id=det_compact.class_id.copy(), + ) + + image = np.zeros((h, w, 3), dtype=np.uint8) + annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX) + + annotated_compact = annotator.annotate(image.copy(), det_compact) + annotated_dense = annotator.annotate(image.copy(), det_dense) + + np.testing.assert_array_equal( + annotated_compact, + annotated_dense, + err_msg="MaskAnnotator output differs between CompactMask and dense mask", + ) + + def test_polygon_annotator(self) -> None: + h, w = 40, 40 + # Use solid rectangular masks for stable polygon results. 
+ masks = np.zeros((2, h, w), dtype=bool) + masks[0, 5:15, 5:15] = True + masks[1, 20:30, 20:30] = True + xyxy = _full_xyxy(2, h, w) + cm = CompactMask.from_dense(masks, xyxy, (h, w)) + + det_compact = Detections(xyxy=xyxy, mask=cm, class_id=np.array([0, 1])) + det_dense = Detections(xyxy=xyxy, mask=masks, class_id=np.array([0, 1])) + + image = np.zeros((h, w, 3), dtype=np.uint8) + annotator = sv.PolygonAnnotator(color_lookup=sv.ColorLookup.INDEX) + + annotated_compact = annotator.annotate(image.copy(), det_compact) + annotated_dense = annotator.annotate(image.copy(), det_dense) + + np.testing.assert_array_equal(annotated_compact, annotated_dense) diff --git a/tests/detection/test_compact_mask_iou.py b/tests/detection/test_compact_mask_iou.py new file mode 100644 index 0000000000..dc4aed7ee9 --- /dev/null +++ b/tests/detection/test_compact_mask_iou.py @@ -0,0 +1,500 @@ +"""Correctness and integration tests for CompactMask IoU and NMS. + +These tests verify that: +- compact_mask_iou_batch gives numerically identical results to the + dense mask_iou_batch (raster IoU) for all overlap patterns. +- mask_iou_batch dispatches correctly when given CompactMask inputs. +- mask_non_max_suppression and mask_non_max_merge work with CompactMask + and produce the same keep-set as when given equivalent dense arrays. 
+""" + +from __future__ import annotations + +import numpy as np +import pytest + +from supervision.detection.compact_mask import CompactMask +from supervision.detection.utils.iou_and_nms import ( + OverlapMetric, + compact_mask_iou_batch, + mask_iou_batch, + mask_non_max_merge, + mask_non_max_suppression, +) + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _cm_from_masks(masks: np.ndarray, image_shape: tuple[int, int]) -> CompactMask: + """Build a CompactMask using full-image bounding boxes (lossless).""" + num_masks = len(masks) + img_h, img_w = image_shape + xyxy = np.tile( + np.array([0, 0, img_w - 1, img_h - 1], dtype=np.float32), (num_masks, 1) + ) + return CompactMask.from_dense(masks, xyxy, image_shape=image_shape) + + +def _cm_tight(masks: np.ndarray, image_shape: tuple[int, int]) -> CompactMask: + """Build a CompactMask using tight per-mask bounding boxes.""" + from supervision.detection.utils.converters import mask_to_xyxy + + xyxy = mask_to_xyxy(masks).astype(np.float32) + return CompactMask.from_dense(masks, xyxy, image_shape=image_shape) + + +def _dense_iou( + masks_a: np.ndarray, + masks_b: np.ndarray, + metric: OverlapMetric = OverlapMetric.IOU, +) -> np.ndarray: + """Reference pairwise IoU using the existing dense implementation.""" + return mask_iou_batch(masks_a, masks_b, overlap_metric=metric) + + +class TestCompactMaskIouBatch: + """Verify that compact_mask_iou_batch matches dense raster IoU exactly. + + Every test builds a pair of CompactMask collections from known boolean + arrays, runs compact_mask_iou_batch, and compares the result to the dense + reference computed by mask_iou_batch on the raw numpy arrays. 
+ """ + + def test_no_overlap_gives_zero(self) -> None: + """Non-overlapping masks should always produce IoU = 0.""" + img_h, img_w = 20, 20 + masks_a = np.zeros((1, img_h, img_w), dtype=bool) + masks_a[0, 0:5, 0:5] = True # top-left + + masks_b = np.zeros((1, img_h, img_w), dtype=bool) + masks_b[0, 10:15, 10:15] = True # bottom-right + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + result = compact_mask_iou_batch(cm_a, cm_b) + assert result.shape == (1, 1) + assert result[0, 0] == pytest.approx(0.0) + + def test_identical_masks_give_one(self) -> None: + """IoU of a mask with itself must be 1.0.""" + img_h, img_w = 20, 20 + masks = np.zeros((2, img_h, img_w), dtype=bool) + masks[0, 2:8, 2:8] = True + masks[1, 10:18, 10:18] = True + + cm = _cm_from_masks(masks, (img_h, img_w)) + result = compact_mask_iou_batch(cm, cm) + + assert result.shape == (2, 2) + np.testing.assert_allclose(np.diag(result), [1.0, 1.0], atol=1e-9) + + def test_matches_dense_random(self) -> None: + """compact_mask_iou_batch must be numerically identical to dense IoU.""" + rng = np.random.default_rng(0) + img_h, img_w = 30, 30 + masks_a = rng.integers(0, 2, size=(5, img_h, img_w)).astype(bool) + masks_b = rng.integers(0, 2, size=(4, img_h, img_w)).astype(bool) + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + compact_result = compact_mask_iou_batch(cm_a, cm_b) + dense_result = _dense_iou(masks_a, masks_b) + + assert compact_result.shape == (5, 4) + np.testing.assert_allclose(compact_result, dense_result, atol=1e-9) + + def test_matches_dense_with_tight_bboxes(self) -> None: + """Using tight bounding boxes (mask_to_xyxy) must still be accurate.""" + rng = np.random.default_rng(1) + img_h, img_w = 40, 40 + masks_a = rng.integers(0, 2, size=(4, img_h, img_w)).astype(bool) + masks_b = rng.integers(0, 2, size=(3, img_h, img_w)).astype(bool) + + cm_a = _cm_tight(masks_a, (img_h, img_w)) + cm_b = 
_cm_tight(masks_b, (img_h, img_w)) + + compact_result = compact_mask_iou_batch(cm_a, cm_b) + dense_result = _dense_iou(masks_a, masks_b) + + np.testing.assert_allclose(compact_result, dense_result, atol=1e-9) + + def test_partial_overlap(self) -> None: + """Partially overlapping masks: IoU should match the analytic value.""" + img_h, img_w = 10, 10 + # Mask A: columns 0-4 (5 wide), Mask B: columns 3-7 (5 wide). + # Overlap: columns 3-4 (2 wide) x full height (10 rows) = 20 px. + masks_a = np.zeros((1, img_h, img_w), dtype=bool) + masks_a[0, :, 0:5] = True # area = 50 + + masks_b = np.zeros((1, img_h, img_w), dtype=bool) + masks_b[0, :, 3:8] = True # area = 50 + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + result = compact_mask_iou_batch(cm_a, cm_b) + # inter=20, union=50+50-20=80 → IoU=0.25 + assert result[0, 0] == pytest.approx(0.25, abs=1e-9) + np.testing.assert_allclose(result, _dense_iou(masks_a, masks_b), atol=1e-9) + + def test_ios_metric(self) -> None: + """IOS = intersection / min(area_a, area_b) must match dense reference.""" + rng = np.random.default_rng(2) + img_h, img_w = 25, 25 + masks_a = rng.integers(0, 2, size=(3, img_h, img_w)).astype(bool) + masks_b = rng.integers(0, 2, size=(3, img_h, img_w)).astype(bool) + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + compact_result = compact_mask_iou_batch(cm_a, cm_b, OverlapMetric.IOS) + dense_result = _dense_iou(masks_a, masks_b, OverlapMetric.IOS) + + np.testing.assert_allclose(compact_result, dense_result, atol=1e-9) + + def test_all_false_masks(self) -> None: + """Zero-area masks should produce IoU = 0, not NaN.""" + img_h, img_w = 10, 10 + masks_a = np.zeros((2, img_h, img_w), dtype=bool) + masks_b = np.zeros((2, img_h, img_w), dtype=bool) + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + result = compact_mask_iou_batch(cm_a, cm_b) + assert 
not np.any(np.isnan(result)) + np.testing.assert_array_equal(result, 0.0) + + def test_empty_inputs(self) -> None: + """Empty CompactMask collections should return a zero-shaped matrix.""" + img_h, img_w = 10, 10 + empty = CompactMask( + [], + np.empty((0, 2), dtype=np.int32), + np.empty((0, 2), dtype=np.int32), + (img_h, img_w), + ) + masks = np.zeros((3, img_h, img_w), dtype=bool) + cm = _cm_from_masks(masks, (img_h, img_w)) + + result_a = compact_mask_iou_batch(empty, cm) + assert result_a.shape == (0, 3) + + result_b = compact_mask_iou_batch(cm, empty) + assert result_b.shape == (3, 0) + + def test_n_by_n_pairwise(self) -> None: + """N x N pairwise IoU: diagonal must be 1.0 for non-zero-area masks.""" + img_h, img_w = 50, 50 + rng = np.random.default_rng(3) + masks = rng.integers(0, 2, size=(8, img_h, img_w)).astype(bool) + # Ensure no all-false mask (diagonal would be undefined). + for mask_idx in range(8): + masks[mask_idx, mask_idx * 5, mask_idx * 5] = True + + cm = _cm_from_masks(masks, (img_h, img_w)) + result = compact_mask_iou_batch(cm, cm) + + assert result.shape == (8, 8) + np.testing.assert_allclose(np.diag(result), 1.0, atol=1e-9) + np.testing.assert_allclose(result, _dense_iou(masks, masks), atol=1e-9) + + +class TestMaskIouBatchDispatch: + """Verify mask_iou_batch dispatches correctly for CompactMask inputs. + + When both arguments are CompactMask, the function must route to the + efficient RLE implementation and produce identical results to the dense + path. When one argument is dense and the other is CompactMask, the + CompactMask must be materialised transparently before computation. 
+ """ + + def test_both_compact_dispatches_to_rle(self) -> None: + img_h, img_w = 20, 20 + rng = np.random.default_rng(10) + masks_a = rng.integers(0, 2, size=(3, img_h, img_w)).astype(bool) + masks_b = rng.integers(0, 2, size=(2, img_h, img_w)).astype(bool) + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + result_compact = mask_iou_batch(cm_a, cm_b) + result_dense = mask_iou_batch(masks_a, masks_b) + + np.testing.assert_allclose(result_compact, result_dense, atol=1e-9) + + def test_mixed_compact_and_dense(self) -> None: + """One CompactMask + one dense array must still work correctly.""" + img_h, img_w = 20, 20 + rng = np.random.default_rng(11) + masks_a = rng.integers(0, 2, size=(3, img_h, img_w)).astype(bool) + masks_b = rng.integers(0, 2, size=(2, img_h, img_w)).astype(bool) + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + + result = mask_iou_batch(cm_a, masks_b) + expected = mask_iou_batch(masks_a, masks_b) + np.testing.assert_allclose(result, expected, atol=1e-9) + + +class TestNmsWithCompactMask: + """Verify mask NMS produces identical keep-sets for CompactMask and dense inputs. + + Both paths now use exact full-resolution IoU — no resize approximation. + Tests use images larger than 640 px to ensure the old resize-to-640 path + would have introduced lossy approximation (catching the regression). + """ + + def test_nms_compact_matches_dense(self) -> None: + """NMS keep-set is identical for CompactMask and the equivalent dense array.""" + # Use > 640 px so the old resize-to-640 path would have been lossy. 
+        img_h, img_w = 720, 720
+        masks = np.zeros((3, img_h, img_w), dtype=bool)
+        masks[0, 0:360, 0:360] = True  # top-left
+        masks[1, 0:324, 0:324] = True  # heavily overlaps mask 0
+        masks[2, 360:720, 360:720] = True  # bottom-right, no overlap
+
+        scores = np.array([0.9, 0.8, 0.7])
+        predictions = np.column_stack(
+            [np.zeros((3, 4)), scores]  # dummy xyxy, real scores
+        )
+
+        cm = _cm_from_masks(masks, (img_h, img_w))
+
+        keep_dense = mask_non_max_suppression(predictions, masks, iou_threshold=0.3)
+        keep_compact = mask_non_max_suppression(predictions, cm, iou_threshold=0.3)
+
+        np.testing.assert_array_equal(keep_compact, keep_dense)
+
+    def test_nms_compact_matches_dense_borderline(self) -> None:
+        """Borderline IoU pair (≈ threshold) must agree — catches the resize bug.
+
+        With resize-to-640, sub-pixel rounding on a pair whose true IoU is very
+        close to the threshold flips the keep/suppress decision. Both paths now
+        compute exact pixel-level IoU so results are identical.
+        """
+        img_h, img_w = 1080, 1920
+        masks = np.zeros((2, img_h, img_w), dtype=bool)
+        # Mask 0: 200x200 square; mask 1: shifted 37 px diagonally.
+        # inter = 163*163 = 26569, union = 80000 - 26569 = 53431 → IoU ≈ 0.497.
+        masks[0, 100:300, 100:300] = True
+        masks[1, 137:337, 137:337] = True
+
+        scores = np.array([0.9, 0.8])
+        predictions = np.column_stack([np.zeros((2, 4)), scores])
+        cm = _cm_from_masks(masks, (img_h, img_w))
+
+        keep_dense = mask_non_max_suppression(predictions, masks, iou_threshold=0.5)
+        keep_compact = mask_non_max_suppression(predictions, cm, iou_threshold=0.5)
+
+        np.testing.assert_array_equal(keep_compact, keep_dense)
+
+    def test_nms_compact_no_suppression(self) -> None:
+        """Non-overlapping masks: all should be kept."""
+        img_h, img_w = 20, 20
+        masks = np.zeros((3, img_h, img_w), dtype=bool)
+        masks[0, 0:5, 0:5] = True
+        masks[1, 7:12, 7:12] = True
+        masks[2, 14:19, 14:19] = True
+
+        scores = np.array([0.9, 0.8, 0.7])
+        predictions = np.column_stack([np.zeros((3, 4)), scores])
+        cm = _cm_from_masks(masks, (img_h, img_w))
+
+        keep = mask_non_max_suppression(predictions, cm, iou_threshold=0.5)
+        assert keep.all(), "All non-overlapping masks should be kept"
+
+    def test_nms_compact_full_suppression(self) -> None:
+        """Identical masks: only the highest-confidence one should survive."""
+        img_h, img_w = 20, 20
+        mask = np.zeros((1, img_h, img_w), dtype=bool)
+        mask[0, 5:15, 5:15] = True
+
+        masks = np.repeat(mask, 3, axis=0)
+        scores = np.array([0.9, 0.8, 0.7])
+        predictions = np.column_stack([np.zeros((3, 4)), scores])
+        cm = _cm_from_masks(masks, (img_h, img_w))
+
+        keep = mask_non_max_suppression(predictions, cm, iou_threshold=0.5)
+        assert keep.sum() == 1
+        assert keep[0], "Highest-confidence mask should survive"
+
+
+class TestNmmWithCompactMask:
+    """Verify mask_non_max_merge produces the same groups for CompactMask and dense.
+
+    NMM materialises CompactMask to a downscaled dense array internally (the
+    same representation the dense path uses), so results must be numerically
+    identical to the dense path.
+ """ + + def test_nmm_compact_matches_dense(self) -> None: + """Merge groups must match between CompactMask and dense inputs.""" + img_h, img_w = 40, 40 + masks = np.zeros((3, img_h, img_w), dtype=bool) + masks[0, 0:20, 0:20] = True # top-left + masks[1, 0:18, 0:18] = True # heavily overlaps mask 0 + masks[2, 20:40, 20:40] = True # bottom-right, no overlap + + scores = np.array([0.9, 0.8, 0.7]) + predictions = np.column_stack([np.zeros((3, 4)), scores]) + cm = _cm_from_masks(masks, (img_h, img_w)) + + groups_dense = mask_non_max_merge(predictions, masks, iou_threshold=0.3) + groups_compact = mask_non_max_merge(predictions, cm, iou_threshold=0.3) + + def normalise(groups: list[list[int]]) -> list[list[int]]: + return sorted(sorted(group) for group in groups) + + assert normalise(groups_compact) == normalise(groups_dense) + + def test_nmm_no_merge(self) -> None: + """Non-overlapping masks: every mask should be its own group.""" + img_h, img_w = 20, 20 + masks = np.zeros((3, img_h, img_w), dtype=bool) + masks[0, 0:5, 0:5] = True + masks[1, 7:12, 7:12] = True + masks[2, 14:19, 14:19] = True + + scores = np.array([0.9, 0.8, 0.7]) + predictions = np.column_stack([np.zeros((3, 4)), scores]) + cm = _cm_from_masks(masks, (img_h, img_w)) + + groups = mask_non_max_merge(predictions, cm, iou_threshold=0.5) + assert len(groups) == 3, "Each non-overlapping mask gets its own group" + assert all(len(group) == 1 for group in groups) + + def test_nmm_full_merge(self) -> None: + """Identical masks: all predictions should merge into one group.""" + img_h, img_w = 20, 20 + single = np.zeros((1, img_h, img_w), dtype=bool) + single[0, 5:15, 5:15] = True + masks = np.repeat(single, 3, axis=0) + + scores = np.array([0.9, 0.8, 0.7]) + predictions = np.column_stack([np.zeros((3, 4)), scores]) + cm = _cm_from_masks(masks, (img_h, img_w)) + + groups = mask_non_max_merge(predictions, cm, iou_threshold=0.5) + assert len(groups) == 1, "Identical masks must collapse to one group" + assert 
len(groups[0]) == 3 + + +# --------------------------------------------------------------------------- +# Random scenario helpers +# --------------------------------------------------------------------------- + +# Small (N, h, w) configs to keep IoU tests fast. +_IOU_RANDOM_CONFIGS = [ + (5, 30, 30), + (8, 40, 40), + (10, 25, 25), + (6, 50, 50), + (12, 30, 40), + (5, 60, 60), + (15, 20, 20), + (7, 35, 35), + (10, 40, 50), + (8, 45, 45), +] + + +def _random_masks( + rng: np.random.Generator, + num_masks: int, + img_h: int, + img_w: int, + fill_prob: float = 0.25, +) -> np.ndarray: + """Generate *num_masks* random boolean masks with at least one True pixel each.""" + masks = np.zeros((num_masks, img_h, img_w), dtype=bool) + for mask_idx in range(num_masks): + y1 = rng.integers(0, img_h) + y2 = rng.integers(y1, img_h) + x1 = rng.integers(0, img_w) + x2 = rng.integers(x1, img_w) + region = rng.random((y2 - y1 + 1, x2 - x1 + 1)) < fill_prob + if not region.any(): + region[0, 0] = True + masks[mask_idx, y1 : y2 + 1, x1 : x2 + 1] = region + return masks + + +class TestCompactMaskIouRandom: + """compact_mask_iou_batch matches dense mask_iou_batch across 10 random seeds. + + Uses small mask counts (5-15) and image sizes (20x20 to 60x60) to keep + individual test runs under 1 second. 
+ """ + + @pytest.mark.parametrize("seed", list(range(10))) + def test_parity_seed(self, seed: int) -> None: + rng = np.random.default_rng(seed) + num_masks_a, img_h, img_w = _IOU_RANDOM_CONFIGS[seed] + num_masks_b = max(3, num_masks_a - 2) + + masks_a = _random_masks(rng, num_masks_a, img_h, img_w) + masks_b = _random_masks(rng, num_masks_b, img_h, img_w) + + cm_a = _cm_from_masks(masks_a, (img_h, img_w)) + cm_b = _cm_from_masks(masks_b, (img_h, img_w)) + + compact_result = compact_mask_iou_batch(cm_a, cm_b) + dense_result = _dense_iou(masks_a, masks_b) + + assert compact_result.shape == (num_masks_a, num_masks_b), ( + f"Shape mismatch: {compact_result.shape} vs ({num_masks_a}, {num_masks_b})" + ) + np.testing.assert_allclose( + compact_result, + dense_result, + atol=1e-9, + err_msg=f"IoU mismatch: seed={seed}, N_a={num_masks_a}, N_b={num_masks_b}", + ) + + @pytest.mark.parametrize("seed", list(range(10))) + def test_self_iou_diagonal(self, seed: int) -> None: + """Self-IoU diagonal must be 1.0 for masks with at least one True pixel.""" + rng = np.random.default_rng(seed + 50) + num_masks, img_h, img_w = _IOU_RANDOM_CONFIGS[seed] + masks = _random_masks(rng, num_masks, img_h, img_w) + + cm = _cm_from_masks(masks, (img_h, img_w)) + result = compact_mask_iou_batch(cm, cm) + + np.testing.assert_allclose( + np.diag(result), + 1.0, + atol=1e-9, + err_msg=f"Diagonal not 1.0 for seed={seed}", + ) + + @pytest.mark.parametrize("seed", list(range(10))) + def test_tight_bbox_parity(self, seed: int) -> None: + """Tight bounding boxes (mask_to_xyxy) must still produce identical IoU.""" + from supervision.detection.utils.converters import mask_to_xyxy + + rng = np.random.default_rng(seed + 200) + num_masks, img_h, img_w = _IOU_RANDOM_CONFIGS[seed] + num_masks_b = max(3, num_masks - 2) + + masks_a = _random_masks(rng, num_masks, img_h, img_w) + masks_b = _random_masks(rng, num_masks_b, img_h, img_w) + + xyxy_a = mask_to_xyxy(masks_a).astype(np.float32) + xyxy_b = 
mask_to_xyxy(masks_b).astype(np.float32) + + cm_a = CompactMask.from_dense(masks_a, xyxy_a, image_shape=(img_h, img_w)) + cm_b = CompactMask.from_dense(masks_b, xyxy_b, image_shape=(img_h, img_w)) + + compact_result = compact_mask_iou_batch(cm_a, cm_b) + dense_result = _dense_iou(masks_a, masks_b) + + np.testing.assert_allclose( + compact_result, + dense_result, + atol=1e-9, + err_msg=f"Tight bbox IoU mismatch for seed={seed}", + ) diff --git a/tests/detection/test_inference_slicer_compact.py b/tests/detection/test_inference_slicer_compact.py new file mode 100644 index 0000000000..4a4a3e5f3a --- /dev/null +++ b/tests/detection/test_inference_slicer_compact.py @@ -0,0 +1,162 @@ +"""Integration tests for InferenceSlicer with compact_masks=True. + +Verifies that with compact_masks=True: +- Masks stay as CompactMask throughout the pipeline (no dense materialisation). +- NMS is computed via RLE IoU (no resize, no dense (N,H,W) alloc). +- Final detections are pixel-identical to the compact_masks=False path. +""" + +from __future__ import annotations + +import numpy as np + +import supervision as sv +from supervision.detection.compact_mask import CompactMask +from supervision.detection.core import Detections + + +def _fake_seg_callback(tile: np.ndarray) -> Detections: + """Return two non-overlapping segmentation detections for any tile.""" + h, w = tile.shape[:2] + masks = np.zeros((2, h, w), dtype=bool) + masks[0, : h // 3, : w // 3] = True + masks[1, h // 2 :, w // 2 :] = True + xyxy = np.array([[0, 0, w // 3, h // 3], [w // 2, h // 2, w, h]], dtype=np.float32) + return Detections( + xyxy=xyxy, + mask=masks, + confidence=np.array([0.9, 0.8], dtype=np.float32), + class_id=np.array([0, 1]), + ) + + +class TestInferenceSlicerCompactMasks: + """Tests that compact_masks=True keeps masks in RLE form end-to-end. 
+
+    The pipeline inside InferenceSlicer goes:
+        callback → CompactMask.from_dense (tile coords)
+        → with_offset (full-image coords)
+        → CompactMask.merge (all tiles)
+        → mask_non_max_suppression → compact_mask_iou_batch (RLE IoU)
+
+    None of those steps materialise a full (N, H, W) dense array.
+    """
+
+    def test_compact_masks_flag_converts_dense_to_compact(self) -> None:
+        """Masks returned from callback are CompactMask after _run_callback."""
+        image = np.zeros((200, 200, 3), dtype=np.uint8)
+        slicer = sv.InferenceSlicer(
+            callback=_fake_seg_callback,
+            slice_wh=200,
+            overlap_wh=0,
+            overlap_filter=sv.OverlapFilter.NONE,
+            compact_masks=True,
+        )
+        result = slicer(image)
+        assert isinstance(result.mask, CompactMask), (
+            f"compact_masks=True must produce a CompactMask, got {type(result.mask)}"
+        )
+
+    def test_compact_masks_false_keeps_dense(self) -> None:
+        """Default (compact_masks=False) keeps dense ndarray masks."""
+        image = np.zeros((200, 200, 3), dtype=np.uint8)
+        slicer = sv.InferenceSlicer(
+            callback=_fake_seg_callback,
+            slice_wh=200,
+            overlap_wh=0,
+            overlap_filter=sv.OverlapFilter.NONE,
+            compact_masks=False,
+        )
+        result = slicer(image)
+        assert isinstance(result.mask, np.ndarray)
+        assert not isinstance(result.mask, CompactMask)
+
+    def test_compact_and_dense_pipelines_give_same_masks(self) -> None:
+        """compact_masks=True and False must produce pixel-identical final masks."""
+        image = np.zeros((300, 300, 3), dtype=np.uint8)
+
+        slicer_dense = sv.InferenceSlicer(
+            callback=_fake_seg_callback,
+            slice_wh=150,
+            overlap_wh=0,
+            overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
+            iou_threshold=0.3,
+            compact_masks=False,
+        )
+        slicer_compact = sv.InferenceSlicer(
+            callback=_fake_seg_callback,
+            slice_wh=150,
+            overlap_wh=0,
+            overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
+            iou_threshold=0.3,
+            compact_masks=True,
+        )
+
+        det_dense = slicer_dense(image)
+        det_compact = slicer_compact(image)
+
+        assert len(det_dense) == len(det_compact)
+
+        dense_masks = det_dense.mask
+        compact_masks_arr = np.asarray(det_compact.mask)
+
+        # Sort both by xyxy to align order (NMS order may differ).
+        def _sort_key(d: Detections) -> np.ndarray:
+            return d.xyxy[:, 0] * 10000 + d.xyxy[:, 1]
+
+        order_d = np.argsort(_sort_key(det_dense))
+        order_c = np.argsort(_sort_key(det_compact))
+
+        np.testing.assert_array_equal(
+            dense_masks[order_d],
+            compact_masks_arr[order_c],
+            err_msg="compact_masks pipeline produced different mask pixels than dense",
+        )
+
+    def test_nms_with_overlapping_tiles_uses_rle_iou(self) -> None:
+        """With overlapping tiles, NMS must suppress duplicates using RLE IoU."""
+        image = np.zeros((300, 300, 3), dtype=np.uint8)
+
+        call_count = 0
+
+        def counting_callback(tile: np.ndarray) -> Detections:
+            nonlocal call_count
+            call_count += 1
+            return _fake_seg_callback(tile)
+
+        slicer = sv.InferenceSlicer(
+            callback=counting_callback,
+            slice_wh=200,
+            overlap_wh=100,  # heavy overlap → many duplicate detections
+            overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
+            iou_threshold=0.3,
+            compact_masks=True,
+        )
+        result = slicer(image)
+
+        assert call_count > 1, "Should have run on multiple tiles"
+        assert isinstance(result.mask, CompactMask), (
+            "Result mask must remain CompactMask after cross-tile NMS"
+        )
+
+    def test_no_mask_callback_unaffected(self) -> None:
+        """compact_masks=True must not crash when callback returns no masks."""
+
+        def box_only_callback(tile: np.ndarray) -> Detections:
+            h, w = tile.shape[:2]
+            return Detections(
+                xyxy=np.array([[0, 0, w // 2, h // 2]], dtype=np.float32),
+                confidence=np.array([0.9]),
+                class_id=np.array([0]),
+            )
+
+        image = np.zeros((200, 200, 3), dtype=np.uint8)
+        slicer = sv.InferenceSlicer(
+            callback=box_only_callback,
+            slice_wh=200,
+            overlap_wh=0,
+            overlap_filter=sv.OverlapFilter.NONE,
+            compact_masks=True,
+        )
+        result = slicer(image)
+        assert result.mask is None
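A note on the parity invariant the IoU tests above pin down: IoU computed from run-length encodings must equal IoU computed from dense arrays exactly, because both count the same integer set of pixels; only the final float division is shared. A standalone sketch of the idea (illustrative helpers only; `rle_encode_rows` and `rle_iou` are not supervision APIs):

```python
import numpy as np


def rle_encode_rows(mask: np.ndarray) -> dict[int, list[tuple[int, int]]]:
    """Encode a 2-D boolean mask as {row: [(start_col, run_length), ...]}."""
    runs: dict[int, list[tuple[int, int]]] = {}
    for r, row in enumerate(mask.astype(np.int8)):
        # Pad with zeros so diff marks every run boundary, including edges.
        edges = np.diff(np.concatenate(([0], row, [0])))
        starts = np.flatnonzero(edges == 1)
        ends = np.flatnonzero(edges == -1)
        if starts.size:
            runs[r] = [(int(s), int(e - s)) for s, e in zip(starts, ends)]
    return runs


def rle_iou(runs_a: dict, runs_b: dict) -> float:
    """IoU of two run encodings; never materialises a dense array.

    Zero-area inputs return 0.0 (matching the all-false-mask convention
    in the tests above) rather than NaN.
    """
    def area(runs: dict) -> int:
        return sum(length for row in runs.values() for _, length in row)

    inter = 0
    for r in runs_a.keys() & runs_b.keys():
        for sa, la in runs_a[r]:
            for sb, lb in runs_b[r]:
                # 1-D interval overlap of the two runs in this row.
                inter += max(0, min(sa + la, sb + lb) - max(sa, sb))
    union = area(runs_a) + area(runs_b) - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    a = np.zeros((40, 40), dtype=bool)
    a[0:20, 0:20] = True
    b = np.zeros((40, 40), dtype=bool)
    b[10:30, 10:30] = True
    # inter = 10x10 = 100, union = 400 + 400 - 100 = 700
    print(rle_iou(rle_encode_rows(a), rle_encode_rows(b)))
```

Because intersection and areas are integer pixel counts in both representations, the two paths can agree to within `atol=1e-9`, which is what the parity tests assert.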
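The NMS tests compare keep-sets rather than raw IoU values; a keep-set comes from greedy suppression over the pairwise IoU matrix, so any IoU perturbation near the threshold can flip it. A minimal sketch of that greedy rule (hypothetical helper, not supervision's `mask_non_max_suppression`):

```python
import numpy as np


def greedy_nms_keep(
    ious: np.ndarray, scores: np.ndarray, iou_threshold: float
) -> np.ndarray:
    """Greedy NMS over a precomputed pairwise IoU matrix.

    Returns a boolean keep mask aligned with the input order.
    Illustrative only: real implementations fuse IoU computation
    and suppression instead of taking a full matrix.
    """
    order = np.argsort(-scores)  # visit detections best-first
    keep = np.ones(len(scores), dtype=bool)
    for pos, i in enumerate(order):
        if not keep[i]:
            continue  # already suppressed by a better detection
        for j in order[pos + 1 :]:
            if keep[j] and ious[i, j] > iou_threshold:
                keep[j] = False
    return keep
```

The borderline scenario the tests guard is visible here: a pair whose IoU sits just below the threshold survives intact, while a lossy resize that nudges that IoU above the threshold suppresses one of the two, changing the keep-set.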