
LuxonisEval


🌟 Overview

LuxonisEval is a modular evaluation framework for benchmarking neural network models across multiple inference backends. It supports inference on Luxonis devices (RVC2 and RVC4) through DepthAI, as well as host-side inference through ONNX Runtime, while reporting both quality metrics and throughput or latency performance.

The framework follows a registry-based architecture: each pluggable component (engines, dataloaders, parsers, metrics, and visualizers) registers itself automatically. This lets you swap, extend, or add parts of the evaluation pipeline without modifying the core evaluation loop. In practice, adding a new component usually means subclassing the appropriate base class and referencing it by name in the configuration.

✨ Key Features

  • Multiple Inference Backends - DepthAI for Luxonis devices (RVC2 and RVC4) and ONNX Runtime for host-side inference
  • Dataset Loading - Registry-based dataloaders, including the built-in LuxonisLoader
  • Supported Tasks
    • Classification - Image classification
    • Detection - Bounding box detection
    • SemanticSegmentation - Per-pixel class labeling
    • InstanceSegmentation - Per-instance masks with detection
    • KeypointDetection - Body or object keypoint localization
  • Built-In Metrics - Task-appropriate quality metrics (for example, mean average precision) plus automatic throughput reporting
  • Extensible Architecture - Registry-based design powered by AutoRegisterMeta, making it straightforward to add custom engines, parsers, metrics, loaders, and visualizers

🚀 Quick Start

Get started with LuxonisEval in a few steps:

  1. Install the project from source

    pip install .
  2. Prepare the example model and dataset (requires the fiftyone package)

    pip install fiftyone
    bash examples/quickstart_inst_seg/setup_example.sh
  3. Run the evaluation

    luxonis_eval eval --config configs/yolov8n_inst_seg_config.yaml

This quickstart runs instance segmentation evaluation with ONNX Runtime on CPU and does not require Luxonis hardware. For a fuller walkthrough, see examples/quickstart_inst_seg/README.md.


🛠️ Installation

LuxonisEval requires Python 3.10 or higher. We recommend using a virtual environment to keep dependencies isolated.

Install from source:

pip install .

This installs the luxonis_eval CLI in your environment.

Developer install:

pip install -e ".[dev]"

📝 Usage

You can use LuxonisEval either from the command line or through the Python API. The CLI is the primary entry point for running evaluations from configuration files.

💻 CLI

The CLI currently exposes the eval command:

luxonis_eval eval --help

Example invocations:

# Run evaluation with a config file
luxonis_eval eval --config path/to/config.yaml

# Run with CLI overrides
luxonis_eval eval \
    --config path/to/config.yaml \
    --dataset-name coco \
    --model-path path/to/model.tar.xz \
    --backend depthai

# Use the ONNX backend
luxonis_eval eval \
    --config path/to/config.yaml \
    --dataset-name coco \
    --model-path path/to/model.onnx \
    --backend onnx

# Specify device IP for RVC4
luxonis_eval eval \
    --config path/to/config.yaml \
    --device-ip 192.168.1.100

🐍 Python API

For programmatic usage, load an EvalConfig instance and pass it to eval_run:

from luxonis_eval.__main__ import eval_run
from luxonis_eval.utils.config import EvalConfig

eval_cfg = EvalConfig.get_config(cfg="path/to/config.yaml")
eval_run(eval_cfg)

🏗️ Architecture

The repository is organized around a small set of core component types:

luxonis_eval/
├── engines/          # Inference backends
├── loaders/          # Dataset loaders
├── metrics/          # Evaluation metrics
├── parsers/          # Model output parsers
├── utils/            # Configuration and helper functions
├── visualizers/      # Result visualization
└── metadata/         # Class mapping files

🧩 Key Base Classes

Base Class      Location      Purpose
BaseEngine      engines/      Abstract inference engine
BaseParser      parsers/      Abstract output parser
BaseMetric      metrics/      Abstract evaluation metric
BaseEvalLoader  loaders/      Abstract dataset loader
BaseVisualizer  visualizers/  Abstract result visualizer

All base classes use the AutoRegisterMeta metaclass. Any subclass is registered automatically and becomes available by name in configuration files, with no manual wiring required.
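The registration mechanism can be illustrated with a minimal metaclass sketch. This is a simplified stand-in written for this README, not the actual AutoRegisterMeta implementation from luxonis_eval:

```python
# Minimal sketch of an auto-registering metaclass, in the spirit of
# AutoRegisterMeta. Illustrative stand-in, not the real implementation.
from abc import ABCMeta


class AutoRegisterMeta(ABCMeta):
    """Registers every subclass under its class name."""

    REGISTRY: dict[str, type] = {}

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Skip the abstract root (no bases); register subclasses by name.
        if bases:
            mcs.REGISTRY[name] = cls
        return cls


class BaseMetric(metaclass=AutoRegisterMeta):
    pass


class MyCustomMetric(BaseMetric):
    pass


# The subclass is now resolvable by the name used in the YAML config.
metric_cls = AutoRegisterMeta.REGISTRY["MyCustomMetric"]
```

Defining the subclass is the only step: looking up "MyCustomMetric" in the registry returns the class, which is exactly how a name in the configuration file resolves to an implementation.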

🔄 Evaluation Pipeline

The evaluation loop in eval_run is structured around abstract component interfaces rather than concrete implementations. That design keeps the pipeline modular and makes backend or task-specific components easy to replace.

┌────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────┐
│ DataLoader │────▶│    Engine   │────▶│   Parser    │────▶│  Metrics  │
│ (provides  │     │ (runs model │     │ (converts   │     │ (scores   │
│  samples)  │     │  inference) │     │  raw output)│     │  results) │
└────────────┘     └─────────────┘     └─────────────┘     └───────────┘
                                                                   │
                                              ┌────────────┐       │
                                              │ Visualizer │◀──────┘
                                              │ (optional) │
                                              └────────────┘

The pipeline works as follows:

  1. DataLoader provides images together with ground-truth annotations.
  2. Engine runs inference and returns raw backend outputs.
  3. Parser converts raw outputs into a structured prediction format.
  4. Metrics accumulate per-sample results and compute final scores.
  5. Visualizer optionally renders predictions for inspection.
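The five steps above can be sketched as a plain Python loop. Names and call signatures here are illustrative stand-ins, not the actual eval_run internals:

```python
# Illustrative sketch of the evaluation loop; component interfaces are
# stand-ins, not the actual eval_run implementation.

def run_pipeline(loader, engine, parser, metrics, visualizer=None):
    engine.setup()
    for metric in metrics:
        metric.reset()
    for idx in range(len(loader)):
        image, annotations = loader[idx]        # 1. sample + ground truth
        raw_output = engine.infer_once(image)   # 2. raw backend output
        predictions = parser.parse(raw_output)  # 3. structured predictions
        for metric in metrics:                  # 4. per-sample accumulation
            metric.update(predictions, annotations)
        if visualizer is not None:              # 5. optional rendering
            visualizer.visualize(engine.vis_frame(), predictions)
    engine.teardown()
    return {type(m).__name__: m.compute() for m in metrics}
```

The loop only ever talks to the abstract interfaces, which is why any registered loader, engine, parser, metric, or visualizer can be dropped in.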

Because each component is resolved from a registry at runtime, you can mix and match implementations freely. For example, you can:

  • swap depthai for onnx in the engine section without changing the rest of the config
  • add another metric under metrics.metrics
  • introduce a custom parser and reference it by name
  • replace LuxonisLoader with a dataset-specific custom loader

The main constraint is compatibility: the parser must produce predictions in the format the configured metrics expect, and the dataloader must provide the annotation keys those metrics require. BaseMetric.validate_target_keys() catches mismatches early and raises a clear error message.

📊 Throughput Metric Semantics

ThroughputMetric measures end-to-end pipeline timing. The reported rows mean the following:

Warning

Throughput values are end-to-end pipeline measurements, not isolated model-only benchmarks, so numbers lower than modelconverter benchmark results are expected.

  • Throughput - Samples processed per second across the full evaluation pipeline
  • End-to-end Latency - Average wall-clock time per sample for the whole run
  • Inference - Time spent inside the inference engine
  • Parsing - Time spent converting raw model outputs into predictions
  • Metric Update - Time spent updating metrics for each sample
  • Metric Compute - Time spent in the final metric aggregation after the sample loop
  • Pipeline Overhead - Remaining time not covered by the rows above; this typically includes dataloader iteration, image decode, preprocessing such as resize or normalization, annotation reconstruction, visualization, progress bar updates, and general loop bookkeeping

Rule of thumb: End-to-end Latency ≈ Inference + Parsing + Metric Update + Metric Compute + Pipeline Overhead
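In practice, Pipeline Overhead is simply the residual once the named rows are subtracted from the end-to-end latency. A quick check with made-up numbers (milliseconds, purely illustrative):

```python
# Hypothetical per-sample timings in milliseconds (illustrative only).
end_to_end = 25.0
inference = 15.0
parsing = 3.0
metric_update = 2.0
metric_compute = 0.5  # amortized over the sample count

# Pipeline overhead is whatever the named rows do not account for.
pipeline_overhead = end_to_end - (inference + parsing + metric_update + metric_compute)
throughput = 1000.0 / end_to_end  # samples per second

print(pipeline_overhead)  # 4.5
print(throughput)         # 40.0
```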

⚙️ Configuration

Evaluation runs are driven by a YAML configuration file. EvalConfig parses and validates the configuration at startup, ensuring that referenced components exist and that required fields are present before evaluation begins.

A complete configuration file is typically organized into the sections below.

📦 Data Loading And Preprocessing

This section defines which dataloader to use, which dataset it points to, and which preprocessing steps are applied before inference.

loader:
  name: LuxonisLoader             # Registered dataloader name
  params:
    dataset_name: coco-2017       # Dataset identifier
    view: [val]                   # Dataset split(s) to use
  preprocessing:
    normalize:
      active: true                # Whether to apply normalization
      params:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
    color_space: RGB              # RGB | BGR | GRAY
    keep_aspect_ratio: false      # Preserve aspect ratio during resize

Note

When using the depthai backend, normalization is usually handled by the model's own preprocessing pipeline. The engine will warn you if normalization is enabled together with DepthAI. DepthAI also expects BGR color space, so a warning is emitted if RGB is selected.

🧠 Output Parser

The parser converts raw model outputs into structured predictions. Different model architectures expose different tensor layouts, so the parser is responsible for translating backend-specific outputs into a format the metrics can consume.

parser:
  name: YOLOInstanceSegmentationParser
  params:
    conf_thres: 0.25
    mask_thres: 0.25
    iou_thres: 0.45

📏 Evaluation Metrics

Metrics are instantiated independently, updated for each sample, and computed at the end of the run. Throughput reporting is added automatically.

metrics:
  metrics:
    - name: BboxMeanAveragePrecision
      params:
        iou_type: bbox
    - name: MaskMeanAveragePrecision
      params:
        iou_type: segm

🎨 Visualization

Visualization is optional and can be enabled when you want to inspect predictions during the evaluation loop.

visualizer:
  name: InstanceSegmentationVisualizer
  visualize: true
  params: {}

⚡ Inference Engine

The engine section selects the backend and points to the model file. Configuration validation ensures that the model format matches the backend (.tar.xz for depthai, .onnx for onnx).

engine:
  name: onnx                      # Registered engine name: onnx | depthai
  model_path: ./models/yolov11n/yolov11n.onnx
  params: {}                      # Engine-specific parameters, for example device_ip for RVC4

📄 Full Example

loader:
  name: LuxonisLoader
  params:
    dataset_name: coco-2017
    view: [val]
  preprocessing:
    normalize:
      active: false
    color_space: BGR
    keep_aspect_ratio: false

parser:
  name: YOLOInstanceSegmentationParser
  params:
    conf_thres: 0.25
    mask_thres: 0.25
    iou_thres: 0.45

metrics:
  metrics:
    - name: BboxMeanAveragePrecision
      params:
        iou_type: bbox
    - name: MaskMeanAveragePrecision
      params:
        iou_type: segm

engine:
  name: depthai
  model_path: ./models/yolov11n-seg.rvc4.tar.xz
  params:
    device_ip: 192.168.1.100

🧱 Extending the Framework

LuxonisEval is designed around a simple rule: implement a new class that inherits from the appropriate base class, and the registry handles the rest. Every component type (BaseEngine, BaseEvalLoader, BaseParser, BaseMetric, BaseVisualizer) uses AutoRegisterMeta, so subclassing is enough to make a component available once its module is imported.

📥 Adding a Custom DataLoader

Every custom loader must inherit from BaseEvalLoader and implement four abstract methods:

  • load_classes() - Returns a dict[str, int] mapping class names to integer indices. The result is assigned to self.classes and validated automatically.
  • get_class_mapping() - Returns a tuple of (ldf_class_map, native_class_map, class_index_map):
    • LDF class map (dict[int, str]): class ordering used inside Luxonis Data Format
    • Native class map (dict[int, str]): original class ordering used during training
    • Class index map (dict[int, int]): mapping from LDF indices to native indices
  • __getitem__(idx) - Returns a LoaderOutput tuple for the requested sample
  • __len__() - Returns the number of samples in the dataset

For LuxonisLoader-backed datasets, the LDF and native class maps often differ, so the class index map must encode the remapping explicitly. For custom datasets that inherit directly from BaseEvalLoader, the two class maps are usually identical and the class index map is typically an identity mapping.
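As a concrete, made-up example of the three maps when LDF reorders classes relative to the training ordering:

```python
# Hypothetical class maps illustrating the remapping described above.
ldf_class_map = {0: "cat", 1: "dog", 2: "person"}     # LDF ordering
native_class_map = {0: "person", 1: "cat", 2: "dog"}  # training ordering

# class_index_map translates LDF indices into native indices, so that
# ldf_class_map[i] == native_class_map[class_index_map[i]] for every i.
class_index_map = {0: 1, 1: 2, 2: 0}

assert all(ldf_class_map[i] == native_class_map[class_index_map[i]]
           for i in ldf_class_map)
```

For a custom dataset with identical orderings, class_index_map would collapse to {0: 0, 1: 1, 2: 2}.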

Important

__getitem__ must return LoaderOutput from luxonis_ml.typing, which is a tuple of (image, annotations_dict).

  • image (np.ndarray) is a single image, for example with shape (H, W, 3).
  • annotations_dict (dict[str, np.ndarray]) maps task-group annotation keys to arrays, such as "/boundingbox", "/classification", or "/segmentation".

Every subclass implementation of __getitem__ is wrapped by @validate_loader_output, which calls check_loader_output at runtime and raises a descriptive TypeError if the output format is invalid.
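Putting the four methods together, a minimal in-memory loader might look like the sketch below. Plain Python stands in for BaseEvalLoader and LoaderOutput so the example is self-contained; a real implementation would subclass BaseEvalLoader and return LoaderOutput from luxonis_ml.typing:

```python
import numpy as np

# Sketch of a custom loader. In the real framework you would subclass
# luxonis_eval's BaseEvalLoader; here plain Python stands in so the
# example is self-contained.


class InMemoryLoader:
    def __init__(self):
        self.images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
        self.classes = self.load_classes()

    def load_classes(self):
        # Class name -> integer index.
        return {"cat": 0, "dog": 1}

    def get_class_mapping(self):
        # Custom datasets usually have identical LDF/native maps and an
        # identity index map.
        idx_to_name = {v: k for k, v in self.classes.items()}
        identity = {i: i for i in idx_to_name}
        return idx_to_name, idx_to_name, identity

    def __getitem__(self, idx):
        # LoaderOutput shape: (image, annotations_dict).
        annotations = {"/classification": np.array([idx % 2])}
        return self.images[idx], annotations

    def __len__(self):
        return len(self.images)
```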

🔌 Adding a Custom Engine

Subclass BaseEngine and implement the six abstract methods:

  • setup() - Initialize backend resources such as runtimes, sessions, or device connections
  • get_input_shape() - Return the model input size as a (width, height) tuple
  • get_platform_name() - Return a human-readable platform name such as "RVC2" or "RVC4"
  • infer_once(img) - Run inference on a single preprocessed image and return the raw backend output
  • vis_frame() - Return a copy of the input image suitable for visualization overlays
  • teardown() - Release backend resources after evaluation finishes
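A trivial host-side engine implementing the six methods could look like this sketch. The base class and exact signatures are stood in for illustration; a real engine would inherit from BaseEngine and run an actual model:

```python
import numpy as np

# Illustrative engine sketch; BaseEngine's exact signatures may differ.


class IdentityEngine:
    def setup(self):
        # Initialize backend resources (runtimes, sessions, devices, ...).
        self._last_frame = None

    def get_input_shape(self):
        # (width, height) expected by the model.
        return (640, 640)

    def get_platform_name(self):
        return "Host"

    def infer_once(self, img):
        # A real engine would run inference; this stand-in echoes the input.
        self._last_frame = img
        return {"output0": img.astype(np.float32)}

    def vis_frame(self):
        # Return a copy so overlays never mutate the original frame.
        return None if self._last_frame is None else self._last_frame.copy()

    def teardown(self):
        self._last_frame = None
```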

🧠 Adding a Custom Parser

Subclass BaseParser and implement the single abstract method:

  • parse(raw_output, **kwargs) - Convert raw backend output into a structured prediction format

The parser bridges the gap between model-specific tensor layouts and the standardized message types that downstream metrics expect (for example, dai.ImgDetections for detection-style tasks).

Important

The parser must produce outputs that the configured metrics can consume. For example, if a metric expects dai.ImgDetections, the parser must return that message type.
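A parser sketch for a simple detection head is shown below. The raw array layout and the dict-based output are assumptions made for this example; a real BaseParser subclass would return whatever message type the configured metrics consume:

```python
import numpy as np

# Illustrative parser sketch. A real subclass inherits from BaseParser
# and emits the message type the configured metrics expect (e.g.
# dai.ImgDetections); a plain list of dicts stands in here.


class SimpleDetectionParser:
    def __init__(self, conf_thres=0.25):
        self.conf_thres = conf_thres

    def parse(self, raw_output, **kwargs):
        # Assumed row layout: [x1, y1, x2, y2, confidence, class_id].
        dets = np.asarray(raw_output, dtype=np.float64)
        keep = dets[:, 4] >= self.conf_thres  # drop low-confidence rows
        return [
            {"bbox": row[:4].tolist(), "conf": float(row[4]), "cls": int(row[5])}
            for row in dets[keep]
        ]
```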

📐 Adding a Custom Metric

Subclass BaseMetric and implement the four abstract methods:

  • metric_keys() - Declare which annotation keys the metric requires
  • _reset_impl() - Reset internal state such as counters or accumulators
  • _update_impl(predictions, target, **kwargs) - Update the metric state for one sample
  • _compute_impl() - Return the final metric value

Important

Metrics must be compatible with the outputs generated by the configured parser. If the parser returns dai.ImgDetections, the metric must know how to process that object.
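A minimal classification-accuracy metric following the four-method pattern might look like this. Method names mirror the abstract methods listed above; the base-class plumbing (public reset/update/compute wrappers, validate_target_keys) is omitted for brevity:

```python
# Illustrative metric sketch; a real subclass inherits from BaseMetric,
# whose public API wraps the _impl methods shown here.


class AccuracyMetric:
    def metric_keys(self):
        # Annotation keys this metric requires from the dataloader.
        return ["/classification"]

    def _reset_impl(self):
        self.correct = 0
        self.total = 0

    def _update_impl(self, predictions, target, **kwargs):
        # One sample at a time: compare prediction with ground truth.
        self.correct += int(predictions == target["/classification"])
        self.total += 1

    def _compute_impl(self):
        return self.correct / self.total if self.total else 0.0
```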

🪜 General Pattern

All extensions follow the same three-step workflow:

  1. Subclass the appropriate base class
  2. Implement the required abstract methods
  3. Reference the component by name in the YAML config

No manual registration, factory wiring, or extra boilerplate is required. As long as the module is imported, the metaclass makes the class available.

📄 License

This project is licensed under the Apache License 2.0.
