LuxonisEval is a modular evaluation framework for benchmarking neural network models across multiple inference backends. It supports inference on Luxonis devices (RVC2 and RVC4) through DepthAI, as well as host-side inference through ONNX Runtime, while reporting both quality metrics and throughput or latency performance.
The framework follows a registry-based architecture: each pluggable component (engines, dataloaders, parsers, metrics, and visualizers) registers itself automatically. This lets you swap, extend, or add parts of the evaluation pipeline without modifying the core evaluation loop. In practice, adding a new component usually means subclassing the appropriate base class and referencing it by name in the configuration.
- **Multiple Inference Backends**
  - DepthAI Engine - Run models exported as NNArchive files on Luxonis devices via DepthAI
  - ONNX Engine - Run models on CPU or GPU using ONNX Runtime
- **Dataset Loading**
  - `LuxonisLoader` - Load datasets stored in Luxonis Data Format (LDF)
  - `BaseEvalLoader` - Base class for custom dataloaders
- **Supported Tasks**
  - Classification - Image classification
  - Detection - Bounding box detection
  - SemanticSegmentation - Per-pixel class labeling
  - InstanceSegmentation - Per-instance masks with detection
  - KeypointDetection - Body or object keypoint localization
- **Built-In Metrics**
  - `TopKAccuracy` - Top-1 and Top-5 accuracy for classification
  - `BboxMeanAveragePrecision` - COCO-style mAP for bounding box detection
  - `MaskMeanAveragePrecision` - COCO-style mAP for instance segmentation
  - `KeypointMeanAveragePrecision` - OKS-based mAP for keypoint detection
  - `MIoU` - Mean Intersection over Union for semantic segmentation
  - `DiceCoefficient` - Dice score for semantic segmentation
  - `ThroughputMetric` - End-to-end throughput and latency reporting
- **Extensible Architecture** - Registry-based design powered by `AutoRegisterMeta`, making it straightforward to add custom engines, parsers, metrics, loaders, and visualizers
Get started with LuxonisEval in a few steps:

1. Install the project from source:

   ```bash
   pip install .
   ```

2. Prepare the example model and dataset (requires the `fiftyone` package):

   ```bash
   pip install fiftyone
   bash examples/quickstart_inst_seg/setup_example.sh
   ```

3. Run the evaluation:

   ```bash
   luxonis_eval eval --config configs/yolov8n_inst_seg_config.yaml
   ```

This quickstart runs instance segmentation evaluation with ONNX Runtime on CPU and does not require Luxonis hardware. For a fuller walkthrough, see `examples/quickstart_inst_seg/README.md`.
- 🌟 Overview
- 🚀 Quick Start
- 🛠️ Installation
- 📝 Usage
- 🏗️ Architecture
- ⚙️ Configuration
- 🧱 Extending the Framework
- 📄 License
LuxonisEval requires Python 3.10 or higher. We recommend using a virtual environment to keep dependencies isolated.
Install from source:
```bash
pip install .
```

This installs the `luxonis_eval` CLI in your environment.
Developer install:
```bash
pip install -e ".[dev]"
```

You can use LuxonisEval either from the command line or through the Python API. The CLI is the primary entry point for running evaluations from configuration files.
The CLI currently exposes the `eval` command:

```bash
luxonis_eval eval --help
```

Example invocations:

```bash
# Run evaluation with a config file
luxonis_eval eval --config path/to/config.yaml

# Run with CLI overrides
luxonis_eval eval \
    --config path/to/config.yaml \
    --dataset-name coco \
    --model-path path/to/model.tar.xz \
    --backend depthai

# Use the ONNX backend
luxonis_eval eval \
    --config path/to/config.yaml \
    --dataset-name coco \
    --model-path path/to/model.onnx \
    --backend onnx

# Specify device IP for RVC4
luxonis_eval eval \
    --config path/to/config.yaml \
    --device-ip 192.168.1.100
```

For programmatic usage, load an `EvalConfig` instance and pass it to `eval_run`:
```python
from luxonis_eval.__main__ import eval_run
from luxonis_eval.utils.config import EvalConfig

eval_cfg = EvalConfig.get_config(cfg="path/to/config.yaml")
eval_run(eval_cfg)
```

The repository is organized around a small set of core component types:
```
luxonis_eval/
├── engines/      # Inference backends
├── loaders/      # Dataset loaders
├── metrics/      # Evaluation metrics
├── parsers/      # Model output parsers
├── utils/        # Configuration and helper functions
├── visualizers/  # Result visualization
└── metadata/     # Class mapping files
```

| Base Class | Location | Purpose |
|---|---|---|
| `BaseEngine` | `engines/` | Abstract inference engine |
| `BaseParser` | `parsers/` | Abstract output parser |
| `BaseMetric` | `metrics/` | Abstract evaluation metric |
| `BaseEvalLoader` | `loaders/` | Abstract dataset loader |
| `BaseVisualizer` | `visualizers/` | Abstract result visualizer |
All base classes use the `AutoRegisterMeta` metaclass. Any subclass is registered automatically and becomes available by name in configuration files, with no manual wiring required.

The evaluation loop in `eval_run` is structured around abstract component interfaces rather than concrete implementations. That design keeps the pipeline modular and makes backend- or task-specific components easy to replace.
```
┌────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────┐
│ DataLoader │────▶│   Engine    │────▶│   Parser    │────▶│  Metrics  │
│ (provides  │     │ (runs model │     │ (converts   │     │ (scores   │
│  samples)  │     │  inference) │     │ raw output) │     │ results)  │
└────────────┘     └─────────────┘     └─────────────┘     └───────────┘
                                              │
                   ┌────────────┐             │
                   │ Visualizer │◀────────────┘
                   │ (optional) │
                   └────────────┘
```

The pipeline works as follows:
- DataLoader provides images together with ground-truth annotations.
- Engine runs inference and returns raw backend outputs.
- Parser converts raw outputs into a structured prediction format.
- Metrics accumulate per-sample results and compute final scores.
- Visualizer optionally renders predictions for inspection.
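These steps can be sketched as a simplified loop with stand-in components. The class and method names below are illustrative only; they mirror the shape of the pipeline, not the framework's actual classes or `eval_run` internals:

```python
class FakeLoader:
    """Stand-in dataloader: yields (image, annotations) pairs."""
    def __init__(self, samples):
        self.samples = samples
    def __iter__(self):
        return iter(self.samples)

class FakeEngine:
    """Stand-in engine: 'inference' just measures the image."""
    def infer_once(self, img):
        return {"raw": [len(img)]}

class FakeParser:
    """Stand-in parser: converts raw output into a prediction dict."""
    def parse(self, raw_output):
        return {"predictions": raw_output["raw"]}

class FakeMetric:
    """Stand-in metric: averages the first 'prediction' value."""
    def __init__(self):
        self.total, self.count = 0, 0
    def update(self, preds, target):
        self.total += preds["predictions"][0]
        self.count += 1
    def compute(self):
        return self.total / self.count

def run_pipeline(loader, engine, parser, metric):
    # DataLoader -> Engine -> Parser -> Metrics, as in the diagram above.
    for image, annotations in loader:
        raw = engine.infer_once(image)
        preds = parser.parse(raw)
        metric.update(preds, annotations)
    return metric.compute()

loader = FakeLoader([([1, 2, 3], {}), ([1, 2], {})])
result = run_pipeline(loader, FakeEngine(), FakeParser(), FakeMetric())
print(result)  # 2.5
```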
Because each component is resolved from a registry at runtime, you can mix and match implementations freely. For example, you can:
- swap `depthai` for `onnx` in `engine` without changing the rest of the config
- add another metric under `metrics.metrics`
- introduce a custom parser and reference it by name
- replace `LuxonisLoader` with a dataset-specific custom loader
The main constraint is compatibility: the parser must produce predictions in the format the configured metrics expect, and the dataloader must provide the annotation keys those metrics require. `BaseMetric.validate_target_keys()` catches mismatches early and raises a clear error message.
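The idea behind that early check can be illustrated with a small stand-alone function. This is a sketch of the concept only, not the library's actual `validate_target_keys` implementation:

```python
def validate_target_keys(required: set, available: set) -> None:
    """Raise early if the dataloader does not provide the keys a metric needs."""
    missing = required - available
    if missing:
        raise ValueError(
            f"Metric requires annotation keys {sorted(missing)}, "
            f"but the dataloader only provides {sorted(available)}."
        )

# A detection metric needs boxes; the loader provides boxes and classes: passes.
validate_target_keys({"/boundingbox"}, {"/boundingbox", "/classification"})

# A keypoint metric against a boxes-only loader: fails with a clear message.
try:
    validate_target_keys({"/keypoints"}, {"/boundingbox"})
except ValueError as err:
    print(err)
```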
`ThroughputMetric` measures timing across the full evaluation pipeline. The reported rows mean:

> [!WARNING]
> Throughput values are end-to-end pipeline measurements, not isolated model-only benchmarks. Lower numbers than `modelconverter` benchmark results are expected.
- Throughput - Samples processed per second across the full evaluation pipeline
- End-to-end Latency - Average wall-clock time per sample for the whole run
- Inference - Time spent inside the inference engine
- Parsing - Time spent converting raw model outputs into predictions
- Metric Update - Time spent updating metrics for each sample
- Metric Compute - Time spent in the final metric aggregation after the sample loop
- Pipeline Overhead - Remaining time not covered by the rows above; this typically includes dataloader iteration, image decode, preprocessing such as resize or normalization, annotation reconstruction, visualization, progress bar updates, and general loop bookkeeping
Rule of thumb: End-to-end Latency ≈ Inference + Parsing + Metric Update + Metric Compute + Pipeline Overhead
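The same breakdown can be reproduced with plain `time.perf_counter()` accumulators. The sketch below is illustrative and not the actual `ThroughputMetric` code; the per-stage work is simulated:

```python
import time

timers = {"inference": 0.0, "parsing": 0.0, "metric_update": 0.0}

def timed(stage, fn, *args):
    # Accumulate wall-clock time for one pipeline stage.
    start = time.perf_counter()
    result = fn(*args)
    timers[stage] += time.perf_counter() - start
    return result

start_total = time.perf_counter()
n_samples = 100
for _ in range(n_samples):
    raw = timed("inference", lambda: sum(range(1000)))   # simulated inference
    preds = timed("parsing", lambda r: r * 2, raw)       # simulated parsing
    timed("metric_update", lambda p: None, preds)        # simulated metric update
end_to_end = time.perf_counter() - start_total

# "Pipeline Overhead" is the remainder of the end-to-end time.
overhead = end_to_end - sum(timers.values())
throughput = n_samples / end_to_end  # samples per second
print(f"throughput: {throughput:.0f} samples/s, overhead: {overhead:.6f} s")
```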
Evaluation runs are driven by a YAML configuration file. `EvalConfig` parses and validates the configuration at startup, ensuring that referenced components exist and that required fields are present before evaluation begins.
A complete configuration file is typically organized into the sections below.
This section defines which dataloader to use, which dataset it points to, and which preprocessing steps are applied before inference.
```yaml
loader:
  name: LuxonisLoader            # Registered dataloader name
  params:
    dataset_name: coco-2017      # Dataset identifier
    view: [val]                  # Dataset split(s) to use
  preprocessing:
    normalize:
      active: true               # Whether to apply normalization
      params:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
    color_space: RGB             # RGB | BGR | GRAY
    keep_aspect_ratio: false     # Preserve aspect ratio during resize
```

> [!NOTE]
> When using the `depthai` backend, normalization is usually handled by the model's own preprocessing pipeline. The engine will warn you if normalization is enabled together with DepthAI. DepthAI also expects BGR color space, so a warning is emitted if RGB is selected.
The parser converts raw model outputs into structured predictions. Different model architectures expose different tensor layouts, so the parser is responsible for translating backend-specific outputs into a format the metrics can consume.
```yaml
parser:
  name: YOLOInstanceSegmentationParser
  params:
    conf_thres: 0.25
    mask_thres: 0.25
    iou_thres: 0.45
```

Metrics are instantiated independently, updated for each sample, and computed at the end of the run. Throughput reporting is added automatically.
```yaml
metrics:
  metrics:
    - name: BboxMeanAveragePrecision
      params:
        iou_type: bbox
    - name: MaskMeanAveragePrecision
      params:
        iou_type: segm
```

Visualization is optional and can be enabled when you want to inspect predictions during the evaluation loop.
```yaml
visualizer:
  name: InstanceSegmentationVisualizer
  visualize: true
  params: {}
```

The engine section selects the backend and points to the model file. Configuration validation ensures that the model format matches the backend (`.tar.xz` for `depthai`, `.onnx` for `onnx`).
```yaml
engine:
  name: onnx                    # Registered engine name: onnx | depthai
  model_path: ./models/yolov11n/yolov11n.onnx
  params: {}                    # Engine-specific parameters, for example device_ip for RVC4
```

A complete configuration for a DepthAI run combines all of the sections above:

```yaml
loader:
  name: LuxonisLoader
  params:
    dataset_name: coco-2017
    view: [val]
  preprocessing:
    normalize:
      active: false
    color_space: BGR
    keep_aspect_ratio: false

parser:
  name: YOLOInstanceSegmentationParser
  params:
    conf_thres: 0.25
    mask_thres: 0.25
    iou_thres: 0.45

metrics:
  metrics:
    - name: BboxMeanAveragePrecision
      params:
        iou_type: bbox
    - name: MaskMeanAveragePrecision
      params:
        iou_type: segm

engine:
  name: depthai
  model_path: ./models/yolov11n-seg.rvc4.tar.xz
  params:
    device_ip: 192.168.1.100
```

LuxonisEval is designed around a simple rule: implement a new class that inherits from the appropriate base class, and the registry handles the rest. Every component type (`BaseEngine`, `BaseEvalLoader`, `BaseParser`, `BaseMetric`, `BaseVisualizer`) uses `AutoRegisterMeta`, so subclassing is enough to make a component available once its module is imported.
Every custom loader must inherit from `BaseEvalLoader` and implement four abstract methods:

- `load_classes()` - Returns a `dict[str, int]` mapping class names to integer indices. The result is assigned to `self.classes` and validated automatically.
- `get_class_mapping()` - Returns a tuple of `(ldf_class_map, native_class_map, class_index_map)`:
  - LDF class map (`dict[int, str]`): class ordering used inside Luxonis Data Format
  - Native class map (`dict[int, str]`): original class ordering used during training
  - Class index map (`dict[int, int]`): mapping from LDF indices to native indices
- `__getitem__(idx)` - Returns a `LoaderOutput` tuple for the requested sample
- `__len__()` - Returns the number of samples in the dataset
For LuxonisLoader-backed datasets, the LDF and native class maps often differ, so the class index map must encode the remapping explicitly. For custom datasets that inherit directly from BaseEvalLoader, the two class maps are usually identical and the class index map is typically an identity mapping.
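As a concrete illustration of the remapping, with hypothetical class names and orderings:

```python
# LDF stores classes in one order...
ldf_class_map = {0: "person", 1: "car", 2: "dog"}
# ...while the model was trained with another.
native_class_map = {0: "car", 1: "dog", 2: "person"}

# Build the LDF-index -> native-index remapping by matching class names.
name_to_native = {name: idx for idx, name in native_class_map.items()}
class_index_map = {ldf_idx: name_to_native[name]
                   for ldf_idx, name in ldf_class_map.items()}
print(class_index_map)  # {0: 2, 1: 0, 2: 1}
```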
> [!IMPORTANT]
> `__getitem__` must return `LoaderOutput` from `luxonis_ml.typing`, which is a tuple of `(image, annotations_dict)`.
>
> - `image` (`np.ndarray`) is a single image, for example with shape `(H, W, 3)`.
> - `annotations_dict` (`dict[str, np.ndarray]`) maps task-group annotation keys to arrays, such as `"/boundingbox"`, `"/classification"`, or `"/segmentation"`.
Every subclass implementation of `__getitem__` is wrapped by `@validate_loader_output`, which calls `check_loader_output` at runtime and raises a descriptive `TypeError` if the output format is invalid.
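A minimal stand-alone sketch of the four methods is shown below. It uses hypothetical data, plain nested lists in place of `np.ndarray`, a plain tuple in place of `LoaderOutput`, and omits the real `BaseEvalLoader` base class so that it runs without the framework installed:

```python
class TinyLoader:
    """Sketch of a custom dataset loader; the real class would inherit BaseEvalLoader."""

    def __init__(self, images, labels):
        self.images = images   # real code: list of np.ndarray images
        self.labels = labels
        self.classes = self.load_classes()

    def load_classes(self):
        # Class name -> integer index.
        return {"cat": 0, "dog": 1}

    def get_class_mapping(self):
        # For a custom dataset the two class maps usually coincide,
        # so the class index map is the identity.
        class_map = {idx: name for name, idx in self.classes.items()}
        return class_map, class_map, {i: i for i in class_map}

    def __getitem__(self, idx):
        # Real code returns LoaderOutput: (image, annotations_dict).
        return self.images[idx], {"/classification": [self.labels[idx]]}

    def __len__(self):
        return len(self.images)

loader = TinyLoader(images=[[[0]], [[1]]], labels=[0, 1])
print(len(loader), loader[0])
```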
Subclass `BaseEngine` and implement the six abstract methods:

- `setup()` - Initialize backend resources such as runtimes, sessions, or device connections
- `get_input_shape()` - Return the model input size as a `(width, height)` tuple
- `get_platform_name()` - Return a human-readable platform name such as `"RVC2"` or `"RVC4"`
- `infer_once(img)` - Run inference on a single preprocessed image and return the raw backend output
- `vis_frame()` - Return a copy of the input image suitable for visualization overlays
- `teardown()` - Release backend resources after evaluation finishes
Subclass `BaseParser` and implement the single abstract method:

- `parse(raw_output, **kwargs)` - Convert raw backend output into a structured prediction format
The parser bridges the gap between model-specific tensor layouts and the standardized message types that downstream metrics expect. The built-in parsers produce the following output types:
- `ClassificationParser` -> `depthai_nodes.Classifications`
- `YOLODetectionParser` -> `dai.ImgDetections`
- `YOLOInstanceSegmentationParser` -> `dai.ImgDetections`
- `YOLOKeypointDetectionParser` -> `dai.ImgDetections`
- `SemanticSegmentationParser` -> `depthai_nodes.SegmentationMask`
> [!IMPORTANT]
> The parser must produce outputs that the configured metrics can consume. For example, if a metric expects `dai.ImgDetections`, the parser must return that message type.
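The general shape of a parser can be sketched stand-alone. The example below converts a hypothetical raw detection array (rows of `[x1, y1, x2, y2, conf, cls]`) into plain dicts, whereas the real parsers emit message types such as `dai.ImgDetections`:

```python
def parse_detections(raw_output, conf_thres=0.25):
    """Keep detections whose confidence clears the threshold."""
    detections = []
    for x1, y1, x2, y2, conf, cls in raw_output:
        if conf >= conf_thres:
            detections.append({"bbox": (x1, y1, x2, y2),
                               "conf": conf,
                               "label": int(cls)})
    return detections

raw = [
    [0.1, 0.1, 0.5, 0.5, 0.9, 0],  # confident detection, kept
    [0.2, 0.2, 0.3, 0.3, 0.1, 1],  # below threshold, dropped
]
print(parse_detections(raw))  # one detection with label 0
```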
Subclass `BaseMetric` and implement the four abstract methods:

- `metric_keys()` - Declare which annotation keys the metric requires
- `_reset_impl()` - Reset internal state such as counters or accumulators
- `_update_impl(predictions, target, **kwargs)` - Update the metric state for one sample
- `_compute_impl()` - Return the final metric value
> [!IMPORTANT]
> Metrics must be compatible with the outputs generated by the configured parser. If the parser returns `dai.ImgDetections`, the metric must know how to process that object.
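A stand-alone sketch of the four hooks, using a toy accuracy metric and omitting the real `BaseMetric` base class so it runs on its own:

```python
class SketchAccuracy:
    """Sketch of the four BaseMetric hooks; the real class would inherit BaseMetric."""

    def __init__(self):
        self._reset_impl()

    def metric_keys(self):
        # Annotation keys this metric needs from the dataloader.
        return ["/classification"]

    def _reset_impl(self):
        # Reset internal counters.
        self.correct = 0
        self.total = 0

    def _update_impl(self, predictions, target):
        # Compare one predicted label against the ground-truth label.
        self.correct += int(predictions == target["/classification"][0])
        self.total += 1

    def _compute_impl(self):
        return self.correct / self.total if self.total else 0.0

metric = SketchAccuracy()
metric._update_impl(1, {"/classification": [1]})  # correct
metric._update_impl(0, {"/classification": [1]})  # wrong
print(metric._compute_impl())  # 0.5
```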
All extensions follow the same three-step workflow:
- Subclass the appropriate base class
- Implement the required abstract methods
- Reference the component by name in the YAML config
No manual registration, factory wiring, or extra boilerplate is required. As long as the module is imported, the metaclass makes the class available.
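The self-registration pattern can be sketched with a small metaclass. This is an illustration of the idea only, not the framework's actual `AutoRegisterMeta`:

```python
REGISTRY = {}

class AutoRegister(type):
    """Metaclass that records every concrete subclass under its class name."""
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # skip the abstract base itself
            REGISTRY[name] = cls

class BaseThing(metaclass=AutoRegister):
    pass

# Merely defining the subclass registers it; no factory wiring needed.
class MyCustomThing(BaseThing):
    pass

# A config can now reference the component purely by name.
instance = REGISTRY["MyCustomThing"]()
print(type(instance).__name__)  # MyCustomThing
```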
This project is licensed under the Apache License 2.0.