6 changes: 6 additions & 0 deletions nemo_retriever/README.md
@@ -395,3 +395,9 @@ retriever-harness sweep --runs-config harness/vidore_sweep.yaml
```

The same commands also work under the main CLI as `retriever harness ...` if you prefer a single top-level command namespace.

The harness now supports multiple execution modes through a shared structured metrics contract:

- `run_mode: batch` is the default path.
- `run_mode: inprocess` and `run_mode: fused` use the same `results.json` / `session_summary.json` schema.
- Per-run structured metrics are written under `runtime_metrics/<run>.run_report.json`, and the harness derives its compact summary views from that report rather than scraping console output.
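The contract above can be sketched with a small loader that collects the per-run reports instead of scraping console output. This is an illustrative sketch only: the directory layout matches the README, but the key names inside each report (`run_mode`, `metrics`) are assumptions, and the real schema lives in `application/modes/reports.py`.

```python
import json
from pathlib import Path


def load_run_summaries(session_dir: str) -> dict:
    """Collect structured per-run reports from runtime_metrics/.

    Key names inside each report are illustrative; consult the
    RunReport schema in application/modes/reports.py for real fields.
    """
    summaries = {}
    for report_path in Path(session_dir, "runtime_metrics").glob("*.run_report.json"):
        report = json.loads(report_path.read_text())
        # Derive a compact view from the report rather than from logs.
        summaries[report_path.stem] = {
            "run_mode": report.get("run_mode"),
            "metrics": report.get("metrics", {}),
        }
    return summaries
```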
96 changes: 41 additions & 55 deletions nemo_retriever/harness/HANDOFF.md
@@ -9,25 +9,30 @@ It captures what exists now, what was intentionally chosen, and what to iterate
## Current Scope and Intent

- Harness is standalone under `nemo_retriever` (not based on `tools/harness`).
- It wraps `nemo_retriever.examples.batch_pipeline`.
- It now executes shared run-mode runners directly instead of scraping CLI output.
- `batch` remains the default run mode, with `inprocess` and `fused` supported through the same harness config surface.
- Primary use case is benchmark orchestration for local/cluster runs without Docker orchestration.
- Vector DB is LanceDB only.
- Recall gating is supported and enforced by config (`recall_required`).
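Recall gating as described above could look something like the following sketch. The function name, metric key, and threshold default are hypothetical (the real keys come from `HarnessConfig` and the structured report), but it shows the intended fail-closed behavior of a required gate.

```python
def check_recall_gate(report_metrics: dict, recall_required: bool,
                      min_recall_5: float = 0.90) -> bool:
    """Hypothetical recall gate; key names and threshold are illustrative."""
    if not recall_required:
        return True
    recall = report_metrics.get("recall_5")
    # A missing metric fails a required gate rather than passing silently.
    return recall is not None and recall >= min_recall_5
```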

## Key Files

- `nemo_retriever/src/nemo_retriever/harness/run.py`
- CLI run/sweep/nightly orchestration, subprocess execution, metrics extraction, artifact writes.
- CLI run/sweep/nightly orchestration, run-mode dispatch, artifact writes.
- `nemo_retriever/src/nemo_retriever/harness/config.py`
- YAML + CLI/env merge logic and `HarnessConfig`.
- `nemo_retriever/src/nemo_retriever/harness/parsers.py`
- Stream parsing for ingest/throughput/recall metrics.
- YAML + CLI/env merge logic and `HarnessConfig`, including `run_mode`.
- `nemo_retriever/src/nemo_retriever/harness/artifacts.py`
- Artifact/session directory creation and session summary writing.
- `nemo_retriever/src/nemo_retriever/harness/recall_adapters.py`
- Dataset-specific query normalization adapters for recall inputs.
- `nemo_retriever/src/nemo_retriever/application/modes/reports.py`
- Shared `RunReport` / `RunMetrics` schema and artifact persistence helpers.
- `nemo_retriever/src/nemo_retriever/application/modes/run_batch.py`
- `nemo_retriever/src/nemo_retriever/application/modes/run_inprocess.py`
- `nemo_retriever/src/nemo_retriever/application/modes/run_fused.py`
- Shared mode runners that return structured reports consumed by the harness.
- `nemo_retriever/harness/test_configs.yaml`
- Active defaults, presets, dataset presets.
- Active defaults, presets, dataset presets, and default `run_mode`.
- `nemo_retriever/harness/nightly_config.yaml`
- Ordered run list for sweep/nightly.

@@ -48,7 +53,6 @@ It captures what exists now, what was intentionally chosen, and what to iterate
From repo root:

```bash
source ~/setup_env.sh
source .retriever/bin/activate
uv pip install -e ./nemo_retriever
```
@@ -92,6 +96,8 @@ Per run:
- `command.txt`
- `runtime_metrics/`
- `lancedb/`
- `runtime_metrics/<run>.run_report.json`
- `runtime_metrics/<run>.runtime.summary.json`

Session-level:

@@ -113,8 +119,9 @@ Notes:
- Kept `session_summary.json`.
- Removed `sweep_results.json` generation.

3. **TTY-backed subprocess retained**
- Harness runs batch pipeline through a PTY so Ray progress remains rich/pretty by default.
3. **Structured run reports are authoritative**
- Harness metrics are populated from `RunReport` objects returned by the mode runners.
- Console output is now presentation-only and is no longer scraped for harness metrics.
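The "structured reports are authoritative" decision can be illustrated with a minimal report shape. This is a sketch under assumptions: the field names mirror the counters discussed elsewhere in this handoff, but the actual `RunReport` / `RunMetrics` definitions live in `application/modes/reports.py` and may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RunMetrics:
    """Illustrative metrics shape; the real schema is in reports.py."""
    rows_processed: Optional[int] = None
    rows_per_sec_ingest: Optional[float] = None
    ingest_secs: Optional[float] = None
    # Page counters are kept distinct rather than inferred from one another.
    input_pages: Optional[int] = None
    processed_pages: Optional[int] = None


@dataclass
class RunReport:
    run_mode: str
    metrics: RunMetrics

    def flattened(self) -> dict:
        """Compatibility view analogous to results.json['metrics']."""
        return {"run_mode": self.run_mode, **asdict(self.metrics)}
```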

## Known Behavior to Remember

@@ -159,62 +166,41 @@ Harness-focused tests pass:

```bash
pytest -q nemo_retriever/tests/test_batch_ingestor.py \
nemo_retriever/tests/test_batch_pipeline.py \
nemo_retriever/tests/test_harness_parsers.py \
nemo_retriever/tests/test_harness_config.py \
nemo_retriever/tests/test_harness_run.py \
nemo_retriever/tests/test_harness_reporting.py \
nemo_retriever/tests/test_harness_nightly.py \
nemo_retriever/tests/test_harness_recall_adapters.py \
nemo_retriever/tests/test_recall_core.py
```

## Upstream Batch Compatibility (Mar 2026)

The upstream `nemo_retriever.examples.batch_pipeline` CLI and log output changed
after the initial harness work landed. The harness now carries a compatibility
shim for that newer upstream behavior.

### CLI compatibility

- Harness config field names remain unchanged for now.
- `harness.run._build_command()` maps those fields to the newer public batch CLI
flags, including:
- `pdf_extract_workers` -> `--pdf-extract-tasks`
- `pdf_extract_num_cpus` -> `--pdf-extract-cpus-per-task`
- `page_elements_workers` -> `--page-elements-actors`
- `ocr_workers` -> `--ocr-actors`
- `embed_workers` -> `--embed-actors`
- `gpu_page_elements` -> `--page-elements-gpus-per-actor`
- `gpu_ocr` -> `--ocr-gpus-per-actor`
- `gpu_embed` -> `--embed-gpus-per-actor`

### Artifact / parser semantics

- Current upstream batch mode no longer emits the old plain `[done]` / `Pages/sec`
lines on the main ingest path.
- Harness parsers now accept:
- the legacy plain-text format when present
- the newer logged line:
- `Ingestion complete. <rows> rows procesed in <secs> seconds. <pps> PPS`
- logger-prefixed recall lines such as:
- `2026-... INFO ... recall@5: 0.9043`
- `results.json` keeps the legacy page fields for backward compatibility:
- `metrics.pages`
- `metrics.pages_per_sec_ingest`
- For current upstream batch runs, those legacy page fields may be `null`.
- The authoritative ingest counters for the current upstream path are:
- `metrics.rows_processed`
- `metrics.rows_per_sec_ingest`
- `metrics.ingest_secs` is still populated from whichever upstream ingest summary
line is available.
## Structured Metrics Contract (Mar 2026)

The harness no longer relies on stdout or stderr to derive ingest or evaluation
metrics. Instead, each supported run mode produces a shared structured
`RunReport`, and the harness projects that report into `results.json` and
`session_summary.json`.

### Authoritative metrics sources

- `results.json["run_report"]` is the canonical per-run payload.
- `results.json["metrics"]` is a flattened compatibility view derived from the report.
- `results.json["summary_metrics"]` is the compact downstream view used by nightly/reporting.
- `runtime_metrics/<run>.run_report.json` mirrors the same report for direct inspection.
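The projection from one canonical report into the three views above can be sketched as follows. The top-level keys match this contract; the internals (which metrics land in the compact view) are assumptions for illustration.

```python
def project_report(run_report: dict) -> dict:
    """Sketch: project one structured report into the results.json views.

    Top-level keys mirror the contract above; the choice of compact
    metrics is an assumption, not the harness's actual selection.
    """
    flattened = dict(run_report.get("metrics", {}))
    compact = {
        k: flattened.get(k)
        for k in ("recall_5", "ndcg_10", "rows_per_sec_ingest")
    }
    return {
        "run_report": run_report,      # canonical per-run payload
        "metrics": flattened,          # flattened compatibility view
        "summary_metrics": compact,    # compact downstream view
    }
```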

### Legacy compatibility

- `results.json` still keeps compatibility fields such as `metrics.pages`,
`metrics.pages_per_sec_ingest`, `summary_metrics.recall_5`, and
`summary_metrics.ndcg_10`.
- `command.txt` is retained for reproducibility, but it is no longer a metrics source.

### Remaining caveats

- This compatibility follow-up restores harness operability and recall gating.
- It does **not** solve the larger semantic question of authoritative physical PDF
page counts versus uploaded unique pages versus post-explode rows.
- `runtime_summary` and `detection_summary` remain best-effort side artifacts and
may still be `null` until upstream batch mode writes them consistently again.
- Metric semantics still need care when comparing `input_pages`,
`processed_pages`, and `rows_processed` across modes.
- The harness now preserves all three counters explicitly rather than inferring
one from CLI text.

## Recommended Next Iterations

1 change: 1 addition & 0 deletions nemo_retriever/harness/test_configs.yaml
@@ -3,6 +3,7 @@
active:
dataset: jp20
preset: single_gpu
run_mode: batch
query_csv: data/jp20_query_gt.csv
input_type: pdf
recall_required: true
Expand Down
26 changes: 22 additions & 4 deletions nemo_retriever/src/nemo_retriever/application/modes/__init__.py
@@ -2,12 +2,10 @@
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

from .executor import run_mode_ingest
from .factory import RunMode, create_runmode_ingestor
from .run_batch import run_batch
from .run_fused import run_fused
from .run_inprocess import run_inprocess
from .run_online import run_online

__all__ = [
"RunMode",
@@ -18,3 +16,23 @@
"run_inprocess",
"run_online",
]


def __getattr__(name: str):
if name == "run_batch":
from .run_batch import run_batch

return run_batch
if name == "run_fused":
from .run_fused import run_fused

return run_fused
if name == "run_inprocess":
from .run_inprocess import run_inprocess

return run_inprocess
if name == "run_online":
from .run_online import run_online

return run_online
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
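The `__getattr__` above is the PEP 562 module-level lazy-import pattern: submodule imports are deferred until the attribute is first requested. A self-contained illustration of the same mechanism, using a toy module rather than the real package:

```python
import sys
import types

# Toy module demonstrating PEP 562 module-level __getattr__: the
# (potentially heavy) runner import is deferred until first access.
demo = types.ModuleType("modes_demo")


def _lazy_getattr(name: str):
    if name == "run_batch":
        def run_batch():  # stand-in for `from .run_batch import run_batch`
            return "batch ran"
        # Cache on the module so later lookups skip __getattr__ entirely.
        setattr(demo, name, run_batch)
        return run_batch
    raise AttributeError(f"module 'modes_demo' has no attribute {name!r}")


demo.__getattr__ = _lazy_getattr
sys.modules["modes_demo"] = demo

import modes_demo

result = modes_demo.run_batch()  # resolved lazily on first attribute access
```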