- you need requirement IDs and acceptance criteria
- you are checking scope (in-scope vs out-of-scope)
- you are validating definition-of-done expectations
| Requirement | Status |
|---|---|
| FR-001 Service Advertisement | ✅ Implemented |
| FR-002 Service Discovery and Registry | ✅ Implemented |
| FR-003 Circuit Ingestion and Normalization | ✅ Implemented |
| FR-004 Distributed Compilation with Cost Model | ✅ Implemented |
| FR-005 Reservation Protocol | ✅ Implemented (event-sourced) |
| FR-006 Runtime Execution Orchestration | ✅ Implemented |
| FR-007 Timeout, Retry, and Adaptive Fallback | ✅ Implemented |
| FR-008 Fidelity and Link Monitoring | ✅ Implemented |
| FR-009 Job API and Lifecycle | ✅ Implemented (28+ endpoints) |
| FR-010 Persistence and Recovery | ✅ Implemented (Postgres + MongoDB + JSONL) |
| FR-011 Configuration Management | ✅ Implemented (QB2_* env vars → Pydantic) |
| FR-012 Observability | ✅ Implemented (structured logs, health endpoint) |
| FR-013 Experiment Harness and Baseline Comparison | ✅ Implemented (QAOA benchmark scripts) |
| FR-014 Security Baseline | ✅ Implemented (API key + role-based auth stubs) |
The original scope is substantially complete. Outstanding areas:
- FR-013: Distributed vs centralized comparison harness — partial (QAOA benchmarks exist, formal experiment harness pending)
- NFR-004: Test coverage target (80%) — 20 unit tests exist, coverage measurement not yet tracked
This project builds a Python proof-of-concept where py-libp2p is the coordination layer for distributed quantum operations. Quantum gate capabilities are exposed as remotely invocable services, while the coordinator handles discovery, planning, reservation, execution sequencing, retries, and fallback.
The core research goal is to evaluate tradeoffs between:
- distributed orchestration over libp2p, and
- a centralized orchestration baseline
for latency, success rate, and final circuit fidelity under degraded network and node conditions.
- Circuit ingestion from OpenQASM 2/3 (and optional Qiskit object input)
- Service advertisement and discovery over libp2p
- Distributed circuit planning with an explicit cost model
- Reservation protocol for remote gate execution windows
- Runtime orchestration with timeout/retry/fallback
- Fidelity and link-quality monitoring
- SQLite-backed persistence for jobs, reservations, and metrics
- Experiment harness to compare distributed vs centralized coordinator
- Real quantum hardware control and hardware-specific drivers
- Cryptoeconomic incentives or token pricing
- Byzantine fault tolerance / consensus between multiple coordinators
- Production-grade multi-tenant authz model
- Full-blown quantum error correction stacks beyond mocked services
- Client: Submits circuits and queries job status/results.
- Coordinator Node: Compiles circuits, reserves resources, orchestrates execution.
- Service Node: Advertises capabilities and executes gate-service requests.
- Experiment Runner: Executes benchmark suites and records outcomes.
- Language/runtime: Python 3.11+
- Networking substrate: py-libp2p
- Quantum operations are simulated/mocked services in POC
- All critical state changes are persisted to SQLite
- System must operate under partial failures (node loss, timeout, fidelity drop)
Service nodes must advertise capabilities and health in a machine-validated schema.
Acceptance criteria:
- Advertisement includes `node_id`, `service_type`, `fidelity`, `qubit_range`, `availability`, `timestamp`, and protocol version.
- Invalid advertisements are rejected and logged with reason.
- Capability or fidelity changes are re-advertised within configured refresh interval.
Coordinator must discover and maintain a fresh registry of service nodes.
Acceptance criteria:
- Coordinator discovers services by gate type and minimum fidelity.
- Stale entries are marked unavailable after TTL expiration.
- Discovery queries support filtering by capability, fidelity threshold, and availability.
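One way to picture the registry criteria above is an in-memory table with TTL expiry and filtered lookup. The sketch below is illustrative only: `RegistryEntry`, `ServiceRegistry`, and their method names are hypothetical, and the libp2p discovery transport is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    node_id: str
    service_type: str
    fidelity: float
    available: bool
    last_seen: float  # seconds, on the same clock as the `now` arguments below


class ServiceRegistry:
    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: dict[str, RegistryEntry] = {}

    def upsert(self, entry: RegistryEntry) -> None:
        """Insert or refresh the entry for a node."""
        self._entries[entry.node_id] = entry

    def expire_stale(self, now: float) -> list[str]:
        """Mark entries older than the TTL unavailable; return their node IDs."""
        expired = []
        for entry in self._entries.values():
            if entry.available and now - entry.last_seen > self.ttl_seconds:
                entry.available = False
                expired.append(entry.node_id)
        return expired

    def find(
        self,
        service_type: str,
        min_fidelity: float = 0.0,
        only_available: bool = True,
    ) -> list[RegistryEntry]:
        """Filter by capability, fidelity threshold, and availability."""
        return [
            entry
            for entry in self._entries.values()
            if entry.service_type == service_type
            and entry.fidelity >= min_fidelity
            and (entry.available or not only_available)
        ]
```

Stale entries are flagged rather than deleted here, which matches the wording "marked unavailable" and keeps their last known fidelity around for later inspection.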
Coordinator must accept client circuits and normalize to an internal IR.
Acceptance criteria:
- OpenQASM 2 and OpenQASM 3 inputs are accepted.
- Invalid syntax returns a structured validation error.
- Internal IR preserves gate order, qubit mapping, and measurement instructions.
Compiler must map circuit fragments to service nodes using a defined cost function.
Acceptance criteria:
- Compilation partitions circuit into executable fragments and dependencies (DAG).
- Fragment assignment minimizes configurable cost over latency, fidelity risk, and entanglement overhead.
- If no feasible mapping exists, compiler returns actionable error identifying missing capability/constraint.
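The cost-based assignment described above might look like the following weighted sum. The source does not define the cost function, so the linear form, the default weights, and the names `Candidate`, `CostWeights`, `fragment_cost`, and `assign_fragment` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    latency_ms: float
    fidelity: float  # in [0, 1]
    entanglement_ops: int  # extra entangling operations this placement needs


@dataclass(frozen=True)
class CostWeights:
    # Hypothetical defaults; the requirement only says the weights are configurable.
    latency: float = 1.0
    fidelity_risk: float = 1000.0
    entanglement: float = 50.0


def fragment_cost(candidate: Candidate, weights: CostWeights) -> float:
    """Weighted sum of latency, fidelity risk (1 - fidelity), and entanglement overhead."""
    return (
        weights.latency * candidate.latency_ms
        + weights.fidelity_risk * (1.0 - candidate.fidelity)
        + weights.entanglement * candidate.entanglement_ops
    )


def assign_fragment(
    candidates: list[Candidate], weights: CostWeights, min_fidelity: float
) -> Candidate:
    """Pick the cheapest feasible candidate, or raise naming the unmet constraint."""
    feasible = [c for c in candidates if c.fidelity >= min_fidelity]
    if not feasible:
        raise LookupError(
            f"no feasible mapping: no candidate meets min_fidelity={min_fidelity}"
        )
    # Tie-break on node_id so identical inputs always give the same plan.
    return min(feasible, key=lambda c: (fragment_cost(c, weights), c.node_id))
```

With the default weights, a node at 40 ms and fidelity 0.99 beats a node at 20 ms and fidelity 0.95 that needs one extra entangling operation; raising the latency weight flips that choice.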
Coordinator and service nodes must support reservation of gate execution windows.
Acceptance criteria:
- Reservation request/response includes operation, time window, fidelity requirement, and correlation IDs.
- Conflicts return rejection plus optional next-available window.
- Expired or canceled reservations are released automatically.
Coordinator runtime must execute according to compiled dependency order.
Acceptance criteria:
- Runtime executes only when predecessor dependencies are satisfied.
- Runtime records per-fragment start/end/status and chosen service node.
- Final result aggregates execution metadata and measurement outputs.
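Dependency-ordered execution as described above maps naturally onto the standard library's `graphlib.TopologicalSorter`. The sketch below runs fragments sequentially; the `run_fragments` function, the record fields, and the `execute` callback signature are hypothetical, and failure handling is omitted.

```python
import time
from collections.abc import Callable
from graphlib import TopologicalSorter


def run_fragments(
    dependencies: dict[str, set[str]],
    execute: Callable[[str], tuple[str, object]],
    clock: Callable[[], float] = time.monotonic,
) -> list[dict]:
    """Run fragments in dependency order and return one record per fragment.

    `dependencies` maps a fragment ID to the IDs it depends on.
    `execute` runs one fragment and returns (chosen node ID, output).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    records = []
    while sorter.is_active():
        # get_ready() only yields fragments whose predecessors are all done.
        for fragment_id in sorted(sorter.get_ready()):
            started = clock()
            node_id, output = execute(fragment_id)
            records.append(
                {
                    "fragment_id": fragment_id,
                    "node_id": node_id,
                    "start": started,
                    "end": clock(),
                    "status": "COMPLETED",
                    "output": output,
                }
            )
            sorter.done(fragment_id)
    return records
```

Sorting each ready batch keeps the order reproducible when several fragments become runnable at once; a concurrent runtime would dispatch the batch in parallel instead.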
Runtime must recover from transient failures when possible.
Acceptance criteria:
- Timeout policy supports per-operation class configuration.
- Failed invocations retry with bounded exponential backoff.
- On retry exhaustion, runtime attempts fallback to next feasible node plan.
- If no fallback exists, runtime fails fast with detailed fragment-level error.
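The retry and fallback policy above could be sketched as follows. The delay parameters, the `FragmentExecutionError` type, and the choice to treat `TimeoutError` and `ConnectionError` as the transient failures are assumptions; the requirement does not fix any of them.

```python
import time
from collections.abc import Callable, Sequence


class FragmentExecutionError(RuntimeError):
    """Raised when every node in the fallback plan has exhausted its retries."""


def backoff_delays(max_retries: int, base_delay: float, max_delay: float) -> list[float]:
    """Exponential delays (base * 2**attempt), each capped at max_delay."""
    return [min(max_delay, base_delay * 2**attempt) for attempt in range(max_retries)]


def invoke_with_fallback(
    fragment_id: str,
    nodes: Sequence[str],  # primary node first, then fallbacks in preference order
    invoke: Callable[[str, str], object],
    max_retries: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    errors = []
    delays = backoff_delays(max_retries, base_delay, max_delay)
    for node_id in nodes:
        # One initial attempt plus up to max_retries retries per node.
        for attempt in range(max_retries + 1):
            try:
                return invoke(node_id, fragment_id)
            except (TimeoutError, ConnectionError) as exc:
                errors.append(f"{node_id} attempt {attempt + 1}: {exc}")
                if attempt < max_retries:
                    sleep(delays[attempt])
    raise FragmentExecutionError(
        f"fragment {fragment_id} failed on all nodes: " + "; ".join(errors)
    )
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting, and the final error lists every attempt so the fragment-level failure is traceable.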
System must track service quality and react to degradation.
Acceptance criteria:
- Fidelity/link-quality reports are ingested and persisted with timestamps.
- Registry marks degraded nodes/links below configured thresholds.
- Compiler and runtime consume latest quality snapshot before assignment/invocation.
System must expose job submission and lifecycle APIs.
Acceptance criteria:
- `POST /api/v1/circuits/submit` returns `job_id` immediately.
- `GET /api/v1/jobs/{job_id}` reports state: `QUEUED|COMPILING|RESERVING|EXECUTING|COMPLETED|FAILED`.
- Optional websocket stream provides near real-time job state updates.
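The job states listed above suggest a small state machine. The state names come from the criterion, but the allowed transitions below are an assumption (a linear pipeline in which any non-terminal state may fail), and `advance` is a hypothetical helper.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "QUEUED"
    COMPILING = "COMPILING"
    RESERVING = "RESERVING"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transition table: each stage moves forward or fails; terminal states stay put.
ALLOWED_TRANSITIONS: dict[JobState, set[JobState]] = {
    JobState.QUEUED: {JobState.COMPILING, JobState.FAILED},
    JobState.COMPILING: {JobState.RESERVING, JobState.FAILED},
    JobState.RESERVING: {JobState.EXECUTING, JobState.FAILED},
    JobState.EXECUTING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Because `JobState` subclasses `str`, its members serialize to the same strings the status endpoint reports.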
Critical execution state must survive coordinator restarts.
Acceptance criteria:
- Jobs, reservations, and metrics are persisted to SQLite.
- On restart, coordinator reloads unfinished jobs and stale-aware service cache.
- DB migrations are versioned and reversible.
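Restart recovery as described above can be illustrated with the standard `sqlite3` module. The sketch follows the SQLite wording of this criterion; the `jobs` table layout and the helper names `open_store`, `save_job`, and `load_unfinished` are hypothetical, and migrations are not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    state      TEXT NOT NULL,
    updated_at REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the job store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_job(conn: sqlite3.Connection, job_id: str, state: str, updated_at: float) -> None:
    """Insert the job, or update its state if it already exists."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO jobs (job_id, state, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(job_id) DO UPDATE SET "
            "state = excluded.state, updated_at = excluded.updated_at",
            (job_id, state, updated_at),
        )


def load_unfinished(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (job_id, state) for jobs a restarted coordinator must resume."""
    return conn.execute(
        "SELECT job_id, state FROM jobs "
        "WHERE state NOT IN ('COMPLETED', 'FAILED') ORDER BY updated_at"
    ).fetchall()
```

Each state change is committed before the function returns, so a coordinator that crashes mid-job finds the last persisted state when it calls `load_unfinished` on startup.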
System behavior must be tunable per environment.
Acceptance criteria:
- Config file defines timeouts, retries, thresholds, and discovery intervals.
- Environment overrides are supported.
- Invalid configuration fails startup with precise error messages.
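Environment overrides with fail-fast validation, as required above, can be sketched with a frozen dataclass. The status table mentions `QB2_*` variables feeding Pydantic; this stdlib-only version is a stand-in for that, and the field names, defaults, and `load_config` helper are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CoordinatorConfig:
    # Hypothetical settings and defaults, chosen only for illustration.
    invoke_timeout_s: float = 5.0
    max_retries: int = 3
    min_fidelity: float = 0.9
    discovery_interval_s: float = 10.0


def load_config(env: Mapping[str, str] = os.environ, prefix: str = "QB2_") -> CoordinatorConfig:
    """Build the config from defaults plus PREFIX + FIELD_NAME overrides."""
    overrides = {}
    for field in fields(CoordinatorConfig):
        key = prefix + field.name.upper()
        if key not in env:
            continue
        caster = type(field.default)  # int or float, taken from the default value
        try:
            overrides[field.name] = caster(env[key])
        except ValueError as exc:
            raise ValueError(f"{key}={env[key]!r} is not a valid {caster.__name__}") from exc
    config = CoordinatorConfig(**overrides)
    if config.max_retries < 0:
        raise ValueError(f"{prefix}MAX_RETRIES must be >= 0, got {config.max_retries}")
    if not 0.0 <= config.min_fidelity <= 1.0:
        raise ValueError(f"{prefix}MIN_FIDELITY must be in [0, 1], got {config.min_fidelity}")
    return config
```

Calling `load_config()` once at startup and letting the `ValueError` propagate gives the behavior in the last criterion: the process stops with a message that names the offending variable.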
Coordinator must provide structured operational insight.
Acceptance criteria:
- Structured JSON logs include request/job correlation IDs.
- Metrics cover queue depth, latency, retry counts, failure reasons, and success rate.
- Health endpoint reflects coordinator and dependency status.
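Structured JSON logs with correlation IDs, as in the first criterion above, can be produced with a custom `logging.Formatter`. The output field names and the `JsonFormatter` class are assumptions; only the standard `logging` and `json` modules are used.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object carrying correlation IDs."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set through `extra=`; None when the caller supplied no ID.
            "job_id": getattr(record, "job_id", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str = "coordinator") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger().info("fragment done", extra={"job_id": "job-1", "request_id": "req-9"})` then emits a single JSON line that can be filtered by either ID.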
Project must include an experiment harness for publishable evaluation.
Acceptance criteria:
- Same benchmark circuits can run in distributed mode and centralized baseline mode.
- Runs output comparable metrics: compile time, execution latency, success rate, estimated final fidelity.
- Harness supports controlled fault injection (latency spike, node drop, fidelity degradation).
System must provide minimum protections suitable for POC deployment.
Acceptance criteria:
- API supports optional key-based auth.
- API supports configurable rate limiting.
- Inputs are size-limited and schema-validated before processing.
Given identical topology and inputs (including seeded randomness), planner output must be reproducible.
Under transient failures, runtime should recover automatically for at least one fallback attempt when feasible.
For benchmark circuits up to 50 operations and <= 10 service nodes:
- P95 compile latency <= 2s on developer hardware.
- P95 orchestration overhead per remote fragment <= 300ms (excluding simulated quantum operation duration).
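Checking the two P95 targets above requires a percentile definition, which the source does not give. The sketch below assumes the nearest-rank method; `p95` and `meets_performance_targets` are hypothetical helpers, and the thresholds are the ones stated in the criteria.

```python
from collections.abc import Sequence

COMPILE_P95_LIMIT_S = 2.0  # from the compile latency target
OVERHEAD_P95_LIMIT_S = 0.300  # from the per-fragment overhead target


def p95(samples: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # rank = ceil(0.95 * n), computed in integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_performance_targets(
    compile_latencies_s: Sequence[float], fragment_overheads_s: Sequence[float]
) -> bool:
    """True when both measured P95 values are within their limits."""
    return (
        p95(compile_latencies_s) <= COMPILE_P95_LIMIT_S
        and p95(fragment_overheads_s) <= OVERHEAD_P95_LIMIT_S
    )
```

With 20 samples the nearest rank is 19, so a single slow outlier among 20 runs does not by itself break the target, while two do.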
- Unit + integration tests required for core modules.
- Property tests required for critical protocol/state invariants.
- Minimum 80% coverage on coordinator core packages.
Design decisions, protocol contracts, and runbooks must be documented and cross-referenced with requirement IDs.
The POC is complete when all of the following are true:
- FR-001 through FR-014 are implemented and tested.
- NFR targets are measured and reported.
- Experiment harness produces a reproducible comparison report against centralized baseline.
- Documentation includes architecture, API contract, protocol contract, and execution walkthrough.