Read this document when:
- you need the full end-to-end architecture
- you are mapping API behavior to backend internals
- you are debugging planning, runtime, registry, or persistence paths
This document explains the system in depth:
- what the coordinator is actually doing
- how distributed quantum services are represented
- how a circuit becomes a routed execution plan
- how libp2p discovery and remote invocation fit into the workflow
- how runtime execution, persistence, and Qiskit analysis connect
If the README.md is the front door, this file is the machine room.
The project treats quantum capabilities as network services. Each service node advertises what it can do, the coordinator builds a view of the available topology, and client-submitted circuits are compiled into a distributed execution plan.
The core research idea is simple:
- represent quantum operations as distributed services
- coordinate them over py-libp2p
- preserve execution metadata and quality information
- compare distributed orchestration against simpler centralized control
That makes this system more than a parser plus simulator. It is a coordination layer for distributed quantum services.
The implementation lives in backend/src/quantum_backend_v2/ — a FastAPI application with a real py-libp2p runtime (Trio), Qiskit-backed quantum analysis, SQLAlchemy on Postgres, and Beanie on MongoDB.
The system is easiest to understand as three connected planes.
The control plane is responsible for deciding what should happen.
- API ingestion
- circuit normalization
- service discovery
- service registry freshness
- planning and assignment
- reservation decisions
- job lifecycle management
The execution plane is responsible for making the distributed work actually happen.
- libp2p pubsub for service advertisements
- libp2p request streams for remote gate execution
- runtime scheduling
- timeout, retry, and fallback logic
The data plane is responsible for making the system inspectable and recoverable.
- Postgres (SQLAlchemy): users, enrollments, workflow runs, execution plans, financial jobs, reservation events, execution events (all append-only)
- MongoDB (Beanie): peer capabilities, topology projections, benchmark results, provenance bundles
- Local JSONL: append-only peer log with fsync (protocol events, reservation/execution transitions)
- Structured response models
- Qiskit result generation
flowchart TB
subgraph Clients
USER[Researcher or API Client]
POSTMAN[Postman or curl]
end
subgraph API Layer
API[FastAPI]
WS[WebSocket Job Updates]
end
subgraph Coordinator Control Plane
JM[Job Manager]
PARSER[Circuit Normalizer]
DAG[Dependency DAG Builder]
PLAN[Distributed Planner]
REG[Service Registry]
DISC[Discovery Ingest Loop]
RSV[Reservation Protocol]
end
subgraph Coordinator Execution Plane
RUN[Runtime Executor]
FAB[PyLibp2p Fabric]
INV[Gate Invocation Adapter]
end
subgraph Service Network
S1[Service Node A]
S2[Service Node B]
S3[Service Node C]
end
subgraph Data Plane
JOBDB[(Job Store)]
RTDB[(Runtime Event Store)]
REGDB[(Registry Snapshot Store)]
QISKIT[Qiskit Result Builder]
end
USER --> API
POSTMAN --> API
API --> JM
API --> WS
JM --> PARSER
PARSER --> DAG
DAG --> PLAN
PLAN --> REG
JM --> RSV
JM --> RUN
RSV --> REG
RUN --> INV
INV --> FAB
FAB --> S1
FAB --> S2
FAB --> S3
S1 --> DISC
S2 --> DISC
S3 --> DISC
DISC --> REG
JM --> JOBDB
RUN --> RTDB
REG --> REGDB
RUN --> QISKIT
QISKIT --> JOBDB
JOBDB --> API
The best mental model is:
- a client submits a circuit
- the coordinator turns it into a dependency-aware plan
- each fragment is mapped to a service node
- the runtime reserves and executes fragments in order
- results and metadata are persisted
- Qiskit reconstructs the pre-measurement quantum state and analysis payload
- the API returns both execution metadata and quantum insight
sequenceDiagram
participant C as Client
participant A as FastAPI
participant J as Job Manager
participant P as Planner
participant R as Service Registry
participant X as Runtime Executor
participant F as PyLibp2p Fabric
participant S as Service Node
participant Q as Qiskit Result Builder
participant D as SQLite
C->>A: POST /api/v1/circuits/submit
A->>J: submit(job)
J->>D: persist QUEUED job
A-->>C: job_id
A->>J: background process(job_id)
J->>P: compile(circuit)
P->>R: query available services
R-->>P: fresh capability snapshot
P-->>J: execution plan
J->>D: persist plan_id and EXECUTING state
loop per ready fragment
X->>R: reserve candidate node
R-->>X: accepted or rejected
X->>F: invoke remote fragment
F->>S: libp2p stream request
S-->>F: success or error + observed fidelity
F-->>X: invocation response
X->>D: persist fragment execution event
end
X->>Q: build quantum_result(plan, fragment_results)
Q-->>X: counts, probabilities, statevector, metrics
X->>D: persist COMPLETED result
C->>A: GET /api/v1/jobs/{job_id}
A->>D: load latest job record
A-->>C: result payload
The API layer is the user-facing boundary.
Primary responsibilities:
- validate incoming request payloads
- enforce optional API key authentication
- enforce optional in-memory rate limiting
- enqueue background processing
- expose current job status and final results
- expose service and fidelity snapshots
- expose a WebSocket stream for job lifecycle changes
The API is intentionally thin. It does not contain planning or runtime logic. It delegates that work to JobManager and related application/runtime services.
System: GET /, GET /api/v1/health, GET /api/v1/ready
Bootstrap: GET /api/v1/bootstrap/libp2p, GET /api/v1/bootstrap/libp2p/runtime
Discovery: GET /api/v1/discovery/peers, GET /api/v1/discovery/peers/{peer_id}, GET /api/v1/discovery/topology, GET /api/v1/discovery/network/topology
Enrollment: POST /api/v1/enrollment/peers, GET /api/v1/enrollment/peers, GET /api/v1/enrollment/peers/{peer_id}, POST /api/v1/enrollment/peers/{peer_id}/action
Circuits: POST /api/v1/circuits/submit
Jobs: GET /api/v1/jobs, GET /api/v1/jobs/{job_id}
Plans: GET /api/v1/plans/{plan_id}
Services: GET /api/v1/services
Metrics: GET /api/v1/metrics/fidelity/{node_id}
Finance: POST /api/v1/finance/submit, GET /api/v1/finance/{job_id}, GET /api/v1/finance/{job_id}/comparison, GET /api/v1/finance
Workflows: POST /api/v1/workflows/runs, GET /api/v1/workflows/runs/{run_id}, POST /api/v1/workflows/benchmarks, GET /api/v1/workflows/benchmarks/{benchmark_id}
Reservations: POST /api/v1/reservations, GET /api/v1/reservations/{reservation_id}, POST /api/v1/reservations/{reservation_id}/cancel
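A client typically submits a circuit and then polls the jobs endpoint until the job settles. The sketch below shows one way to write that polling loop. It is illustrative only: the `status` field name in the job record, the `wait_for_job` and `http_fetch_job` helpers, and the default intervals are assumptions, not part of the documented API. Only the `/api/v1/jobs/{job_id}` path and the terminal state names come from this document.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

# Terminal job states named in this document.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def http_fetch_job(base_url: str) -> Callable[[str], dict]:
    """Return a fetcher for GET /api/v1/jobs/{job_id} (hypothetical helper)."""

    def fetch(job_id: str) -> dict:
        with urlopen(f"{base_url}/api/v1/jobs/{job_id}") as resp:
            return json.load(resp)

    return fetch


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], dict],
    poll_interval_s: float = 0.5,
    timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        # "status" is an assumed field name for the lifecycle state.
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(poll_interval_s)
```

Injecting `fetch_job`, `sleep`, and `clock` keeps the loop testable without a running coordinator; in real use, `wait_for_job(job_id, http_fetch_job(base_url))` would be enough.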
The JobManager is the top-level orchestration entry point inside the coordinator.
It owns the coarse lifecycle:
QUEUED -> COMPILING -> RESERVING -> EXECUTING -> COMPLETED | FAILED
Responsibilities:
- persist jobs immediately on submit
- claim in-flight jobs to avoid duplicate processing
- compile the circuit into an execution plan
- persist lifecycle transitions
- execute the plan
- serialize final runtime results into stable JSON
- recover unfinished jobs at startup
It is effectively the application service boundary between the API and the distributed execution engine.
Service nodes periodically advertise capabilities over libp2p pubsub. The coordinator subscribes to the topic, validates each advertisement, and stores it in a local registry.
Each advertisement includes:
- node_id
- listen_addrs
- service_type
- fidelity
- qubit_min
- qubit_max
- availability
- updated_at
The registry is the coordinator's local truth for planning decisions. It is:
- freshness-aware
- queryable by service type and availability
- persisted to SQLite for restart resilience
This is important because the planner should not depend on an external network call every time it evaluates a candidate. Instead, it plans against a stable, local snapshot of the distributed service landscape.
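To make the idea of a freshness-aware local registry concrete, here is a minimal in-memory sketch. The field names follow the advertisement fields listed above; everything else (the `ServiceRegistry` class shape, the `max_age_s` parameter, treating `availability` as a boolean and `updated_at` as a Unix timestamp) is an assumption for illustration and may differ from the project's actual registry.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ServiceAdvertisement:
    node_id: str
    listen_addrs: Tuple[str, ...]
    service_type: str
    fidelity: float
    qubit_min: int
    qubit_max: int
    availability: bool  # assumed boolean; the real field may be richer
    updated_at: float   # assumed Unix timestamp in seconds


class ServiceRegistry:
    """Hypothetical registry keyed by (node_id, service_type)."""

    def __init__(self, max_age_s: float = 30.0) -> None:
        self._max_age_s = max_age_s
        self._entries: Dict[Tuple[str, str], ServiceAdvertisement] = {}

    def upsert(self, ad: ServiceAdvertisement) -> None:
        key = (ad.node_id, ad.service_type)
        current = self._entries.get(key)
        # Keep only the newest advertisement per node and service type.
        if current is None or ad.updated_at >= current.updated_at:
            self._entries[key] = ad

    def query(
        self, service_type: str, now: Optional[float] = None
    ) -> List[ServiceAdvertisement]:
        """Return fresh, available entries in a stable order."""
        now = time.time() if now is None else now
        fresh = [
            ad
            for ad in self._entries.values()
            if ad.service_type == service_type
            and ad.availability
            and now - ad.updated_at <= self._max_age_s
        ]
        return sorted(fresh, key=lambda ad: ad.node_id)
```

Because `query` reads only local state, a planner built on it never blocks on the network while scoring candidates; stale entries simply drop out of the result.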
The planner does not work directly on raw OpenQASM text. The first step is normalization into a small internal representation.
Current supported input styles:
- OpenQASM 2 style qreg
- OpenQASM 3 style qubit[]
- project-specific service aliases such as bell_pair, teleport, and measure
Normalization extracts:
- circuit format
- qubit count
- ordered operations
- service type per operation
- qubit operands
This stage is deliberately conservative. If the parser cannot understand an operation or declaration, it fails fast with a structured error rather than guessing.
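The fail-fast behavior can be sketched with a deliberately small normalizer. This is not the project's parser: it handles only the OpenQASM 3 style qubit[] declaration and a handful of aliases, and the `SERVICE_TYPES` table, `CircuitParseError`, `Operation`, and `normalize` names are all hypothetical. The mappings of teleport to teleportation and measure to measurement_feedforward are inferred from the service types this document lists.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed alias table: DSL keyword -> service type.
SERVICE_TYPES = {
    "bell_pair": "bell_pair",
    "cnot": "cnot",
    "cz": "cz",
    "teleport": "teleportation",
    "measure": "measurement_feedforward",
}


class CircuitParseError(ValueError):
    """Structured parse error carrying the offending line number."""

    def __init__(self, line_no: int, message: str) -> None:
        super().__init__(f"line {line_no}: {message}")
        self.line_no = line_no
        self.message = message


@dataclass(frozen=True)
class Operation:
    op_id: int
    service_type: str
    qubits: Tuple[int, ...]


_DECL = re.compile(r"^qubit\[(\d+)\]\s+\w+$")
_QUBIT = re.compile(r"\w+\[(\d+)\]")


def normalize(source: str) -> Tuple[int, List[Operation]]:
    """Return (qubit_count, ordered operations) or raise CircuitParseError."""
    qubit_count = 0
    ops: List[Operation] = []
    for line_no, raw in enumerate(source.splitlines(), start=1):
        stmt = raw.strip().rstrip(";").strip()
        # Skip blanks, the version header, and classical bit declarations.
        if not stmt or stmt.startswith("OPENQASM") or stmt.startswith("bit["):
            continue
        decl = _DECL.match(stmt)
        if decl:
            qubit_count = int(decl.group(1))
            continue
        name, _, rest = stmt.partition(" ")
        if name not in SERVICE_TYPES:
            # Fail fast instead of guessing what the operation means.
            raise CircuitParseError(line_no, f"unsupported operation {name!r}")
        operands = rest.split("->")[0]  # drop a classical target such as c[0]
        qubits = tuple(int(m) for m in _QUBIT.findall(operands))
        if not qubits or any(q >= qubit_count for q in qubits):
            raise CircuitParseError(line_no, f"invalid qubit operands in {stmt!r}")
        ops.append(Operation(len(ops), SERVICE_TYPES[name], qubits))
    return qubit_count, ops
```

Carrying `line_no` on the exception is what makes the error structured: an API layer can return it as a field rather than asking clients to parse a message string.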
After normalization, operations are transformed into a dependency-aware representation.
The essential rule is:
- operations that touch the same qubit are ordered by dependency
That yields a DAG of execution constraints. The planner then groups operations into fragments, where each fragment carries:
- fragment_id
- service_type
- qubits
- operation_ids
- dependency list
This matters because the runtime does not execute "the whole circuit" at once. It executes ready fragments whose dependencies have already completed.
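The same-qubit ordering rule is compact enough to show directly. The sketch below derives dependencies by remembering the last operation that touched each qubit, then computes which operations are ready given a set of completed ones. The function names and the `(op_id, qubits)` tuple shape are assumptions chosen for brevity, and fragment grouping is left out.

```python
from typing import Dict, List, Set, Tuple


def build_dependencies(
    ops: List[Tuple[int, Tuple[int, ...]]]
) -> Dict[int, Set[int]]:
    """Map each operation id to the ids it must wait for.

    `ops` is an ordered list of (op_id, qubits). An operation depends on the
    most recent earlier operation on each qubit it touches.
    """
    last_on_qubit: Dict[int, int] = {}
    deps: Dict[int, Set[int]] = {}
    for op_id, qubits in ops:
        deps[op_id] = {last_on_qubit[q] for q in qubits if q in last_on_qubit}
        for q in qubits:
            last_on_qubit[q] = op_id
    return deps


def ready(deps: Dict[int, Set[int]], done: Set[int]) -> List[int]:
    """Return ids that are not done and whose dependencies are all done."""
    return sorted(
        op_id for op_id, needs in deps.items()
        if op_id not in done and needs <= done
    )


# Two-qubit example: op 0 touches both qubits, ops 1 and 2 touch one each,
# and op 3 touches both again.
example = [(0, (0, 1)), (1, (0,)), (2, (1,)), (3, (0, 1))]
deps = build_dependencies(example)
```

In this example, operations 1 and 2 become ready together once operation 0 completes, which is exactly the situation where a distributed runtime can route work to different nodes.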
The distributed planner consumes:
- normalized circuit fragments
- current registry snapshot
- planner cost configuration
It produces an ExecutionPlan containing:
- plan_id
- fragment ordering
- fragment definitions
- candidate service-node assignments
- primary and fallback nodes
- quality snapshot metadata
The planner is cost-based, not random. It scores candidates using components such as:
- latency cost
- failure-risk cost
- entanglement cost
- load cost
The outcome is deterministic for a fixed topology and configuration, which is important for repeatability in a research setting.
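A weighted-sum scorer with a stable tie-break is one simple way to get both cost-based and deterministic behavior. The sketch below is an assumption about the general shape, not the project's actual cost function: the `Candidate` fields, the `CostWeights` defaults, and the linear combination are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    node_id: str
    latency_ms: float
    failure_risk: float       # assumed to be a probability in [0, 1]
    entanglement_cost: float
    load: float               # assumed to be utilization in [0, 1]


@dataclass(frozen=True)
class CostWeights:
    # Illustrative weights only; real values belong in planner configuration.
    latency: float = 1.0
    failure_risk: float = 100.0
    entanglement: float = 10.0
    load: float = 5.0


def score(c: Candidate, w: CostWeights) -> float:
    """Lower is better."""
    return (
        w.latency * c.latency_ms
        + w.failure_risk * c.failure_risk
        + w.entanglement * c.entanglement_cost
        + w.load * c.load
    )


def assign(
    candidates: Sequence[Candidate], weights: CostWeights
) -> Tuple[str, List[str]]:
    """Return (primary node, ordered fallback nodes)."""
    if not candidates:
        raise LookupError("no candidate service nodes available")
    # Ties on score are broken by node_id, so the result does not depend on
    # the order in which the registry returned the candidates.
    ranked = sorted(candidates, key=lambda c: (score(c, weights), c.node_id))
    return ranked[0].node_id, [c.node_id for c in ranked[1:]]
```

The explicit `node_id` tie-break is the piece that delivers repeatability: without it, two equally scored nodes could swap roles between runs depending on discovery order.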
Before runtime invocation, the coordinator asks whether a target node is acceptable for the fragment and fidelity threshold.
The reservation layer gives the runtime a place to encode:
- requested service type
- target node
- minimum fidelity
- reservation acceptance or rejection
- cancellation and execution state
Even in this proof of concept, this is an important architectural separation:
- planning answers "where should this go?"
- reservation answers "can I still use this candidate right now?"
That split makes the system more realistic than directly invoking the first matching node.
stateDiagram-v2
[*] --> REQUESTED
REQUESTED --> PREPARED
PREPARED --> COMMITTED
PREPARED --> REJECTED
COMMITTED --> EXECUTED
COMMITTED --> EXPIRED
COMMITTED --> CANCELED
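The transitions in the diagram above can be encoded as a lookup table so that illegal moves are rejected in one place. This is a sketch: the state names and edges come from the diagram, while the `ReservationState` enum, the `ALLOWED` table, and the `transition` helper are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set


class ReservationState(str, Enum):
    REQUESTED = "REQUESTED"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    REJECTED = "REJECTED"
    EXECUTED = "EXECUTED"
    EXPIRED = "EXPIRED"
    CANCELED = "CANCELED"


S = ReservationState

# Edges copied from the state diagram; states with no outgoing edges are
# treated as terminal.
ALLOWED: Dict[ReservationState, Set[ReservationState]] = {
    S.REQUESTED: {S.PREPARED},
    S.PREPARED: {S.COMMITTED, S.REJECTED},
    S.COMMITTED: {S.EXECUTED, S.EXPIRED, S.CANCELED},
    S.REJECTED: set(),
    S.EXECUTED: set(),
    S.EXPIRED: set(),
    S.CANCELED: set(),
}


class InvalidTransition(Exception):
    pass


def transition(current: ReservationState, target: ReservationState) -> ReservationState:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target


def is_terminal(state: ReservationState) -> bool:
    return not ALLOWED[state]
```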
The runtime executor is where the distributed plan becomes actual work.
Responsibilities:
- walk the fragment dependency order
- identify fragments whose dependencies are already satisfied
- reserve a candidate node
- invoke remote execution over libp2p
- retry on transient failure
- switch to fallback nodes when needed
- record detailed runtime events
The runtime does not simply stop on the first issue. It has explicit behavior for:
- timeout
- execution rejection
- connection drop
- fidelity below threshold
This is the main reliability layer of the coordinator.
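The retry-then-fallback behavior can be sketched as a small loop over an ordered node list. This is a simplified, synchronous illustration under stated assumptions: the real runtime is Trio-based, and the `execute_fragment` name, the `invoke` callable signature, the returned dict shape, and the policy of retrying transient errors on the same node while moving on after a low-fidelity result are all hypothetical choices.

```python
from typing import Callable, List, Tuple

# Errors treated as transient in this sketch.
TRANSIENT = (TimeoutError, ConnectionError)


class FragmentFailed(RuntimeError):
    pass


def execute_fragment(
    nodes: List[str],
    invoke: Callable[[str], float],
    min_fidelity: float,
    max_attempts_per_node: int = 2,
) -> dict:
    """Try the primary node first, then each fallback in order.

    `invoke(node_id)` is assumed to return the observed fidelity, or raise
    TimeoutError / ConnectionError on a transport problem.
    """
    events: List[Tuple[str, str]] = []
    attempts = 0
    for node_id in nodes:
        for _ in range(max_attempts_per_node):
            attempts += 1
            try:
                fidelity = invoke(node_id)
            except TRANSIENT as exc:
                events.append((node_id, type(exc).__name__))
                continue  # retry the same node
            if fidelity < min_fidelity:
                events.append((node_id, "fidelity_below_threshold"))
                break  # a retry is unlikely to help; move to the next node
            return {
                "node_id": node_id,
                "attempts": attempts,
                "observed_fidelity": fidelity,
                "events": events,
            }
    raise FragmentFailed(f"all candidates exhausted after {attempts} attempts")
```

Recording an event for every failed attempt, not just the final outcome, is what later makes post-mortem analysis of a job possible.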
The libp2p fabric is the transport substrate for the demo and integration environment.
It manages:
- one coordinator node
- a configurable number of embedded service nodes
- pubsub subscription for service advertisements
- request/response stream handlers for gate execution
- Trio-native libp2p service lifecycles
Without this layer, the project would be a local simulation with some network-shaped abstractions. With it, the coordinator is actually:
- booting libp2p hosts
- exchanging advertisements over pubsub
- dialing peers over multiaddrs
- sending real request payloads over stream protocols
That is the heart of the distributed quantum services story.
The runtime result is not limited to "job succeeded". After fragment execution completes, the plan is translated into a Qiskit circuit and analyzed.
The result payload can include:
- counts
- probabilities
- measured_probabilities
- statevector
- measured_qubits
- observable_expectations
- reduced_density_matrices
- bloch_vectors
- entanglement_entropy
- fidelity
- top_basis_states
The distributed runtime produces execution metadata:
- which node ran what
- how many attempts happened
- what fidelity was observed
The Qiskit layer produces quantum analysis:
- what state was implied by the executed plan
- what measurement marginal was sampled
- what observables and subsystem states look like
Together, those two perspectives make the result useful to both systems engineers and quantum researchers.
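To illustrate what the quantum analysis side contains, the sketch below computes basis probabilities, a single-qubit measurement marginal, and Z-type expectation values for a two-qubit Bell state. It uses only the standard library and is not how the project does it (the project uses Qiskit). The function names are hypothetical, and the bit-ordering convention (qubit 0 is the rightmost bit of the basis label, as in Qiskit) is stated as an assumption.

```python
import math
from typing import Dict, Sequence


def basis_probabilities(statevector: Sequence[complex]) -> Dict[str, float]:
    """Map each basis label to |amplitude|^2."""
    n = len(statevector).bit_length() - 1  # number of qubits for length 2**n
    return {format(i, f"0{n}b"): abs(a) ** 2 for i, a in enumerate(statevector)}


def marginal(probs: Dict[str, float], qubit: int) -> Dict[str, float]:
    """Marginal distribution of one qubit (qubit 0 = rightmost bit)."""
    out = {"0": 0.0, "1": 0.0}
    for bits, p in probs.items():
        out[bits[-1 - qubit]] += p
    return out


def expectation_z(probs: Dict[str, float], qubits: Sequence[int]) -> float:
    """Expectation of a product of Z operators on the given qubits."""
    total = 0.0
    for bits, p in probs.items():
        parity = sum(int(bits[-1 - q]) for q in qubits) % 2
        total += -p if parity else p
    return total


# Bell state (|00> + |11>) / sqrt(2).
amp = 1 / math.sqrt(2)
bell = [amp, 0, 0, amp]
probs = basis_probabilities(bell)
```

For this state each qubit on its own looks maximally mixed (the single-qubit Z expectation is about 0), while the ZZ correlation is about 1. That contrast is the kind of information the observables and subsystem summaries are meant to surface.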
This phrase has a very specific meaning in this repository.
It does not mean:
- a complete quantum internet stack
- a hardware-level entanglement routing layer
- a production cloud orchestration plane for real QPUs
It does mean:
- quantum capabilities are exposed as service advertisements
- each capability is attached to a node identity and reachable address
- execution is planned against a distributed service registry
- invocation happens over a peer-to-peer transport
- the coordinator can route different fragments to different service nodes
That is the core architectural claim of the project.
Consider this input:
OPENQASM 3;
qubit[2] q;
bit[1] c;
bell_pair q[0], q[1];
cnot q[0], q[1];
cz q[0], q[1];
teleport q[0], q[1];
syndrome_extraction q[0];
distillation q[1];
measure q[0] -> c[0];
The API accepts the payload, validates size limits, persists a QUEUED job, and returns job_id.
The parser maps operations into supported service types:
- bell_pair
- cnot
- cz
- teleportation
- syndrome_extraction
- distillation
- measurement_feedforward
The planner constructs fragments, evaluates registry candidates, and assigns primary and fallback service nodes.
The runtime walks ready fragments in dependency order and invokes them over libp2p request streams.
Each fragment outcome is written to SQLite with:
- fragment ID
- node ID
- start and finish timestamps
- attempt count
- observed fidelity
- error, if any
The executed plan is translated into a Qiskit circuit and analyzed.
For the circuit above, the current model can return data such as:
- measured counts for q[0]
- full pre-measurement basis probabilities
- subsystem density matrices
- Bloch vectors per qubit
- expectation values for Z, ZZ, and XX
- top basis states by probability
stateDiagram-v2
[*] --> QUEUED
QUEUED --> COMPILING
COMPILING --> RESERVING
RESERVING --> EXECUTING
EXECUTING --> COMPLETED
EXECUTING --> FAILED
COMPILING --> FAILED
RESERVING --> FAILED
- QUEUED: job record created, execution has not started
- COMPILING: circuit is being normalized and planned against the service registry
- EXECUTING: fragment-level work is actively being invoked over libp2p streams
- COMPLETED: result JSON is persisted and fetchable
- FAILED: planning or execution terminated with a permanent error
The system persists more than final outcomes.
Postgres tables: users, enrollments, workflow_runs, execution_plans, financial_jobs, reservation_events (append-only), execution_events (append-only)
MongoDB collections: peer capabilities, topology projections, benchmark results, provenance bundles
Local JSONL peer log: protocol events, reservation/execution transitions, package installs, and sync checkpoints, fsync'd on write.
This model enables:
- crash recovery at startup (reload unfinished jobs from Postgres)
- post-mortem analysis on any job
- live topology inspection from MongoDB projections
- future experiment reporting from benchmark collections
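For the local JSONL log, "append-only with fsync" can be sketched in a few lines. This is a generic illustration of the technique, not the project's writer: the `append_event` and `read_events` names and the event dict shape are assumptions.

```python
import json
import os
from typing import List


def append_event(path: str, event: dict) -> None:
    """Append one JSON object per line and force it to stable storage."""
    line = json.dumps(event, sort_keys=True, separators=(",", ":"))
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to flush to disk


def read_events(path: str) -> List[dict]:
    """Read the log back in write order, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Both calls matter: `flush` alone leaves the data in the operating system's cache, so a crash shortly after the write could still lose the event.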
At startup, the coordinator:
- runs SQLite migrations
- restores persisted registry state
- starts the libp2p fabric
- starts the discovery ingest loop
- reloads unfinished jobs
- reprocesses them through the job manager
This matters because jobs are not just in memory. If the process restarts mid-run, the system can continue from durable state.
The result payload is intentionally split between execution truth and quantum interpretation.
Found in fragment_results:
- which node executed the fragment
- how many attempts were required
- when each fragment started and finished
- what observed fidelity came back from the service node
Found in quantum_result:
- sampled counts over measured qubits
- full basis probabilities
- statevector
- observables
- subsystem state summaries
- entropy and fidelity estimates
This is important because a distributed orchestration demo should answer both:
- "did the networked execution succeed?"
- "what quantum state and measurements does that imply?"
This proof of concept makes several explicit simplifications.
teleportation is currently modeled as an ancilla-free logical SWAP during Qiskit state evolution.
Why:
- the current DSL does not encode the full teleportation protocol
- there is no explicit ancilla allocation in the input language
- there is no classical correction path represented in the current circuit model
These are treated as orchestration-level steps rather than additional unitary evolution.
Why:
- the DSL does not yet encode ancillary qubits
- no stabilizer measurement model exists in the current parser
- no detailed classical feedback loop is represented
The current fidelity_to_target_state is computed against the ideal compiled state produced by the same translated circuit.
Interpretation:
- it is useful as a consistency reference
- it is not yet a real hardware-vs-target fidelity measurement
This is currently derived from fragment-level observed fidelities.
Interpretation:
- it gives a practical runtime quality estimate
- it is not a substitute for hardware tomography
The coordinator is designed to fail usefully, not silently.
- if real libp2p startup fails while enabled, API startup fails
- timeouts are bounded
- retries use bounded exponential backoff
- fallback nodes are attempted when available
- degraded fidelity can terminate execution before a bad result is accepted
- malformed or unsupported circuits fail with structured errors
A demo of distributed quantum services is only credible if failure modes are explicit. Silent local fallback or hidden transport degradation would make the system look better than it really is. This project avoids that when libp2p is enabled.
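Bounded exponential backoff means the delay between retries grows geometrically but never exceeds a cap, and the number of retries is finite. A minimal sketch of such a schedule follows; the `backoff_delays` name and the parameter values are assumptions, and the project's actual base, factor, and cap are not stated in this document.

```python
from typing import List


def backoff_delays(
    base_s: float, factor: float, cap_s: float, retries: int
) -> List[float]:
    """Return the delay before each retry: base * factor**n, capped at cap_s."""
    if base_s <= 0 or factor < 1 or cap_s < base_s or retries < 0:
        raise ValueError("invalid backoff parameters")
    return [min(base_s * factor**n, cap_s) for n in range(retries)]


# Example schedule: 0.5s, 1s, 2s, 4s, then capped at 5s.
schedule = backoff_delays(base_s=0.5, factor=2.0, cap_s=5.0, retries=5)
```

Because both the cap and the retry count are bounded, the worst-case time spent on one node is simply the sum of the schedule, which keeps overall fragment timeouts predictable.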
The current implementation already demonstrates:
- distributed service discovery
- distributed planning
- real libp2p transport
- fragment-level orchestration
- persisted runtime telemetry
- Qiskit-backed result interpretation
The next architectural step is not "more endpoints". It is comparative evaluation:
- centralized baseline mode
- repeatable scenario matrix
- controlled latency and degradation injection
- artifact export for paper or proposal-quality reporting
That is how the project moves from "strong systems demo" to "publishable evaluation."
This system is a distributed quantum services coordinator.
It accepts circuits, discovers remote capabilities, plans against a live network view, reserves distributed resources, executes fragments over libp2p, persists the full lifecycle, and returns both orchestration metadata and quantum analysis.
That combination is the architecture.