feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)#1310
paschal533 wants to merge 20 commits into
Implements Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 as a new optional security transport under protocol ID /noise-pq/1.0.0.
New modules under libp2p/security/noise/pq/:
- kem.py: X-Wing hybrid KEM (ML-KEM-768 + X25519), IKem protocol
- noise_state.py: SymmetricState and CipherState for XXhfs
- patterns_pq.py: PatternXXhfs three-message handshake state machine
- transport_pq.py: TransportPQ implementing ISecureTransport
Test coverage (92 tests, all passing):
- test_kem.py: X-Wing keygen, encapsulate, decapsulate round-trips
- test_noise_state.py: SymmetricState primitives and HKDF split
- test_patterns_pq.py: full in-memory handshake, peer ID verification
- test_transport_pq.py: TransportPQ integration with SecureSession
- test_vectors_pq.py: 47 cross-implementation vector tests against 5 deterministic vectors from js-libp2p-noise; all pass byte-for-byte
scripts/interop_dial.py: live TCP dialer connecting to the JS node-listener, completes a real handshake and exchanges encrypted messages (Python initiator, JS responder).
benchmarks/bench_noise_pq.py + results.md: handshake latency, KEM micro-benchmarks, and wire size comparison vs classical Noise XX.
The existing /noise transport and PatternXX are untouched.
Fix remaining 8 ruff violations that ruff --fix could not auto-correct:
- kem.py: add r-prefix to _xwing_combine docstring (D301) to satisfy ruff's requirement that docstrings with backslash escapes use raw strings
- bench_noise_pq.py: wrap five long f-string lines to stay within the 88-char limit (E501); extract overhead_x local to avoid repetition
- test_patterns_pq.py: wrap two long inline comments (E501)
All 46 ruff errors are now resolved.
Just ran the live interop test end-to-end... wanted to confirm it actually works before people try to reproduce it. Setup:
Output: Two completely separate runtimes, one real TCP socket, same handshake keys on both ends. The cross-language test vectors in The JS listener script ( Node.js v22, Python 3.13, Windows 11, both sides happy.
Performance update: WASM KEM results from the JS side and what they mean here
I ran more detailed performance work on the JS implementation this week and wanted to share the findings here since they are relevant to both PRs.
What I measured on the JS side
Built a Rust WASM module for X-Wing (58 KB binary, KEM micro-benchmarks (Node.js v22.17.1):
The WASM KEM is 3.2x faster on the KEM operations. But the full handshake barely moves:
Less than 2% improvement on the full handshake even though the KEM is 3x faster. This is Amdahl's Law. The KEM accounts for maybe 25-30% of total handshake time. The rest is SHA-256, ChaCha20-Poly1305, HKDF, X25519, Ed25519, Protobuf, and async scheduling overhead, all of which are still running in interpreted code regardless of what the KEM is doing.
What this means for the Python benchmarks
The Python numbers in this PR (10x overhead vs classical, ~40 ms for XXhfs) are in a similar position. Swapping
From a rough breakdown: the KEM round-trip in Python is about 38 ms (10.5 + 12.3 + 15.5 ms). The full handshake is 40.7 ms, which means the non-KEM overhead is only about 2-3 ms. So actually the Python situation is the inverse of JS - the KEM dominates much more strongly here (around 90% of handshake time vs maybe 30% in JS). That means a fast KEM backend like liboqs would make a much bigger difference in Python than it does in JS. If liboqs-python is available, switching to it for the KEM path should bring the XXhfs handshake close to the classical baseline. The
Summary
Happy to share the full benchmark script if useful.
Introduces LibOQSXWingKem (oqs.KeyEncapsulation("ML-KEM-768") + PyNaCl
X25519) and make_fast_kem() which auto-selects the liboqs backend when
available, falling back to the pure-Python XWingKem silently.
- transport_pq.py and patterns_pq.py now use make_fast_kem() as the
default KEM instead of hardcoding XWingKem()
- Added pq-fast optional dependency group to pyproject.toml so users
can install liboqs support with: pip install libp2p[pq-fast]
- KeypairPool pre-generates keypairs in background threads to amortize
liboqs keygen latency under concurrent load
- bench_noise_pq.py updated to benchmark both backends and report the
speedup ratio
Predicted improvement: ~92% reduction in XXhfs handshake latency
(40.7 ms kyber-py to ~3.2 ms liboqs), bringing it below the classical
Noise baseline of 4.0 ms.
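The keypair pre-generation idea in this commit can be illustrated with a small standalone sketch. This is not the PR's KeypairPool (its code is not shown in this thread); the class name `PrefillPool`, its parameters, and the `os.urandom` stand-in for a KEM keygen call are all hypothetical. It only shows the general pattern of a background thread keeping a bounded queue of ready values, with a synchronous fallback when the queue is empty.

```python
import os
import queue
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PrefillPool(Generic[T]):
    """Hypothetical pool: a daemon thread keeps a bounded queue topped up."""

    def __init__(self, factory: Callable[[], T], size: int = 4) -> None:
        self._factory = factory
        self._items: "queue.Queue[T]" = queue.Queue(maxsize=size)
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._fill, daemon=True)
        self._worker.start()

    def _fill(self) -> None:
        # Generate one item, then retry the put until there is room or we stop.
        while not self._stop.is_set():
            item = self._factory()
            while not self._stop.is_set():
                try:
                    self._items.put(item, timeout=0.05)
                    break
                except queue.Full:
                    continue

    def get(self, timeout: float = 0.0) -> T:
        """Return a pre-generated item, or generate one inline if none is ready."""
        try:
            if timeout > 0:
                return self._items.get(timeout=timeout)
            return self._items.get_nowait()
        except queue.Empty:
            return self._factory()

    def close(self) -> None:
        self._stop.set()
        self._worker.join()


# Stand-in for an expensive keygen call (assumption: 32 random bytes).
pool = PrefillPool(lambda: os.urandom(32), size=2)
key = pool.get(timeout=1.0)
pool.close()
```

The inline fallback in `get` means a caller never blocks indefinitely on the pool; under light load it pays the normal keygen cost, and under concurrent load it usually finds a value already waiting.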
On systems where the liboqs C library is absent, liboqs-python's auto-install attempts git clone and waits 5 seconds per call. Without caching, every make_fast_kem() call in the benchmark (50 handshake + 200 throughput iterations) triggered the wait.
- Add module-level _LIBOQS_AVAILABLE flag in kem_backends.py; set once on first LibOQSXWingKem() construction, fast-path thereafter
- Broaden except clauses to catch OSError (Windows temp-dir cleanup race that fires when oqs auto-install fails)
- Update bench_noise_pq.py with the same OSError coverage
- Commit measured benchmark results to benchmarks/results.md
Posting actual benchmark numbers from the bench_noise_pq.py suite (Python 3.13.1, Windows 11 Pro x64, in-memory connections, kyber-py baseline).
KEM micro-benchmarks
Handshake latency (round-trip, in-memory)
Transport throughput (post-handshake)
A few notes on these numbers:
The KEM round-trip accounts for 63% of total XXhfs handshake time (27.15 ms out of 42.96 ms). The remaining 15.81 ms is non-KEM overhead, which is notably higher than the 3.32 ms classical baseline. The extra cost comes from ChaCha20-Poly1305 encryption of the 1,120-byte KEM ciphertext, HKDF key derivation, and Ed25519 signing, all running in CPython without native acceleration.
Using Amdahl's Law with f = 0.632 and a ~50x speedup from liboqs:
So the liboqs backend is predicted to bring XXhfs down to roughly 16 ms, about 5x over the classical baseline rather than matching it. Still a substantial win (12.9x to 5x), and transport throughput is already on par with classical since both paths use the same ChaCha20-Poly1305 cipher state after the handshake.
The
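The Amdahl's Law estimate in this comment can be reproduced from the figures it quotes (42.96 ms total, f = 0.632, ~50x KEM speedup, 3.32 ms classical baseline). The helper name `amdahl_latency_ms` is made up for this sketch, and the 50x factor is the comment's own assumption rather than a measurement.

```python
def amdahl_latency_ms(total_ms: float, fraction: float, speedup: float) -> float:
    """Predicted latency when only `fraction` of the work gets `speedup`x faster."""
    return total_ms * ((1.0 - fraction) + fraction / speedup)


XXHFS_TOTAL_MS = 42.96    # measured XXhfs handshake (kyber-py backend)
KEM_FRACTION = 0.632      # 27.15 ms of KEM work out of 42.96 ms
ASSUMED_SPEEDUP = 50.0    # assumed liboqs speedup on the KEM portion
CLASSICAL_MS = 3.32       # measured classical Noise XX handshake

predicted_ms = amdahl_latency_ms(XXHFS_TOTAL_MS, KEM_FRACTION, ASSUMED_SPEEDUP)
overhead_now = XXHFS_TOTAL_MS / CLASSICAL_MS
overhead_predicted = predicted_ms / CLASSICAL_MS

print(f"predicted XXhfs handshake: {predicted_ms:.2f} ms")
print(f"overhead vs classical: {overhead_now:.1f}x now, {overhead_predicted:.1f}x predicted")
```

Even an infinitely fast KEM would leave the non-KEM 36.8% of the handshake untouched, which is why the prediction stays well above the classical baseline.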
- Fix asyncio.Task -> asyncio.Task[None] in kem_backends.py (mypy type-arg)
- Break overlong assert line in test_noise_state.py (E501)
- Apply ruff-format to test_vectors_pq, test_transport_pq, test_kem, test_noise_state, and scripts/interop_dial to match CI formatter version
AI PR Review: #1310
PR: feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)
1. Summary of Changes
This PR adds exploratory support for a post-quantum Noise handshake under protocol ID
New modules (all under
Supporting additions:
Related context (not issues):
Breaking changes: None. This is additive and opt-in via a new protocol ID.
Author intent: Explicitly marked as draft/research: "Nothing here is intended for merge right now."
2. Branch Sync Status and Merge Conflicts
Branch Sync Status
Merge Conflict Analysis
✅ No merge conflicts detected. The PR branch merges cleanly into
3. Strengths
4. Issues Found
Critical
dependencies = [
...
"pynacl>=1.3.0",
# kyber-py is MISSING — required by libp2p/security/noise/pq/kem.py
...
]
[project.optional-dependencies]
pq-fast = [
"liboqs-python>=0.12.0",
"PyNaCl>=1.5.0", # already a core dep
]
_VECTORS_PATH = (
Path(__file__).parents[4].parent # PQC-Research/
/ "js-libp2p-noise"
/ "test"
/ "fixtures"
/ "pqc-test-vectors.json"
)
Major
from .transport_pq import PROTOCOL_ID, TransportPQ
from .kem_backends import KeypairPool, LibOQSXWingKem, make_fast_kem
Minor
5. Security Review
Overall: The cryptographic design appears sound for a research implementation. No critical vulnerabilities identified in the handshake logic itself.
Items to monitor:
Security Impact: Low (for draft/research scope)
6. Documentation and Examples
Recommendation: For eventual merge, add a short guide under
7. Newsfragment Requirement
8. Tests and Validation
Linting (
| Check | Result |
|---|---|
| yaml, toml, whitespace, pyupgrade | ✅ Passed |
| ruff + ruff format | ✅ Passed |
| mdformat | ✅ Passed |
| mypy | ✅ Passed |
| pyrefly | ❌ Failed (exit code 1) |
| Cross-platform path audit | ✅ Passed |
pyrefly errors (33 shown):
- 3× [import-error]: kyber_py.ml_kem not found (2 files), oqs not found (1 file)
- 1× [bad-argument-type]: kem_backends.py:290 run_in_executor bound method
- 3× [implicitly-defined-attribute]: test_kem.py setup_method attributes
- 26× [bad-argument-type]: test mock connections not typed as IRawConnection
Overall lint: ❌ Failed due to pyrefly
Type Checking (make typecheck)
- mypy: ✅ Passed
- pyrefly: ❌ Failed (same 33 errors as above)
Test Execution (make test)
| Metric | Value |
|---|---|
| Passed | 2795 |
| Skipped | 16 |
| Errors | 5 (all PQ test collection — ModuleNotFoundError: kyber_py) |
| Failed | 0 |
| Duration | ~106 s |
PQ tests (with kyber-py manually installed):
| Metric | Value |
|---|---|
| Passed | 45 |
| Skipped | 47 (vector tests — fixture file not present) |
| Failed | 0 |
| Duration | ~0.5 s |
Key observation: The 47 skipped vector tests are the PR's primary interop proof. They do not run in default CI or clean checkouts.
Documentation Build (make linux-docs)
❌ Failed (warnings treated as errors)
Errors:
- patterns_pq.py module docstring: Unexpected indentation (lines 10–11)
- patterns_pq.py handshake_outbound docstring: Unexpected indentation (line 6)
- libp2p.security.noise.pq.rst: document isn't included in any toctree
CI Status (GitHub Actions)
| Job | Result |
|---|---|
| tox core (3.10–3.13) | ✅ Pass |
| tox interop, demos, utils, wheel | ✅ Pass |
| tox lint (3.10–3.13) | ❌ Fail |
| tox docs (3.10) | ❌ Fail |
| windows core (3.11) | ❌ Fail |
| Read the Docs | ✅ Pass |
9. Recommendations for Improvement
- Declare kyber-py in pyproject.toml (test group at minimum; consider pq optional extra for runtime).
- Vendor test vectors into the repo and fix _VECTORS_PATH: this is the highest-value test asset.
- Add PQ tests to CI via a dedicated tox env with kyber-py installed.
- Fix Sphinx docstrings in patterns_pq.py to unblock docs build.
- Lazy-load heavy imports in pq/__init__.py to decouple noise_state from KEM dependencies.
- Open tracking issue when spec discussion allows (per @acul71 guidance); link in PR and add newsfragment.
- Wire or document KeypairPool: either integrate into handshake or mark as experimental.
- Deduplicate _xwing_combine into a shared module.
- Add pyrefly stubs for kyber_py and oqs (or # pyrefly: ignore with justification) to unblock lint CI.
- Add user documentation for /noise-pq/1.0.0 setup before marking PR ready for review.
10. Questions for the Author
- Was the decision to use mix_key() rather than mix_key_and_hash() for the ekem1 token verified against the latest Noise HFS spec draft and js-libp2p-noise#665? The unused mix_key_and_hash() method suggests possible spec ambiguity.
- Can the cross-implementation vector file be committed to this repo (or fetched as a test fixture submodule) so CI can run the 47 vector assertions?
- Is KeypairPool intended to be integrated into PatternXXhfs before merge, or kept as optional infrastructure for callers to wire manually?
- Should kyber-py be a hard dependency, or only pulled in via an optional pq extra to keep the default install lean?
- What is the plan for protocol ID alignment if libp2p/specs#716 (noise-pq: add Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 spec, Stage 1 Working Draft) settles on a different string than /noise-pq/1.0.0?
- Has Go interop been attempted or planned, as mentioned in the PR body's merge criteria?
11. Overall Assessment
| Criterion | Rating |
|---|---|
| Quality Rating | Good (for research/exploratory draft) |
| Security Impact | Low |
| Merge Readiness | Not ready (by author intent + project blockers) |
| Confidence | High |
Summary: This is high-quality exploratory work that makes a concrete, reviewable contribution to the libp2p PQC handshake discussion. The modular design, pluggable KEM backends, layered tests, and live JS interop are strong foundations. The author correctly labels it as draft/WIP, and maintainer guidance to wait for spec stabilization is appropriate.
For merge readiness, the blockers are primarily process and infrastructure rather than cryptographic correctness: missing issue/newsfragment (expected for now), undeclared kyber-py dependency, non-vendored test vectors silently skipped in CI, PQ tests excluded from tox, and docs/lint CI failures. None of these diminish the research value of the PR, but all must be addressed before it can graduate from draft to mergeable.
Recommended next step: Continue using this PR as a feedback vehicle. Address dependency/CI/vector vendoring when the author is ready to move toward merge, coordinated with libp2p/specs#716 and a tracking issue.
Copies pqc-test-vectors.json from js-libp2p-noise into tests/fixtures/ so the 47 vector assertions run in CI on every clean checkout instead of silently skipping due to a missing sibling-repo path. Updates _VECTORS_PATH in test_vectors_pq.py from the external PQC-Research/js-libp2p-noise path to the repo-relative tests/fixtures/pqc-test-vectors.json. Addresses review finding #3 (critical) from PR libp2p#1310.
Adds kyber-py>=0.9.0 to:
- [dependency-groups.test]: ensures tox -e pq and uv install always pulls it in, stopping the 5x ModuleNotFoundError: kyber_py in CI
- [project.optional-dependencies] as a new pq extra so end-users can pip install libp2p[pq] for the pure-Python KEM backend
Addresses review finding #1 (critical) from PR libp2p#1310.
Adds py{310,311,312,313}-pq to the tox envlist and a pq: command line
that runs tests/security/noise/pq with a 120s timeout.
Also adds pq to the Ubuntu CI matrix in tox.yml so the 92 PQ tests
(including the 47 cross-implementation vector assertions) run on every
pull request. Windows CI is intentionally excluded since liboqs is not
required and the pure-Python path is covered by Ubuntu.
Addresses review finding #4 (critical) from PR libp2p#1310.
- Convert indented pseudo-code blocks in patterns_pq module docstring to RST literal-block syntax (::) so Sphinx parses them correctly
- Rewrite handshake_outbound docstring: keep Args/Raises entries single-line (napoleon not configured; multi-line continuations inside block-quote Args: sections trigger Unexpected indentation in docutils)
- Move the remote_peer=None explanation into the body paragraph
- Create docs/libp2p.security.noise.pq.rst with automodule entries for all five pq submodules
- Add libp2p.security.noise.pq to the Subpackages toctree in libp2p.security.noise.rst (fixes document not in toctree warning)
Sphinx dummy build: 0 warnings, 0 errors.
kem.py: remove module-level `from kyber_py.ml_kem import ML_KEM_768`. Add XWingKem.__init__ that defers the import to instantiation time and raises ImportError with a clear install hint if kyber-py is absent. Size constants (XWING_PK_SIZE etc.) and the IKem Protocol are all compile-time values that need no kyber-py, so they remain at module level.
pq/__init__.py: replace eager imports with a PEP 562 module __getattr__ that defers transport_pq and kem_backends imports until a name is first accessed. globals() caching ensures the second access is free.
Effect: `import libp2p.security.noise.pq` and `import libp2p.security.noise.pq.noise_state` now succeed without kyber-py installed. kyber-py is only required the moment XWingKem() is instantiated (i.e., when an actual PQ handshake is initiated).
Tests: 92 passed, 0 failed.
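For readers unfamiliar with PEP 562, the lazy-export mechanism described in this commit looks roughly like the sketch below. It is not the PR's pq/__init__.py: the module is built in memory with `types.ModuleType` so the example runs anywhere, the names `make_lazy_package` and `_LAZY_EXPORTS` are hypothetical, and standard-library targets stand in for transport_pq and kem_backends.

```python
import importlib
import types

# Hypothetical export table: public name -> (defining module, attribute).
# Standard-library targets are placeholders for the real pq submodules.
_LAZY_EXPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "sha256": ("hashlib", "sha256"),
}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a module whose exports are imported on first attribute access."""
    pkg = types.ModuleType(name)

    def _lazy_getattr(attr: str):
        # Called by the module machinery only when normal lookup fails.
        try:
            module_name, target = _LAZY_EXPORTS[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), target)
        pkg.__dict__[attr] = value  # cache so later lookups skip this hook
        return value

    # PEP 562: a function stored as __getattr__ in the module namespace.
    pkg.__getattr__ = _lazy_getattr
    return pkg


demo = make_lazy_package("demo_pq")
resolved_before = "sha256" in demo.__dict__
digest = demo.sha256(b"").hexdigest()
resolved_after = "sha256" in demo.__dict__
```

In a real package the same function would simply be defined at the top level of `__init__.py`, and the cache write would go to `globals()`, as the commit message describes.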
_xwing_combine() and _XWING_LABEL were defined independently in both kem.py and kem_backends.py with identical logic. A divergence would silently break cross-backend interoperability, since both XWingKem and LibOQSXWingKem must produce the same shared secret for the same inputs.
Extract to libp2p/security/noise/pq/_xwing.py (leading underscore = private to the pq package). Both modules now import from this single source of truth. Remove hashlib from kem.py and kem_backends.py since it was only needed for the combiner.
Tests: 92 passed, 0 failed.
mix_key_and_hash (3-output HKDF) is defined for psk tokens per the Noise spec section 5.2, not for KEM tokens. The ekem1 token in the XXhfs pattern uses mix_key (2-output HKDF), identical to how DH tokens (ee, es, se) are processed. Added explanatory note to the mix_key_and_hash docstring and inline comments at both ekem1 call sites to prevent reviewer confusion.
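The difference between the two operations can be sketched with the standard library alone. This follows the HKDF, MixHash, MixKey, and MixKeyAndHash definitions in the Noise specification; it is not the PR's noise_state.py, and the class name `SketchSymmetricState` and the placeholder input bytes are hypothetical.

```python
import hashlib
import hmac

HASHLEN = 32  # SHA-256 output size


def hkdf(chaining_key: bytes, ikm: bytes, num_outputs: int) -> list[bytes]:
    """Noise-style HKDF: a chain of HMAC-SHA256 outputs (2 or 3 of them)."""
    temp_key = hmac.new(chaining_key, ikm, hashlib.sha256).digest()
    outputs: list[bytes] = []
    previous = b""
    for counter in range(1, num_outputs + 1):
        block = previous + bytes([counter])
        previous = hmac.new(temp_key, block, hashlib.sha256).digest()
        outputs.append(previous)
    return outputs


class SketchSymmetricState:
    """Hypothetical minimal state: chaining key ck, transcript hash h, key k."""

    def __init__(self, protocol_name: bytes) -> None:
        if len(protocol_name) <= HASHLEN:
            self.h = protocol_name.ljust(HASHLEN, b"\x00")
        else:
            self.h = hashlib.sha256(protocol_name).digest()
        self.ck = self.h
        self.k = None
        self.n = 0

    def mix_hash(self, data: bytes) -> None:
        self.h = hashlib.sha256(self.h + data).digest()

    def mix_key(self, ikm: bytes) -> None:
        # 2-output HKDF: updates ck and k, leaves the transcript hash alone.
        self.ck, self.k = hkdf(self.ck, ikm, 2)
        self.n = 0

    def mix_key_and_hash(self, ikm: bytes) -> None:
        # 3-output HKDF: additionally folds the middle output into h.
        self.ck, temp_h, self.k = hkdf(self.ck, ikm, 3)
        self.mix_hash(temp_h)
        self.n = 0


state = SketchSymmetricState(b"Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256")
h_start, ck_start = state.h, state.ck
state.mix_key(b"\x01" * 32)           # placeholder for a KEM or DH output
h_after_mix_key = state.h
state.mix_key_and_hash(b"\x02" * 32)  # placeholder for a psk-style input
h_after_mix_key_and_hash = state.h
```

The observable difference is that `mix_key` leaves `h` unchanged while `mix_key_and_hash` does not, so two implementations that disagree on which one to use for a token diverge in the handshake hash and in every key derived afterwards.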
- kem_backends.py: add # type: ignore[arg-type/union-attr] to oqs KeyEncapsulation context manager calls (oqs stubs TypeVar limitation); add # type: ignore[arg-type] to run_in_executor Protocol method call
- test_kem.py: add kem: XWingKem class-level annotation to suppress implicitly-defined-attribute errors from setup_method assignment
- test_patterns_pq.py / test_transport_pq.py: make _MemoryConn and _WriteCapture inherit from IRawConnection; add is_initiator class attr; fix get_remote_address / get_transport_addresses / get_connection_type return type annotations; import Multiaddr, IRawConnection, ConnectionType
…security warning to handshake_outbound
- pyproject.toml: drop PyNaCl>=1.5.0 from [pq-fast] extra; PyNaCl is already a core dependency (pynacl>=1.3.0) so listing it again in the optional extra was redundant and misleading
- patterns_pq.py: add Sphinx '.. warning::' directive to handshake_outbound docstring explaining that remote_peer=None disables peer-identity binding (signature is still verified; peer-ID check is not)
examples/pq_noise/pq_demo.py starts a listener and a dialer in the
same process, connects them over loopback TCP using TransportPQ as
the only security transport, and verifies a round-trip message after
the PQ Noise handshake completes.
Demonstrates: new_host(sec_opt={PROTOCOL_ID: TransportPQ(...)}) wiring,
multistream /noise-pq/1.0.0 negotiation, and application-layer data
flow through the X-Wing encrypted channel.
Thanks for the thorough review @acul71, this is exactly the kind of structured feedback that turns a draft into something mergeable. I've gone through every recommendation and question. Here's a full accounting of what's been addressed.
Response to Section 9: Recommendations
1. Declare
2. Vendor test vectors ✅
3. Add PQ tests to CI ✅
4. Fix Sphinx docstrings ✅
5. Lazy-load in
6. Open tracking issue ⏳ Pending spec stabilisation (per @acul71's guidance, acknowledged).
7. Wire or document
8. Deduplicate
9. Add pyrefly stubs / suppress with justification ✅
10. Add user documentation ✅
Response to Section 10: Questions
Q1. Yes, verified. Noise spec §5.2 specifies that
Q2. Can the cross-implementation vector file be committed? Done, see recommendation 2 above. The vectors are committed as
Q3. Is
No, it stays caller-wired. The handshake is correct and complete without it;
Q4. Should
Optional extra, but required for the test suite. The install footprint of
Q5. Protocol ID alignment with libp2p/specs#716?
Q6. Go interop planned? Not yet attempted programmatically. The cross-language static test vectors (same vectors that now run in CI) cover byte-level correctness for the handshake transcript, key schedule, and cipher state, which is the meaningful interop claim. Dynamic Go interop would require a matching
Additional: live runtime integration test
Beyond unit tests, I ran a live in-process node test: two
Output (kyber-py backend, Windows 11):
The 7-second figure includes the
Current CI status post-push: tox
- ruff: fix import ordering (I001) and line length (E501) in pq_demo.py
- docs: add examples.pq_noise to toctree so sphinx-apidoc output is linked
- fixtures: add missing trailing newline to pqc-test-vectors.json
AI PR Review: #1310 (v1)
PR: feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)
1. Summary of Changes
This PR adds exploratory support for a post-quantum Noise handshake under protocol ID
New modules (all under
Supporting additions:
Related context (not issues):
Breaking changes: None. Additive and opt-in via a new protocol ID.
Author intent: Explicitly marked as draft/research: "Nothing here is intended for merge right now."
Since review v0: The author addressed most infrastructure feedback from @acul71's posted review (dependencies, vectors, CI, docs, lazy imports, deduplication).
2. Branch Sync Status and Merge Conflicts
Branch Sync Status
Merge Conflict Analysis
✅ No merge conflicts detected. The PR branch merges cleanly into
3. Strengths
4. Issues Found
Critical
Major
def get_pattern(self) -> PatternXXhfs:
"""Return a fresh PatternXXhfs for a single handshake."""
return PatternXXhfs(
local_peer=self.local_peer,
libp2p_privkey=self.libp2p_privkey,
noise_static_key=self.noise_privkey,
kem=make_fast_kem(),
)
Minor
5. Security Review
Overall: Cryptographic design appears sound for a research implementation. No new vulnerabilities identified beyond items already noted in v0.
Items to monitor:
Security Impact: Low (for draft/research scope)
6. Documentation and Examples
7. Newsfragment Requirement
8. Tests and Validation
Validation was run on branch
Linting (
| Check | Result |
|---|---|
| yaml, toml, whitespace, pyupgrade | ✅ Passed |
| ruff + ruff format | ✅ Passed |
| mdformat | ✅ Passed |
| mypy | ✅ Passed |
| pyrefly | ✅ Passed (with liboqs-python installed) |
| Cross-platform path audit | ✅ Passed |
Overall lint: ✅ Passed
Type Checking (make typecheck)
- mypy: ✅ Passed
- pyrefly: ✅ Passed
Test Execution (make test)
| Metric | Value |
|---|---|
| Passed | 2887 |
| Skipped | 16 |
| Failed | 0 |
| Errored | 0 |
| Duration | ~116 s |
PQ tests are included in the full tests/ tree when kyber-py is installed (via dev test dependency group).
PQ Test Suite (pytest tests/security/noise/pq/)
| Metric | Value |
|---|---|
| Passed | 92 |
| Skipped | 0 |
| Failed | 0 |
| Duration | ~0.7 s |
All 47 cross-implementation vector assertions pass against vendored fixtures.
Documentation Build (make linux-docs)
✅ Passed locally — 106 source files, 0 Sphinx warnings/errors
CI Status (GitHub Actions, latest push)
| Job | Result |
|---|---|
| tox pq (3.10–3.13) | ✅ Pass |
| tox core, interop, demos, utils, wheel (3.10–3.13) | ✅ Pass |
| tox docs (3.10) | ✅ Pass |
| tox lint (3.10–3.13) | liboqs-python or stubs/oqs/ for pyrefly on minimal installs |
| windows core/demos/utils/wheel | ✅ Pass |
| Read the Docs | ❌ Fail (build 33042886 — may be unrelated to PQ changes; tox docs passed) |
9. Recommendations for Improvement
- Open tracking issue + newsfragment when @acul71 signals spec readiness (per maintainer guidance).
- Revisit default KEM selection: consider XWingKem() as transport default to avoid liboqs probe latency in dev/demo paths.
- Decouple vector tests from private kem.py symbols: import from _xwing.py / public constants.
- Add user-facing setup guide before marking PR ready: short section in docs covering pip install libp2p[pq], host wiring, and interop pointers.
- Optional: add stubs/oqs/__init__.pyi so pyrefly passes on minimal dev installs without pq-fast.
Resolved since v0 (no further action)
- ✅ kyber-py declared in pq extra and test dependency group
- ✅ Test vectors vendored at tests/fixtures/pqc-test-vectors.json
- ✅ tox pq env in CI
- ✅ Sphinx docstring fixes
- ✅ PEP 562 lazy imports in pq/__init__.py
- ✅ _xwing_combine deduplicated to _xwing.py
- ✅ mix_key() vs mix_key_and_hash() documented for ekem1
- ✅ Test doubles implement IRawConnection
- ✅ Redundant PyNaCl removed from pq-fast
- ✅ remote_peer=None security warning added
10. Questions for the Author
- Is the ~5 s liboqs probe on first TransportPQ handshake acceptable for the default path, or should XWingKem() be the default with liboqs as explicit opt-in?
- Has Read the Docs build 33042886 been investigated? tox docs passes; the RTD failure may be environmental but should be confirmed before merge.
- When libp2p/specs#716 (noise-pq: add Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 spec, Stage 1 Working Draft) settles on a protocol ID, will you coordinate the one-line PROTOCOL_ID change with js-libp2p-noise and go implementations?
11. Overall Assessment
| Criterion | Rating |
|---|---|
| Quality Rating | Good (improved since v0) |
| Security Impact | Low |
| Merge Readiness | Not ready (draft by author intent + process blockers) |
| Confidence | High |
Summary: Substantial progress since review v0. The author systematically addressed maintainer feedback on dependencies, vectors, CI coverage, documentation, and code structure. Cryptographic correctness is well evidenced by 92 passing tests including byte-level JS interop vectors and live TCP interop. Lint, typecheck, tests, and docs all pass with standard dev dependencies plus the optional pq-fast extra where needed for pyrefly. Process blockers (tracking issue, newsfragment) remain appropriately deferred per @acul71's guidance until the libp2p PQC spec discussion matures.
Recommended next step: Continue using this PR as a feedback vehicle until specs and a tracking issue are ready.
- docs/examples.rst: remove examples.pq_noise from toctree; the RST is generated by sphinx-apidoc at build time but never committed, so RTD's HTML builder can't find it and fails with fail_on_warning=true
- kem_backends.py:93: add # type: ignore[import-error] on `import oqs`; pyrefly treats optional C-extension imports inside try/except as errors when liboqs-python is not installed in the lint venv
sphinx-apidoc generates docs/examples.pq_noise.rst for the new pq demo package, but the file is not committed. On local tox-docs runs Sphinx finds the generated RST as an orphan (not in any toctree) and warns, which fails fail_on_warning=true. Adding it to exclude_patterns silences the orphan warning for both local and CI builds.
Hi @acul71, thanks for the thorough v1 review. Addressing the open items inline.
CI is now fully green (as of commit 4a9460f):
Answers to the questions:
Q1 liboqs probe latency as default: Agreed. I'll change transport_pq.py to default to XWingKem() and document make_fast_kem() / pq-fast as an explicit opt-in for production. The 5-second probe on first call in the default path is bad ergonomics, and the benchmark analysis in this thread already shows liboqs matters more on Python than JS anyway; that deserves to be a deliberate choice, not an automatic one.
Q2 RTD build 33042886: Investigated and fixed. The failure was specific to this PR's changes, not an environmental fluke.
Q3 Protocol ID coordination: Yes, the plan is to treat PROTOCOL_ID = "/noise-pq/1.0.0" as a single constant to sync across py-libp2p, js-libp2p-noise (#665), and any go implementation once libp2p/specs#716 stabilises on an identifier. The constant is isolated to one line in transport_pq.py so the change is a one-liner when specs are ready.
On the remaining major items: Vector test decoupling (import _xwing_combine from _xwing directly, surface size constants publicly) and the user-facing setup guide are the next two I'll address on this branch. Newsfragment and tracking issue remain deferred per your earlier guidance.
This is a draft PR sharing exploratory research on adding a post-quantum Noise handshake to py-libp2p. Nothing here is intended for merge right now. The goal is to give people something concrete to look at, poke at, and give feedback on while the broader libp2p PQC spec discussion plays out.
What this adds
A new optional security transport under the protocol ID /noise-pq/1.0.0, implementing Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256. The existing /noise transport and PatternXX are completely untouched.
New code lives entirely in libp2p/security/noise/pq/:
- kem.py - X-Wing hybrid KEM (ML-KEM-768 + X25519), with a narrow IKem interface so the backend can be swapped out. Currently backed by kyber-py (pure Python, no C dependency needed).
- noise_state.py - a minimal SymmetricState and CipherState for the XXhfs transcript, implemented directly rather than through noiseprotocol (which does not support custom tokens like e1/ekem1).
- patterns_pq.py - PatternXXhfs, the three-message handshake state machine. Handles the full message A/B/C flow, libp2p identity payload signing and verification, and the KEM encapsulate/decapsulate steps in the right order.
- transport_pq.py - TransportPQ, wrapping PatternXXhfs as an ISecureTransport.
Why X-Wing
X-Wing is a hybrid KEM combining ML-KEM-768 (NIST-standardised post-quantum) and X25519 (classical). The session key is secure as long as either algorithm holds, which is the right posture during the transition period where quantum computers are not yet a practical threat but could be in the future. Store-now-decrypt-later is a real risk today, which is why the KEM upgrade is worth doing ahead of quantum computers actually existing.
Test coverage
92 tests, all passing:
- test_kem.py - X-Wing keygen, encapsulate, decapsulate, round-trips, error cases
- test_noise_state.py - SymmetricState primitives, HKDF split correctness
- test_patterns_pq.py - full in-memory handshakes, peer ID verification, signature rejection
- test_transport_pq.py - TransportPQ through the ISecureTransport interface
- test_vectors_pq.py - 47 cross-implementation vector assertions against 5 deterministic test vectors generated by js-libp2p-noise. Every byte of message A, B, C, the final handshake hash, and both transport cipher keys are verified to match.
The vector tests are the main proof of wire compatibility. You can run them with:
Live interop with js-libp2p-noise
scripts/interop_dial.py connects to a running JS listener over a real TCP socket, completes the full XXhfs handshake as initiator, exchanges encrypted messages, and prints confirmation. Both sides independently verified the same session keys.
To test it yourself:
The JS listener script is in the js-libp2p-noise PR: ChainSafe/js-libp2p-noise#665
Benchmarks
Run on Python 3.13, Windows, in-memory connections:
Wire size comparison (fixed handshake bytes, not counting libp2p payload):
The overhead is dominated by the ML-KEM-768 portion of the KEM public key (1184 B in message 1) and of the KEM ciphertext (1088 B in message 2). For long-lived connections this is a one-time cost.
A note on the relationship to libp2p/specs#710 and #711
The ongoing spec discussion around libp2p/specs#710 is about post-quantum peer identity keys and how they get encoded in PeerIDs. That is a separate concern from what this PR implements.
This PR is about the Noise handshake forward secrecy layer. It uses X-Wing for the key exchange (protecting against store-now-decrypt-later), but keeps Ed25519 for peer identity signatures exactly as they are today. The XXhfs handshake is independent of whatever identity key format libp2p eventually standardises on. When ML-DSA identity eventually lands, NoiseHFS will absorb it transparently because the signature step is key-type aware and sits behind the existing libp2p_privkey.sign() abstraction.
So this work does not need to wait for libp2p/specs#710 to resolve. The two tracks solve different problems.
What would make this mergeable eventually
- Noise_XXhfs handshake pattern (separate from the identity key spec debate)
- IKem interface and whether liboqs-python should be added as an optional dependency for production use
- /noise-pq/1.0.0 is the right protocol ID or whether it should track whatever the spec settles on
Happy to iterate on any of this based on feedback here.
Related: