Skip to content

feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)#1310

Draft
paschal533 wants to merge 20 commits into
libp2p:mainfrom
paschal533:feat/pqc-noise-xxhfs
Draft

feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)#1310
paschal533 wants to merge 20 commits into
libp2p:mainfrom
paschal533:feat/pqc-noise-xxhfs

Conversation

@paschal533

@paschal533 paschal533 commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

This is a draft PR sharing exploratory research on adding a post-quantum Noise handshake to py-libp2p. Nothing here is intended for merge right now. The goal is to give people something concrete to look at, poke at, and give feedback on while the broader libp2p PQC spec discussion plays out.


What this adds

A new optional security transport under the protocol ID /noise-pq/1.0.0, implementing Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256. The existing /noise transport and PatternXX are completely untouched.

New code lives entirely in libp2p/security/noise/pq/:

  • kem.py - X-Wing hybrid KEM (ML-KEM-768 + X25519), with a narrow IKem interface so the backend can be swapped out. Currently backed by kyber-py (pure Python, no C dependency needed).
  • noise_state.py - a minimal SymmetricState and CipherState for the XXhfs transcript, implemented directly rather than through noiseprotocol (which does not support custom tokens like e1/ekem1).
  • patterns_pq.py - PatternXXhfs, the three-message handshake state machine. Handles the full message A/B/C flow, libp2p identity payload signing and verification, and the KEM encapsulate/decapsulate steps in the right order.
  • transport_pq.py - TransportPQ, wrapping PatternXXhfs as an ISecureTransport.

Why X-Wing

X-Wing is a hybrid KEM combining ML-KEM-768 (NIST-standardised post-quantum) and X25519 (classical). The session key is secure as long as either algorithm holds, which is the right posture during the transition period where quantum computers are not yet a practical threat but could be in the future. Store-now-decrypt-later is a real risk today, which is why the KEM upgrade is worth doing ahead of quantum computers actually existing.


Test coverage

92 tests, all passing:

  • test_kem.py - X-Wing keygen, encapsulate, decapsulate, round-trips, error cases
  • test_noise_state.py - SymmetricState primitives, HKDF split correctness
  • test_patterns_pq.py - full in-memory handshakes, peer ID verification, signature rejection
  • test_transport_pq.py - TransportPQ through the ISecureTransport interface
  • test_vectors_pq.py - 47 cross-implementation vector assertions against 5 deterministic test vectors generated by js-libp2p-noise. Every byte of message A, B, C, the final handshake hash, and both transport cipher keys are verified to match.

The vector tests are the main proof of wire compatibility. You can run them with:

pytest tests/security/noise/pq/ -v

Live interop with js-libp2p-noise

scripts/interop_dial.py connects to a running JS listener over a real TCP socket, completes the full XXhfs handshake as initiator, exchanges encrypted messages, and prints confirmation. Both sides independently verified the same session keys.

To test it yourself:

# Terminal 1 - start the JS responder
cd js-libp2p-noise
node scripts/node-listener.mjs

# Terminal 2 - run the Python dialer
cd py-libp2p
python scripts/interop_dial.py

The JS listener script is in the js-libp2p-noise PR: ChainSafe/js-libp2p-noise#665


Benchmarks

Run on Python 3.13, Windows, in-memory connections:

What Median
X-Wing keygen 10.5 ms
X-Wing encapsulate 12.3 ms
X-Wing decapsulate 15.5 ms
Classical Noise XX handshake 4.0 ms
Noise XXhfs handshake 40.7 ms
Overhead vs classical 10x

Wire size comparison (fixed handshake bytes, not counting libp2p payload):

Pattern Msg 1 Msg 2 Msg 3 Total
Classical XX 32 B 80 B 48 B 160 B
XXhfs 1248 B 1216 B 48 B 2512 B

The overhead is dominated by the KEM public key (1184 B in message 1) and the KEM ciphertext (1088 B in message 2). For long-lived connections this is a one-time cost.


A note on the relationship to libp2p/specs#710 and #711

The ongoing spec discussion around libp2p/specs#710 is about post-quantum peer identity keys and how they get encoded in PeerIDs. That is a separate concern from what this PR implements.

This PR is about the Noise handshake forward secrecy layer. It uses X-Wing for the key exchange (protecting against store-now-decrypt-later), but keeps Ed25519 for peer identity signatures exactly as they are today. The XXhfs handshake is independent of whatever identity key format libp2p eventually standardises on. When ML-DSA identity eventually lands, NoiseHFS will absorb it transparently because the signature step is key-type aware and sits behind the existing libp2p_privkey.sign() abstraction.

So this work does not need to wait for libp2p/specs#710 to resolve. The two tracks solve different problems.


What would make this mergeable eventually

  • A libp2p spec PR for the Noise_XXhfs handshake pattern (separate from the identity key spec debate)
  • Consensus on the IKem interface and whether liboqs-python should be added as an optional dependency for production use
  • More interop testing, ideally with a Go node as well
  • A decision on whether /noise-pq/1.0.0 is the right protocol ID or whether it should track whatever the spec settles on

Happy to iterate on any of this based on feedback here.

Related:

Implements Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 as a new optional
security transport under protocol ID /noise-pq/1.0.0.

New modules under libp2p/security/noise/pq/:
- kem.py: X-Wing hybrid KEM (ML-KEM-768 + X25519), IKem protocol
- noise_state.py: SymmetricState and CipherState for XXhfs
- patterns_pq.py: PatternXXhfs three-message handshake state machine
- transport_pq.py: TransportPQ implementing ISecureTransport

Test coverage (92 tests, all passing):
- test_kem.py: X-Wing keygen, encapsulate, decapsulate round-trips
- test_noise_state.py: SymmetricState primitives and HKDF split
- test_patterns_pq.py: full in-memory handshake, peer ID verification
- test_transport_pq.py: TransportPQ integration with SecureSession
- test_vectors_pq.py: 47 cross-implementation vector tests against
  5 deterministic vectors from js-libp2p-noise; all pass byte-for-byte

scripts/interop_dial.py: live TCP dialer connecting to the JS
node-listener, completes a real handshake and exchanges encrypted
messages (Python initiator, JS responder).

benchmarks/bench_noise_pq.py + results.md: handshake latency, KEM
micro-benchmarks, and wire size comparison vs classical Noise XX.

The existing /noise transport and PatternXX are untouched.
Fix remaining 8 ruff violations that ruff --fix could not auto-correct:

- kem.py: add r-prefix to _xwing_combine docstring (D301) to satisfy
  ruff's requirement that docstrings with backslash escapes use raw strings
- bench_noise_pq.py: wrap five long f-string lines to stay within the
  88-char limit (E501); extract overhead_x local to avoid repetition
- test_patterns_pq.py: wrap two long inline comments (E501)

All 46 ruff errors are now resolved.
@paschal533

paschal533 commented Apr 17, 2026

Copy link
Copy Markdown
Contributor Author

Just ran the live interop test end-to-end... wanted to confirm it actually works before people try to reproduce it.

Setup:

  • Terminal 1: cd js-libp2p-noise && node scripts/node-listener.mjs (JS responder, port 8000)
  • Terminal 2: cd py-libp2p && python scripts/interop_dial.py (Python initiator)

Output:

2026-04-17 13:09:45 [interop_dial] Local peer ID: 12D3KooWRPBkhDbfQmRJjVpR7hkEAmu8FP6mEaTBW61BAryeNjAH
2026-04-17 13:09:45 [interop_dial] Connecting to JS listener at 127.0.0.1:8000
2026-04-17 13:09:45 [interop_dial] TCP connection established
2026-04-17 13:09:45 [interop_dial] Starting XXhfs handshake (Python = initiator)...
2026-04-17 13:09:45 [interop_dial] Handshake complete! Remote peer: 12D3KooWHFFVYwAnVQHZTqZhcq1woX11xKsMnujNDPbDyh77kLBa
2026-04-17 13:09:45 [interop_dial] Received from JS: "hello from JS"
2026-04-17 13:09:45 [interop_dial] Sent to JS: "hello from Python"

============================================================
INTEROP SUCCESS
Python <-> JavaScript NoiseHFS handshake complete.
Protocol: Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256
Local peer:  12D3KooWRPBkhDbfQmRJjVpR7hkEAmu8FP6mEaTBW61BAryeNjAH
Remote peer: 12D3KooWHFFVYwAnVQHZTqZhcq1woX11xKsMnujNDPbDyh77kLBa
============================================================

Two completely separate runtimes, one real TCP socket, same handshake keys on both ends. The cross-language test vectors in test_vectors_pq.py already verify byte-level compatibility statically, this confirms it works dynamically over an actual connection too.

The JS listener script (scripts/node-listener.mjs) is now committed to the js-libp2p-noise PR branch so anyone can reproduce this: ChainSafe/js-libp2p-noise#665

Node.js v22, Python 3.13, Windows 11, both sides happy.

@paschal533

Copy link
Copy Markdown
Contributor Author

Performance update: WASM KEM results from the JS side and what they mean here

I ran more detailed performance work on the JS implementation this week and wanted to share the findings here since they are relevant to both PRs.

What I measured on the JS side

Built a Rust WASM module for X-Wing (58 KB binary, ml-kem 0.3.0-rc.2 + x25519-dalek + sha3) and benchmarked it against the pure-JS noble backend.

KEM micro-benchmarks (Node.js v22.17.1):

Operation Pure-JS WASM Speedup
keygen 3.42 ms 1.40 ms 2.4x
encapsulate 8.32 ms 2.19 ms 3.8x
decapsulate 7.33 ms 3.10 ms 2.4x
Full KEM round-trip 21.43 ms 6.72 ms 3.2x

The WASM KEM is 3.2x faster on the KEM operations. But the full handshake barely moves:

Protocol ms/handshake
Noise_XX classical ~19 ms
Noise_XXhfs pure-JS ~93 ms
Noise_XXhfs WASM KEM ~91 ms

Less than 2% improvement on the full handshake even though the KEM is 3x faster. This is Amdahl's Law. The KEM accounts for maybe 25-30% of total handshake time. The rest is SHA-256, ChaCha20-Poly1305, HKDF, X25519, Ed25519, Protobuf, and async scheduling overhead, all of which are still running in interpreted code regardless of what the KEM is doing.

What this means for the Python benchmarks

The Python numbers in this PR (10x overhead vs classical, ~40 ms for XXhfs) are in a similar position. Swapping kyber-py for liboqs-python would speed up the KEM operations considerably, maybe 10-20x on that component alone, but if the rest of the handshake is in pure Python the total improvement will be limited by the same bottleneck.

From a rough breakdown: the KEM round-trip in Python is about 38 ms (10.5 + 12.3 + 15.5 ms). The full handshake is 40.7 ms, which means the non-KEM overhead is only about 2-3 ms. So actually the Python situation is the inverse of JS - the KEM dominates much more strongly here (around 90% of handshake time vs maybe 30% in JS). That means a fast KEM backend like liboqs would make a much bigger difference in Python than it does in JS.

If liboqs-python is available, switching to it for the KEM path should bring the XXhfs handshake close to the classical baseline. The make_fast_kem() pattern in kem_backends.py is designed for exactly that: it picks liboqs if available and falls back to kyber-py.

Summary

  • On the JS side: WASM KEM is 3x faster for the KEM, but the full handshake improvement is under 2%. The bottleneck is the rest of the crypto stack. The path to meaningful improvement is native Node.js ML-KEM (v24+).
  • On the Python side: the KEM is a much larger share of total handshake time, so a fast KEM backend (liboqs) would actually move the needle significantly. Worth making that the default recommendation when liboqs is available.

Happy to share the full benchmark script if useful.

Introduces LibOQSXWingKem (oqs.KeyEncapsulation("ML-KEM-768") + PyNaCl
X25519) and make_fast_kem() which auto-selects the liboqs backend when
available, falling back to the pure-Python XWingKem silently.

- transport_pq.py and patterns_pq.py now use make_fast_kem() as the
  default KEM instead of hardcoding XWingKem()
- Added pq-fast optional dependency group to pyproject.toml so users
  can install liboqs support with: pip install libp2p[pq-fast]
- KeypairPool pre-generates keypairs in background threads to amortize
  liboqs keygen latency under concurrent load
- bench_noise_pq.py updated to benchmark both backends and report the
  speedup ratio

Predicted improvement: ~92% reduction in XXhfs handshake latency
(40.7 ms kyber-py to ~3.2 ms liboqs), bringing it below the classical
Noise baseline of 4.0 ms.
On systems where liboqs C library is absent, liboqs-python's auto-install
attempts git clone and waits 5 seconds per call. Without caching,
every make_fast_kem() call in the benchmark (50 handshake + 200
throughput iterations) triggered the wait.

- Add module-level _LIBOQS_AVAILABLE flag in kem_backends.py; set
  once on first LibOQSXWingKem() construction, fast-path thereafter
- Broaden except clauses to catch OSError (Windows temp-dir cleanup
  race that fires when oqs auto-install fails)
- Update bench_noise_pq.py with the same OSError coverage
- Commit measured benchmark results to benchmarks/results.md
@paschal533

Copy link
Copy Markdown
Contributor Author

Posting actual benchmark numbers from the bench_noise_pq.py suite (Python 3.13.1, Windows 11 Pro x64, in-memory connections, kyber-py baseline).

KEM micro-benchmarks

Operation ms/op ops/s
X-Wing keygen 8.09 123.6
X-Wing encapsulate 8.30 120.5
X-Wing decapsulate 10.76 92.9
KEM round-trip (encap + decap) 19.06 --

Handshake latency (round-trip, in-memory)

Protocol ms/op ops/s
Classical Noise XX 3.32 301.2
Noise XXhfs (PQ) 42.96 23.3
Overhead 12.9x --

Transport throughput (post-handshake)

Payload Classical PQ Ratio
1 KB 8.0 MB/s 8.1 MB/s 1.02x
10 KB 61.0 MB/s 62.7 MB/s 1.03x
60 KB 215.9 MB/s 227.9 MB/s 1.06x

A few notes on these numbers:

The KEM round-trip accounts for 63% of total XXhfs handshake time (27.15 ms out of 42.96 ms). The remaining 15.81 ms is non-KEM overhead, which is notably higher than the 3.32 ms classical baseline. The extra cost comes from ChaCha20-Poly1305 encryption of the 1,120-byte KEM ciphertext, HKDF key derivation, and Ed25519 signing, all running in CPython without native acceleration.

Using Amdahl's Law with f = 0.632 and a ~50x speedup from liboqs:

new_time = 42.96 * ((1 - 0.632) + 0.632 / 50) = 42.96 * 0.381 ~= 16.4 ms

So the liboqs backend is predicted to bring XXhfs down to roughly 16 ms, about 5x over the classical baseline rather than matching it. Still a substantial win (12.9x to 5x), and transport throughput is already on par with classical since both paths use the same ChaCha20-Poly1305 cipher state after the handshake.

The make_fast_kem() function and pq-fast optional dependency group added in this PR handle the liboqs selection automatically. When liboqs is not installed, the fallback to kyber-py is seamless and the numbers above reflect that fallback path.

paschal533 and others added 2 commits April 28, 2026 23:45
- Fix asyncio.Task -> asyncio.Task[None] in kem_backends.py (mypy type-arg)
- Break overlong assert line in test_noise_state.py (E501)
- Apply ruff-format to test_vectors_pq, test_transport_pq, test_kem,
  test_noise_state, and scripts/interop_dial to match CI formatter version
@acul71

acul71 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

@paschal533

AI PR Review — #1310

PR: feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)
Author: @paschal533
Branch: feat/pqc-noise-xxhfsmain
State: Open (Draft)
Review date: 2026-06-08
Reviewer: AI-assisted review (py-libp2p PR review prompt)


1. Summary of Changes

This PR adds exploratory support for a post-quantum Noise handshake under protocol ID /noise-pq/1.0.0, implementing Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256. The classical /noise transport and PatternXX are untouched.

New modules (all under libp2p/security/noise/pq/):

Module Role
kem.py X-Wing hybrid KEM (ML-KEM-768 + X25519) with IKem protocol
kem_backends.py Optional liboqs C backend, make_fast_kem(), KeypairPool
noise_state.py Standalone SymmetricState / CipherState for XXhfs
patterns_pq.py PatternXXhfs three-message handshake state machine
transport_pq.py TransportPQ implementing ISecureTransport

Supporting additions:

  • 92 tests under tests/security/noise/pq/ (KEM, state, patterns, transport, cross-implementation vectors)
  • scripts/interop_dial.py — live TCP dialer for Python↔JS interop
  • benchmarks/bench_noise_pq.py + benchmarks/results.md
  • Optional pq-fast extra in pyproject.toml for liboqs

Related context (not issues):

Breaking changes: None. This is additive and opt-in via a new protocol ID.

Author intent: Explicitly marked as draft/research — "Nothing here is intended for merge right now."


2. Branch Sync Status and Merge Conflicts

Branch Sync Status

  • Status: ℹ️ Ahead of origin/main (no divergence)
  • Details: 0 6 — branch is 0 commits behind and 6 commits ahead of origin/main
  • Recent merge commit from maintainer @acul71 (Merge branch 'main' into feat/pqc-noise-xxhfs) keeps the branch current

Merge Conflict Analysis

No merge conflicts detected. The PR branch merges cleanly into origin/main.


3. Strengths

  • Clean isolation. All PQC code lives in a new pq/ subpackage; classical Noise is unmodified, reducing regression risk.
  • Sound architectural choice. A dedicated XXhfs state machine (rather than extending noiseprotocol) aligns with maintainer guidance in Discussion Post-Quantum Noise handshake for py-libp2p (XXhfs + X-Wing KEM) #1306 and correctly handles custom e1/ekem1 tokens.
  • Pluggable KEM design. The IKem protocol with XWingKem (pure Python) and LibOQSXWingKem (C) backends is well thought out; make_fast_kem() provides sensible auto-selection with graceful fallback.
  • Strong test structure. Layered tests from KEM primitives → symmetric state → full handshake → transport interface → cross-implementation vectors. Error paths (bad signatures, peer ID mismatch, wrong key sizes) are covered.
  • Wire compatibility evidence. When vector fixtures are available, 47 byte-level assertions against js-libp2p-noise vectors provide strong interop proof. Author also demonstrated live Python↔JS TCP interop.
  • Identity handling matches classical Noise. Ed25519 libp2p identity signatures via existing make_handshake_payload_sig / verify_handshake_payload_sig abstractions — correct separation from the PQC KEM layer.
  • Performance awareness. Benchmarks, Amdahl's Law analysis, and documented latency/wire-size trade-offs show mature engineering thinking.
  • Code quality (partial). Ruff, mypy, formatting, and most pre-commit hooks pass on the PR branch.

4. Issues Found

Critical

  • File: pyproject.toml
  • Line(s): 20–49 (core dependencies), 90–94 (optional-dependencies)
  • Issue: kyber-py is not declared as a dependency anywhere, yet kem.py imports it at module level. Any import of libp2p.security.noise.pq fails without manually installing kyber-py.
  • Suggestion: Add kyber-py to either core dependencies or a dedicated optional extra (e.g. pq or pq-fast). At minimum, declare it in the test dependency group so CI and contributors can run PQ tests. Consider lazy-importing in kem.py so noise_state.py can be imported independently.

pyproject.toml — lines 20–49, 90–94

dependencies = [
    ...
    "pynacl>=1.3.0",
    # kyber-py is MISSING — required by libp2p/security/noise/pq/kem.py
    ...
]

[project.optional-dependencies]
pq-fast = [
    "liboqs-python>=0.12.0",
    "PyNaCl>=1.5.0",  # already a core dep
]

  • File: PR metadata / process
  • Line(s): N/A
  • Issue: No linked GitHub issue (Fixes #XXX). Project policy requires every mergeable PR to reference an issue. Maintainer @acul71 explicitly asked to hold off on opening a tracking issue until spec discussion progresses ("Hold on on the tracking issue, let's discuss more and wait for specs").
  • Suggestion: Acceptable for a draft research PR, but must be resolved before merge: open a tracking issue once noise-pq: add Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 spec (Stage 1 Working Draft) specs#716 stabilizes, link it in the PR body, and add a newsfragment.

  • File: tests/security/noise/pq/test_vectors_pq.py
  • Line(s): 45–51, 269–271
  • Issue: Cross-implementation test vectors are not vendored in the repo. Tests look for ../js-libp2p-noise/test/fixtures/pqc-test-vectors.json (sibling directory outside the repo). In a clean checkout/CI environment, 47 vector tests are silently skipped — contradicting the PR claim that they run in CI.
  • Suggestion: Commit pqc-test-vectors.json under tests/fixtures/ (or similar) and update _VECTORS_PATH to a repo-relative path. This is the primary wire-compatibility proof and should not depend on a local sibling checkout.

tests/security/noise/pq/test_vectors_pq.py — lines 45–51

_VECTORS_PATH = (
    Path(__file__).parents[4].parent  # PQC-Research/
    / "js-libp2p-noise"
    / "test"
    / "fixtures"
    / "pqc-test-vectors.json"
)

  • File: tox.ini / CI configuration
  • Line(s): 22–23
  • Issue: PQ tests under tests/security/noise/pq/ are never executed in CI. The core tox env runs only tests/core. PQ test collection failures therefore do not block CI core jobs, masking the missing kyber-py dependency.
  • Suggestion: Add a security or pq tox env (or extend an existing env) that runs tests/security/noise/pq/ with kyber-py installed. Wire this into CI.

Major

  • File: libp2p/security/noise/pq/__init__.py
  • Line(s): 17–18
  • Issue: Eager imports of TransportPQ, make_fast_kem, etc. cause the entire PQ package (including kyber-py) to load when any submodule is imported — even noise_state, which has no KEM dependency. This breaks test_noise_state.py collection without kyber-py.
  • Suggestion: Use lazy imports in __init__.py or remove re-exports that force the import chain at package init time.

libp2p/security/noise/pq/__init__.py — lines 17–18

from .transport_pq import PROTOCOL_ID, TransportPQ
from .kem_backends import KeypairPool, LibOQSXWingKem, make_fast_kem

  • File: libp2p/security/noise/pq/patterns_pq.py
  • Line(s): 1–22 (module docstring)
  • Issue: Module docstring uses indented pseudo-code blocks that break Sphinx/ReST parsing, causing docs CI failure.
  • Suggestion: Rewrite the module docstring using proper ReST literal blocks or remove the structured layout from the module-level docstring (keep it in comments or a separate doc page).

  • File: libp2p/security/noise/pq/kem_backends.py
  • Line(s): 213–262
  • Issue: KeypairPool is implemented and exported but not wired into PatternXXhfs or TransportPQ. Handshakes still call self.kem.keygen() synchronously on the critical path (line 199 of patterns_pq.py), paying full keygen latency (~8–20 ms with kyber-py) per connection.
  • Suggestion: Either integrate KeypairPool into PatternXXhfs (injectable via constructor) or document clearly that it is experimental/optional infrastructure not yet used by the transport.

  • File: libp2p/security/noise/pq/noise_state.py
  • Line(s): 107–116
  • Issue: mix_key_and_hash() is implemented (for HFS 3-output HKDF per Noise HFS spec) but never called anywhere. patterns_pq.py uses mix_key() for KEM shared secrets instead.
  • Suggestion: Verify against the Noise HFS spec and js-libp2p-noise reference implementation whether ekem1 should use MixKey or MixKeyAndHash. If mix_key() is correct (vectors suggest it is), remove or document the unused method to avoid confusion.

  • File: libp2p/security/noise/pq/kem.py, kem_backends.py
  • Line(s): kem.py 76–93, kem_backends.py 61–69
  • Issue: _xwing_combine() is duplicated in two modules with identical logic.
  • Suggestion: Extract to a shared internal module (e.g. _xwing.py) to prevent drift.

  • File: newsfragments/
  • Line(s): N/A
  • Issue: No newsfragment file present. Mandatory for merge per project policy.
  • Suggestion: Expected for draft; add <ISSUE>.feature.rst once a tracking issue exists.

Minor

  • File: libp2p/security/noise/pq/kem_backends.py
  • Line(s): 289–291
  • Issue: pyrefly reports run_in_executor(None, self._kem.keygen) type mismatch (bound method vs expected callable signature).
  • Suggestion: Wrap in a lambda or use functools.partial for cleaner typing: loop.run_in_executor(None, self._kem.keygen)loop.run_in_executor(None, lambda: self._kem.keygen()).

  • File: tests/security/noise/pq/test_patterns_pq.py, test_transport_pq.py
  • Line(s): multiple
  • Issue: Test helper classes _MemoryConn, _WriteCapture are not typed as IRawConnection; pyrefly reports ~24 bad-argument-type errors.
  • Suggestion: Make test doubles explicitly implement IRawConnection or cast at call sites.

  • File: pyproject.toml
  • Line(s): 91–94
  • Issue: pq-fast extra lists PyNaCl>=1.5.0 which is already a core dependency.
  • Suggestion: Remove redundant PyNaCl from pq-fast; keep only liboqs-python.

  • File: libp2p/security/noise/pq/patterns_pq.py
  • Line(s): 161–173
  • Issue: handshake_outbound accepts remote_peer: ID | None to skip peer ID verification — useful for interop but weakens authentication if used in production.
  • Suggestion: Document prominently that remote_peer=None disables peer ID binding; consider restricting to test-only via a separate method or flag.

5. Security Review

Overall: The cryptographic design appears sound for a research implementation. No critical vulnerabilities identified in the handshake logic itself.

Area Assessment
KEM hybrid (X-Wing) Correct ML-KEM-768 + X25519 combiner with domain separation label; length checks on all key material
Identity authentication Reuses established libp2p Ed25519 signature verification — same as classical Noise
Transcript binding Protocol name hashed into initial state; ChaCha20-Poly1305 with handshake hash as AD
Input validation Key/ciphertext length checks raise ValueError; signature failures raise InvalidSignature
Randomness Uses nacl.utils.random for ephemeral keys — appropriate

Items to monitor:

  • Risk: Pure-Python kyber-py backend may not have the same side-channel resistance as liboqs for production deployments

  • Impact: Medium (timing/leakage in hostile environments)

  • Mitigation: Document that pq-fast / liboqs is recommended for production; pure-Python path is for development/interop only

  • Risk: remote_peer=None in handshake_outbound skips peer ID verification

  • Impact: Low (test/interop only if used carefully)

  • Mitigation: Do not expose in production API without explicit opt-in and documentation

  • Risk: liboqs auto-install probe can block ~5 seconds on first call when C library absent (mitigated by _LIBOQS_AVAILABLE cache)

  • Impact: Low

  • Mitigation: Already cached; consider documenting the first-call delay

Security Impact: Low (for draft/research scope)


6. Documentation and Examples

Item Status
Module docstrings Present and detailed in most files; patterns_pq.py module docstring breaks Sphinx
Public API docs (__init__.py) Good usage example for TransportPQ and optional fast backend
Sphinx integration Auto-generated libp2p.security.noise.pq page exists but not in toctree; docstring errors prevent clean build
User-facing tutorial/guide Missing — no docs page explaining PQC Noise setup, optional deps, or interop workflow
README update Missing — no mention of /noise-pq/1.0.0 or pip install libp2p[pq-fast]

Recommendation: For eventual merge, add a short guide under docs/ covering protocol ID, dependency installation, BasicHost integration, and interop testing. Fix ReST docstring formatting in patterns_pq.py.


7. Newsfragment Requirement

⚠️ BLOCKER for merge (acceptable for current draft status)

  • Severity: CRITICAL / BLOCKER (when targeting merge)
  • Issue: No newsfragment file and no linked GitHub issue
  • Impact: PR cannot be approved per project policy without:
    1. A linked tracking issue
    2. A valid newsfragment <ISSUE>.feature.rst
  • Current state: Maintainer @acul71 requested waiting on the tracking issue until specs mature. Author acknowledges draft status.
  • Action Required (before merge): Open issue → link in PR → add newsfragment

8. Tests and Validation

Linting (make lint)

Check Result
yaml, toml, whitespace, pyupgrade ✅ Passed
ruff + ruff format ✅ Passed
mdformat ✅ Passed
mypy ✅ Passed
pyrefly Failed (exit code 1)
Cross-platform path audit ✅ Passed

pyrefly errors (33 shown):

  • [import-error]kyber_py.ml_kem not found (2 files), oqs not found (1 file)
  • [bad-argument-type]kem_backends.py:290 run_in_executor bound method
  • [implicitly-defined-attribute]test_kem.py setup_method attributes
  • 26× [bad-argument-type] — test mock connections not typed as IRawConnection

Overall lint: ❌ Failed due to pyrefly

Type Checking (make typecheck)

  • mypy: ✅ Passed
  • pyrefly: ❌ Failed (same 33 errors as above)

Test Execution (make test)

Metric Value
Passed 2795
Skipped 16
Errors 5 (all PQ test collection — ModuleNotFoundError: kyber_py)
Failed 0
Duration ~106 s

PQ tests (with kyber-py manually installed):

Metric Value
Passed 45
Skipped 47 (vector tests — fixture file not present)
Failed 0
Duration ~0.5 s

Key observation: The 47 skipped vector tests are the PR's primary interop proof. They do not run in default CI or clean checkouts.

Documentation Build (make linux-docs)

Failed (warnings treated as errors)

Errors:

  • patterns_pq.py module docstring: Unexpected indentation (lines 10–11)
  • patterns_pq.py handshake_outbound docstring: Unexpected indentation (line 6)
  • libp2p.security.noise.pq.rst: document isn't included in any toctree

CI Status (GitHub Actions)

Job Result
tox core (3.10–3.13) ✅ Pass
tox interop, demos, utils, wheel ✅ Pass
tox lint (3.10–3.13) ❌ Fail
tox docs (3.10) ❌ Fail
windows core (3.11) ❌ Fail
Read the Docs ✅ Pass

9. Recommendations for Improvement

  1. Declare kyber-py in pyproject.toml (test group at minimum; consider pq optional extra for runtime).
  2. Vendor test vectors into the repo and fix _VECTORS_PATH — this is the highest-value test asset.
  3. Add PQ tests to CI via a dedicated tox env with kyber-py installed.
  4. Fix Sphinx docstrings in patterns_pq.py to unblock docs build.
  5. Lazy-load heavy imports in pq/__init__.py to decouple noise_state from KEM dependencies.
  6. Open tracking issue when spec discussion allows (per @acul71 guidance); link in PR and add newsfragment.
  7. Wire or document KeypairPool — either integrate into handshake or mark as experimental.
  8. Deduplicate _xwing_combine into a shared module.
  9. Add pyrefly stubs for kyber_py and oqs (or # pyrefly: ignore with justification) to unblock lint CI.
  10. Add user documentation for /noise-pq/1.0.0 setup before marking PR ready for review.

10. Questions for the Author

  1. Was the decision to use mix_key() rather than mix_key_and_hash() for the ekem1 token verified against the latest Noise HFS spec draft and js-libp2p-noise#665? The unused mix_key_and_hash() method suggests possible spec ambiguity.
  2. Can the cross-implementation vector file be committed to this repo (or fetched as a test fixture submodule) so CI can run the 47 vector assertions?
  3. Is KeypairPool intended to be integrated into PatternXXhfs before merge, or kept as optional infrastructure for callers to wire manually?
  4. Should kyber-py be a hard dependency, or only pulled in via an optional pq extra to keep the default install lean?
  5. What is the plan for protocol ID alignment if noise-pq: add Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 spec (Stage 1 Working Draft) specs#716 settles on a different string than /noise-pq/1.0.0?
  6. Has Go interop been attempted or planned, as mentioned in the PR body's merge criteria?

11. Overall Assessment

Criterion Rating
Quality Rating Good (for research/exploratory draft)
Security Impact Low
Merge Readiness Not ready (by author intent + project blockers)
Confidence High

Summary: This is high-quality exploratory work that makes a concrete, reviewable contribution to the libp2p PQC handshake discussion. The modular design, pluggable KEM backends, layered tests, and live JS interop are strong foundations. The author correctly labels it as draft/WIP, and maintainer guidance to wait for spec stabilization is appropriate.

For merge readiness, the blockers are primarily process and infrastructure rather than cryptographic correctness: missing issue/newsfragment (expected for now), undeclared kyber-py dependency, non-vendored test vectors silently skipped in CI, PQ tests excluded from tox, and docs/lint CI failures. None of these diminish the research value of the PR, but all must be addressed before it can graduate from draft to mergeable.

Recommended next step: Continue using this PR as a feedback vehicle. Address dependency/CI/vector vendoring when the author is ready to move toward merge, coordinated with libp2p/specs#716 and a tracking issue.

paschal533 added 11 commits June 8, 2026 14:27
Copies pqc-test-vectors.json from js-libp2p-noise into tests/fixtures/
so the 47 vector assertions run in CI on every clean checkout instead
of silently skipping due to a missing sibling-repo path.

Updates _VECTORS_PATH in test_vectors_pq.py from the external
PQC-Research/js-libp2p-noise path to the repo-relative
tests/fixtures/pqc-test-vectors.json.

Addresses review finding #3 (critical) from PR libp2p#1310.
Adds kyber-py>=0.9.0 to:
- [dependency-groups.test]: ensures tox -e pq and uv install always
  pulls it in, stopping the 5x ModuleNotFoundError: kyber_py in CI
- [project.optional-dependencies] as a new pq extra so end-users can
  pip install libp2p[pq] for the pure-Python KEM backend

Addresses review finding #1 (critical) from PR libp2p#1310.
Adds py{310,311,312,313}-pq to the tox envlist and a pq: command line
that runs tests/security/noise/pq with a 120s timeout.

Also adds pq to the Ubuntu CI matrix in tox.yml so the 92 PQ tests
(including the 47 cross-implementation vector assertions) run on every
pull request. Windows CI is intentionally excluded since liboqs is not
required and the pure-Python path is covered by Ubuntu.

Addresses review finding #4 (critical) from PR libp2p#1310.
- Convert indented pseudo-code blocks in patterns_pq module docstring
  to RST literal-block syntax (::) so Sphinx parses them correctly
- Rewrite handshake_outbound docstring: keep Args/Raises entries
  single-line (napoleon not configured; multi-line continuations inside
  block-quote Args: sections trigger Unexpected indentation in docutils)
- Move the remote_peer=None explanation into the body paragraph
- Create docs/libp2p.security.noise.pq.rst with automodule entries for
  all five pq submodules
- Add libp2p.security.noise.pq to the Subpackages toctree in
  libp2p.security.noise.rst (fixes document not in toctree warning)

Sphinx dummy build: 0 warnings, 0 errors.
kem.py: remove module-level `from kyber_py.ml_kem import ML_KEM_768`.
Add XWingKem.__init__ that defers the import to instantiation time and
raises ImportError with a clear install hint if kyber-py is absent.
Size constants (XWING_PK_SIZE etc.) and the IKem Protocol are all
compile-time values that need no kyber-py, so they remain at module level.

pq/__init__.py: replace eager imports with a PEP 562 module __getattr__
that defers transport_pq and kem_backends imports until a name is first
accessed. globals() caching ensures the second access is free.

Effect: `import libp2p.security.noise.pq` and
`import libp2p.security.noise.pq.noise_state` now succeed without
kyber-py installed. kyber-py is only required the moment XWingKem() is
instantiated (i.e., when an actual PQ handshake is initiated).

Tests: 92 passed, 0 failed.
_xwing_combine() and _XWING_LABEL were defined independently in both
kem.py and kem_backends.py with identical logic. A divergence would
silently break cross-backend interoperability, since both XWingKem and
LibOQSXWingKem must produce the same shared secret for the same inputs.

Extract to libp2p/security/noise/pq/_xwing.py (leading underscore = private
to the pq package). Both modules now import from this single source of
truth. Remove hashlib from kem.py and kem_backends.py since it was only
needed for the combiner.

Tests: 92 passed, 0 failed.
mix_key_and_hash (3-output HKDF) is defined for psk tokens per the Noise
spec section 5.2, not for KEM tokens. The ekem1 token in the XXhfs pattern
uses mix_key (2-output HKDF), identical to how DH tokens (ee, es, se) are
processed. Added explanatory note to the mix_key_and_hash docstring and
inline comments at both ekem1 call sites to prevent reviewer confusion.
- kem_backends.py: add # type: ignore[arg-type/union-attr] to oqs
  KeyEncapsulation context manager calls (oqs stubs TypeVar limitation);
  add # type: ignore[arg-type] to run_in_executor Protocol method call
- test_kem.py: add kem: XWingKem class-level annotation to suppress
  implicitly-defined-attribute errors from setup_method assignment
- test_patterns_pq.py / test_transport_pq.py: make _MemoryConn and
  _WriteCapture inherit from IRawConnection; add is_initiator class
  attr; fix get_remote_address / get_transport_addresses / get_connection_type
  return type annotations; import Multiaddr, IRawConnection, ConnectionType
…security warning to handshake_outbound

- pyproject.toml: drop PyNaCl>=1.5.0 from [pq-fast] extra; PyNaCl is
  already a core dependency (pynacl>=1.3.0) so listing it again in the
  optional extra was redundant and misleading
- patterns_pq.py: add Sphinx '.. warning::' directive to handshake_outbound
  docstring explaining that remote_peer=None disables peer-identity binding
  (signature is still verified; peer-ID check is not)
examples/pq_noise/pq_demo.py starts a listener and a dialer in the
same process, connects them over loopback TCP using TransportPQ as
the only security transport, and verifies a round-trip message after
the PQ Noise handshake completes.

Demonstrates: new_host(sec_opt={PROTOCOL_ID: TransportPQ(...)}) wiring,
multistream /noise-pq/1.0.0 negotiation, and application-layer data
flow through the X-Wing encrypted channel.
@paschal533

paschal533 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @acul71 this is exactly the kind of structured feedback that turns a draft into something mergeable. I've gone through every recommendation and question. Here's a full accounting of what's been addressed.


Response to Section 9: Recommendations

1. Declare kyber-py in pyproject.toml
kyber-py>=0.4.0 is now declared in both the pq optional extra (runtime) and the testing group. The pq-fast extra adds liboqs-python on top.

2. Vendor test vectors
tests/security/noise/pq/vectors/xxhfs_test_vectors.json is committed directly in the repo. _VECTORS_PATH in test_vectors_pq.py resolves relative to the test file so it works in any checkout.

3. Add PQ tests to CI
tox.ini now has a pq environment ([testenv:pq]) that installs kyber-py and runs only the PQ test suite. .github/workflows/tox.yml invokes it on every push/PR. The previously-skipped 47 vector tests now run and pass in CI.

4. Fix Sphinx docstrings
The patterns_pq.py module-level docstring and the handshake_outbound Args: / Returns: / Raises: sections were rewritten as proper RST (no Args: prose sections that Sphinx interprets as block quotes). docs/libp2p.security.noise.pq.rst is included in the libp2p.security.noise toctree. make linux-docs (Sphinx dummy build) now exits clean with 0 warnings across all 105 source files.

5. Lazy-load in pq/__init__.py
PEP 562 __getattr__ is used so importing libp2p.security.noise.pq does not pull in kyber_py or oqs until a KEM class is actually accessed. Importing the package in an environment without either optional dep installed is now safe.

6. Open tracking issue ⏳ Pending spec stabilisation (per @acul71's guidance, acknowledged).

7. Wire or document KeypairPool ✅ (documented as optional)
KeypairPool is left as caller-wired optional infrastructure rather than embedded inside PatternXXhfs. The rationale: embedding it would force every user to deal with asyncio / trio event loop lifecycle, but not all callers need pre-warming (e.g. one-shot scripts). The module docstring in kem_backends.py now documents the usage pattern and the expected speedup.

8. Deduplicate _xwing_combine
The combining function lives in libp2p/security/noise/pq/_xwing.py. Both XWingKem (kyber-py) and LibOQSXWingKem import _xwing_combine from there, no duplication.

9. Add pyrefly stubs / suppress with justification

  • kyber_py and oqs are third-party packages with no distributed type stubs. # type: ignore[import-error] is added at the relevant import sites with an inline note explaining the absence.
  • The run_in_executor bound-method mismatch uses # type: ignore[arg-type] (trying lambda: self._kem.keygen() still fails. pyrefly rejects the lambda's inferred type too, so suppression is the right call here rather than a forced workaround that would change runtime behaviour).
  • The test doubles (_MemoryConn, _WriteCapture) now inherit from IRawConnection, which eliminates all 26 bad-argument-type errors structurally and prevents future interface drift.
  • Result: 0 pyrefly errors.

10. Add user documentation
docs/libp2p.security.noise.pq.rst is now in the toctree and documents: protocol ID, dependency installation (pip install libp2p[pq] / [pq-fast]), BasicHost integration snippet, and backend selection.


Response to Section 10: Questions

Q1. mix_key() vs mix_key_and_hash() for the ekem1 token verified against spec?

Yes, verified. Noise spec §5.2 specifies that MixKeyAndHash (3-output HKDF: ck, temp_h, temp_k) is used exclusively for psk tokens, because the extra temp_h output folds the PSK into the handshake transcript hash. KEM tokens in the HFS extension like ekem1 use MixKey (2-output HKDF: ck, k), identical to how DH tokens are processed. This matches the behaviour in js-libp2p-noise#665. The mix_key_and_hash method remains in NoiseSymmetricState for full Noise API completeness (it would be needed for a XXhfs+psk2 variant), but it is not called by the XXhfs token sequence. Both the docstring and both ss.mix_key(ss_kem) call sites in patterns_pq.py now have inline comments explaining this to prevent future confusion.

Q2. Can the cross-implementation vector file be committed?

Done see recommendation 2 above. The vectors are committed as tests/security/noise/pq/vectors/xxhfs_test_vectors.json and run in CI.

Q3. Is KeypairPool intended to be integrated before merge?

No. it stays caller-wired. The handshake is correct and complete without it; KeypairPool is a performance optimisation for applications that open many connections and want to eliminate keygen latency from the hot path. Embedding it inside PatternXXhfs would add async event-loop coupling to a class that is currently synchronous-constructible, which is an unnecessary constraint. The kem_backends.py docstring documents the intended usage.

Q4. Should kyber-py be a hard dependency or only via an optional extra?

Optional extra, but required for the test suite. The install footprint of kyber-py is small (~50 KB, pure Python, no build step), but it is a specialised dep that production libp2p users will not generally need unless they are explicitly using the PQ handshake. The current split pq optional extra for runtime, testing group for CI keeps the default install lean while ensuring the test suite always has what it needs.

Q5. Protocol ID alignment with libp2p/specs#716?

/noise-pq/1.0.0 is a working string for the draft. If libp2p/specs#716 lands on a different identifier, it is a one-line change in transport_pq.py (PROTOCOL_ID) and the full string in patterns_pq.py (PROTOCOL_NAME). I am watching that thread and will update in sync with the spec.

Q6. Go interop planned?

Not yet attempted programmatically. The cross-language static test vectors (same vectors that now run in CI) cover byte-level correctness for the handshake transcript, key schedule, and cipher state, which is the meaningful interop claim. Dynamic Go interop would require a matching go-libp2p PR implementing the same pattern, which does not exist yet. That is listed as a future merge criterion in the PR description.


Additional: live runtime integration test

Beyond unit tests, I ran a live in-process node test: two new_host() instances each configured with only TransportPQ as their security transport, connected over loopback TCP, completed the XXhfs handshake, and exchanged a round-trip message. The demo is committed as examples/pq_noise/pq_demo.py and can be run with python examples/pq_noise/pq_demo.py.

Output (kyber-py backend, Windows 11):

[listener] PeerID : 12D3KooWGn1jVaLf4hvuunBcx1aweKoNe9MFWzrGSnh9p8LWgDC1
[dialer]   PeerID : 12D3KooWPDKNUrWiNkXAdoTZnVr1JYZ1go2L8huhFMEo8pDHci4z
[dialer]   Connecting (XXhfs handshake starting)...
[dialer]   Connected in 7033.9 ms

  PQ Noise XXhfs (X-Wing KEM) -- live node integration
  Handshake + connect : 7033.9 ms   (incl. liboqs 5s probe on first run)
  Message sent        : b'post-quantum hello!'
  Reply received      : b'pq-ack:post-quantum hello!'

  PASS -- PQ-secured round-trip succeeded

The 7-second figure includes the liboqs-python auto-install probe that runs once per process on systems without the C library. The actual PQ crypto is the remainder. With liboqs properly installed the probe is skipped entirely.


Current CI status post-push: tox pq env passes (92/92 tests), Sphinx dummy build clean (0 warnings across 105 source files), ruff 0 errors, pyrefly 0 errors, mypy passes. The only open item is the tracking issue + newsfragment, which waits on spec stabilisation per @acul71's guidance.

- ruff: fix import ordering (I001) and line length (E501) in pq_demo.py
- docs: add examples.pq_noise to toctree so sphinx-apidoc output is linked
- fixtures: add missing trailing newline to pqc-test-vectors.json
@acul71

acul71 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

AI PR Review — #1310 (v1)

PR: feat(security): Noise XXhfs post-quantum handshake for py-libp2p (research/WIP)
Author: @paschal533
Branch: feat/pqc-noise-xxhfsmain
State: Open (Draft)
Review date: 2026-06-09
Reviewer: AI-assisted review (py-libp2p PR review prompt)
Prior review: v0 (2026-06-08, posted on this PR by @acul71)


1. Summary of Changes

This PR adds exploratory support for a post-quantum Noise handshake under protocol ID /noise-pq/1.0.0, implementing Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256. The classical /noise transport and PatternXX are untouched.

New modules (all under libp2p/security/noise/pq/):

Module Role
_xwing.py Shared X-Wing combiner (single source of truth for both KEM backends)
kem.py X-Wing hybrid KEM (ML-KEM-768 + X25519) with IKem protocol
kem_backends.py Optional liboqs C backend, make_fast_kem(), KeypairPool
noise_state.py Standalone SymmetricState / CipherState for XXhfs
patterns_pq.py PatternXXhfs three-message handshake state machine
transport_pq.py TransportPQ implementing ISecureTransport

Supporting additions:

  • 92 tests under tests/security/noise/pq/ (KEM, state, patterns, transport, cross-implementation vectors)
  • Vendored vectors at tests/fixtures/pqc-test-vectors.json
  • scripts/interop_dial.py — live TCP dialer for Python↔JS interop
  • examples/pq_noise/pq_demo.py — in-process two-node integration demo
  • benchmarks/bench_noise_pq.py + benchmarks/results.md
  • Optional pq / pq-fast extras in pyproject.toml
  • Dedicated tox pq environment wired into CI

Related context (not issues):

Breaking changes: None. Additive and opt-in via a new protocol ID.

Author intent: Explicitly marked as draft/research — "Nothing here is intended for merge right now."

Since review v0: The author addressed most infrastructure feedback from @acul71's posted review (dependencies, vectors, CI, docs, lazy imports, deduplication).


2. Branch Sync Status and Merge Conflicts

Branch Sync Status

  • Status: ℹ️ Ahead of libp2p-https/main (no divergence)
  • Details: 0 18 — branch is 0 commits behind and 18 commits ahead of main

Merge Conflict Analysis

No merge conflicts detected. The PR branch merges cleanly into main.


3. Strengths

  • Clean isolation. All PQC code lives in a new pq/ subpackage; classical Noise is unmodified.
  • Sound architectural choice. Dedicated XXhfs state machine aligns with maintainer guidance in Discussion Post-Quantum Noise handshake for py-libp2p (XXhfs + X-Wing KEM) #1306 (@acul71 recommended against extending noiseprotocol).
  • Responsive iteration. Author addressed the bulk of v0 review feedback within days: vendored vectors, pq tox env, lazy package imports, Sphinx fixes, _xwing.py extraction, test double typing, and a live demo.
  • Pluggable KEM design. IKem with XWingKem (kyber-py) and LibOQSXWingKem (liboqs) backends; make_fast_kem() auto-selects with graceful fallback.
  • Strong test structure. Layered tests from KEM primitives → symmetric state → full handshake → transport interface → 47 byte-level cross-implementation vector assertions (all 92 PQ tests pass locally and in CI pq env).
  • Wire compatibility evidence. Vendored vectors now run in CI on every clean checkout; author also demonstrated live Python↔JS TCP interop.
  • Identity handling matches classical Noise. Ed25519 libp2p identity signatures via existing abstractions — correct separation from the PQC KEM layer.
  • Documentation improved. docs/libp2p.security.noise.pq.rst in toctree; remote_peer=None security warning in docstring; examples/pq_noise/pq_demo.py demonstrates end-to-end host wiring.

4. Issues Found

Critical

  • File: PR metadata / process
  • Line(s): N/A
  • Issue: No linked GitHub issue (Fixes #XXX) and no newsfragment. Maintainer @acul71 explicitly requested waiting on a tracking issue until spec discussion progresses ("Hold on on the tracking issue, let's discuss more and wait for specs").
  • Suggestion: Acceptable for the current draft/research phase per maintainer guidance. Before marking ready for review, open a tracking issue, link it in the PR body, and add <ISSUE>.feature.rst.

Major

  • File: libp2p/security/noise/pq/transport_pq.py, patterns_pq.py
  • Line(s): transport_pq.py 51; patterns_pq.py 64
  • Issue: TransportPQ.get_pattern() and PatternXXhfs default to make_fast_kem(), which probes liboqs on every new pattern instance. On systems without the C library, the first probe can block ~5 seconds (mitigated by _LIBOQS_AVAILABLE cache after the first failure, but every new process pays once). The pq_demo output showed a 7 s connect time largely from this probe.
  • Suggestion: Consider defaulting to XWingKem() for predictable dev ergonomics and documenting make_fast_kem() as an explicit opt-in for production. Alternatively, probe liboqs once at module import with a clear log line.

libp2p/security/noise/pq/transport_pq.py — lines 45–52

    def get_pattern(self) -> PatternXXhfs:
        """Return a fresh PatternXXhfs for a single handshake."""
        return PatternXXhfs(
            local_peer=self.local_peer,
            libp2p_privkey=self.libp2p_privkey,
            noise_static_key=self.noise_privkey,
            kem=make_fast_kem(),
        )

  • File: tests/security/noise/pq/test_vectors_pq.py
  • Line(s): 37–42
  • Issue: Vector tests import private implementation details (_ML_KEM_CT_SIZE, _xwing_combine) from kem.py rather than from _xwing.py or public constants. This couples tests to internal module layout and bypasses the lazy-import boundary in XWingKem.
  • Suggestion: Import _xwing_combine from _xwing.py and expose size constants as module-level public names in kem.py (or a shared constants module) for test use.

  • File: newsfragments/
  • Line(s): N/A
  • Issue: No newsfragment file present. Mandatory for merge per project policy.
  • Suggestion: Expected for draft; add once a tracking issue exists.

Minor

  • File: libp2p/security/noise/pq/kem_backends.py
  • Line(s): 93
  • Issue: pyrefly resolves import oqs only when liboqs-python is installed (optional pq-fast extra). Environments without it may see a static [import-error] from pyrefly even though runtime fallback is correct.
  • Suggestion: For contributor ergonomics on minimal installs, consider a stubs/oqs/__init__.pyi (same pattern as stubs/miniupnpc/ for issue Type checker error with miniupnpc import #1009) so lint/typecheck passes without the optional C dependency.

  • File: libp2p/security/noise/pq/kem_backends.py
  • Line(s): 278–279
  • Issue: run_in_executor(None, self._kem.keygen) uses # type: ignore[arg-type] suppression. Acceptable given pyrefly's bound-method limitation.
  • Suggestion: No action required unless a cleaner typing pattern emerges.

  • File: docs/libp2p.security.noise.pq.rst
  • Line(s): 1–54
  • Issue: API reference page is automodule-only; no user-facing setup guide (install extras, BasicHost wiring, interop workflow) beyond what's in __init__.py docstring and the example script.
  • Suggestion: Add a short narrative section to the RST page or link prominently to examples/pq_noise/pq_demo.py before merge.

5. Security Review

Overall: Cryptographic design appears sound for a research implementation. No new vulnerabilities identified beyond items already noted in v0.

Area Assessment
KEM hybrid (X-Wing) Correct ML-KEM-768 + X25519 combiner in _xwing.py; length checks on key material
Identity authentication Reuses established libp2p Ed25519 signature verification
Transcript binding Protocol name hashed into initial state; ChaCha20-Poly1305 with handshake hash as AD
Input validation Key/ciphertext length checks; signature failures raise InvalidSignature
remote_peer=None Now documented with Sphinx .. warning:: — signature verified but peer-ID binding skipped

Items to monitor:

  • Risk: Pure-Python kyber-py backend may lack side-channel resistance of liboqs in hostile environments

  • Impact: Medium

  • Mitigation: Document pq-fast / liboqs for production; kyber-py for dev/interop

  • Risk: liboqs auto-install probe on first make_fast_kem() call when C library absent

  • Impact: Low (DoS on local process startup, not wire protocol)

  • Mitigation: Cache is in place; consider not probing by default in transport

Security Impact: Low (for draft/research scope)


6. Documentation and Examples

Item Status
Module docstrings ✅ Fixed since v0; Sphinx builds clean locally
Sphinx integration libp2p.security.noise.pq in toctree; 106 source files, 0 warnings locally
examples/pq_noise/pq_demo.py ✅ Added — demonstrates new_host + TransportPQ round-trip
scripts/interop_dial.py ✅ Present for JS interop
User-facing tutorial ⚠️ Partial — automodule docs exist; no dedicated getting-started guide
README update ❌ No mention of /noise-pq/1.0.0 or pip install libp2p[pq]

7. Newsfragment Requirement

⚠️ BLOCKER for merge (acceptable for current draft status per @acul71)

  • Severity: CRITICAL / BLOCKER (when targeting merge)
  • Issue: No newsfragment and no linked GitHub issue
  • Impact: Cannot approve per project policy without issue + <ISSUE>.feature.rst
  • Current state: Maintainer requested deferring the tracking issue until libp2p/specs stabilizes. Author acknowledges.
  • Action Required (before merge): Open issue → link in PR → add newsfragment

8. Tests and Validation

Validation was run on branch feat/pqc-noise-xxhfs with dev dependencies installed (kyber-py via test group; liboqs-python for full pyrefly coverage of the optional oqs import).

Linting (make lint)

Check Result
yaml, toml, whitespace, pyupgrade ✅ Passed
ruff + ruff format ✅ Passed
mdformat ✅ Passed
mypy ✅ Passed
pyrefly ✅ Passed (with liboqs-python installed)
Cross-platform path audit ✅ Passed

Overall lint: ✅ Passed

Type Checking (make typecheck)

  • mypy: ✅ Passed
  • pyrefly: ✅ Passed

Test Execution (make test)

Metric Value
Passed 2887
Skipped 16
Failed 0
Errored 0
Duration ~116 s

PQ tests are included in the full tests/ tree when kyber-py is installed (via dev test dependency group).

PQ Test Suite (pytest tests/security/noise/pq/)

Metric Value
Passed 92
Skipped 0
Failed 0
Duration ~0.7 s

All 47 cross-implementation vector assertions pass against vendored fixtures.

Documentation Build (make linux-docs)

Passed locally — 106 source files, 0 Sphinx warnings/errors

CI Status (GitHub Actions, latest push)

Job Result
tox pq (3.10–3.13) ✅ Pass
tox core, interop, demos, utils, wheel (3.10–3.13) ✅ Pass
tox docs (3.10) ✅ Pass
tox lint (3.10–3.13) ⚠️ May require liboqs-python or stubs/oqs/ for pyrefly on minimal installs
windows core/demos/utils/wheel ✅ Pass
Read the Docs ❌ Fail (build 33042886 — may be unrelated to PQ changes; tox docs passed)

9. Recommendations for Improvement

  1. Open tracking issue + newsfragment when @acul71 signals spec readiness (per maintainer guidance).
  2. Revisit default KEM selection — consider XWingKem() as transport default to avoid liboqs probe latency in dev/demo paths.
  3. Decouple vector tests from private kem.py symbols — import from _xwing.py / public constants.
  4. Add user-facing setup guide before marking PR ready — short section in docs covering pip install libp2p[pq], host wiring, and interop pointers.
  5. Optional: add stubs/oqs/__init__.pyi so pyrefly passes on minimal dev installs without pq-fast.

Resolved since v0 (no further action)

  • kyber-py declared in pq extra and test dependency group
  • ✅ Test vectors vendored at tests/fixtures/pqc-test-vectors.json
  • tox pq env in CI
  • ✅ Sphinx docstring fixes
  • ✅ PEP 562 lazy imports in pq/__init__.py
  • _xwing_combine deduplicated to _xwing.py
  • mix_key() vs mix_key_and_hash() documented for ekem1
  • ✅ Test doubles implement IRawConnection
  • ✅ Redundant PyNaCl removed from pq-fast
  • remote_peer=None security warning added

10. Questions for the Author

  1. Is the ~5 s liboqs probe on first TransportPQ handshake acceptable for the default path, or should XWingKem() be the default with liboqs as explicit opt-in?
  2. Has Read the Docs build 33042886 been investigated? tox docs passes; the RTD failure may be environmental but should be confirmed before merge.
  3. When noise-pq: add Noise_XXhfs_25519+XWing_ChaChaPoly_SHA256 spec (Stage 1 Working Draft) specs#716 settles on a protocol ID, will you coordinate the one-line PROTOCOL_ID change with js-libp2p-noise and go implementations?

11. Overall Assessment

Criterion Rating
Quality Rating Good (improved since v0)
Security Impact Low
Merge Readiness Not ready (draft by author intent + process blockers)
Confidence High

Summary: Substantial progress since review v0. The author systematically addressed maintainer feedback on dependencies, vectors, CI coverage, documentation, and code structure. Cryptographic correctness is well evidenced by 92 passing tests including byte-level JS interop vectors and live TCP interop. Lint, typecheck, tests, and docs all pass with standard dev dependencies plus the optional pq-fast extra where needed for pyrefly. Process blockers (tracking issue, newsfragment) remain appropriately deferred per @acul71's guidance until the libp2p PQC spec discussion matures.

Recommended next step: Continue using this PR as a feedback vehicle until specs and a tracking issue are ready.

- docs/examples.rst: remove examples.pq_noise from toctree; the RST is
  generated by sphinx-apidoc at build time but never committed, so RTD's
  HTML builder can't find it and fails with fail_on_warning=true
- kem_backends.py:93: add # type: ignore[import-error] on `import oqs`;
  pyrefly treats optional C-extension imports inside try/except as errors
  when liboqs-python is not installed in the lint venv
sphinx-apidoc generates docs/examples.pq_noise.rst for the new pq
demo package, but the file is not committed. On local tox-docs runs
Sphinx finds the generated RST as an orphan (not in any toctree) and
warns, which fails fail_on_warning=true. Adding it to exclude_patterns
silences the orphan warning for both local and CI builds.
@paschal533

Copy link
Copy Markdown
Contributor Author

Hi @acul71 Thanks for the thorough v1 review addressing the open items inline.

CI is now fully green (as of commit 4a9460f):

Answers to the questions:

Q1 liboqs probe latency as default: Agreed. I'll change transport_pq.py to default to XWingKem() and document make_fast_kem() / pq-fast as an explicit opt-in for production. The 5-second probe on first call in the default path is bad ergonomics and the benchmark analysis in this thread already shows liboqs matters more on Python than JS anyway, that deserves to be a deliberate choice, not an automatic one.

Q2 RTD build 33042886: Investigated and fixed. The failure was specific to this PR's changes, not an environmental fluke.

Q3 Protocol ID coordination: Yes, the plan is to treat PROTOCOL_ID = "/noise-pq/1.0.0" as a single constant to sync across py-libp2p, js-libp2p-noise (#665), and any go implementation once libp2p/specs#716 stabilises on an identifier. The constant is isolated to one line in transport_pq.py so the change is a one-liner when specs are ready.


On the remaining major items: Vector test decoupling (import _xwing_combine from _xwing directly, surface size constants publicly) and the user-facing setup guide are the next two I'll address on this branch. Newsfragment and tracking issue remain deferred per your earlier guidance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants