Skip to content

fix: async Web3 calls + nonce manager to prevent event loop blocking#2

Closed
HaraldeRoessler wants to merge 1 commit into
MoltyCel:mainfrom
HaraldeRoessler:fix/async-web3-nonce-manager
Closed

fix: async Web3 calls + nonce manager to prevent event loop blocking#2
HaraldeRoessler wants to merge 1 commit into
MoltyCel:mainfrom
HaraldeRoessler:fix/async-web3-nonce-manager

Conversation

@HaraldeRoessler
Copy link
Copy Markdown
Contributor

Summary

  • Add app/nonce_manager.py with per-address asyncio.Lock preventing nonce collisions on concurrent transactions
  • Wrap all blocking Web3 RPC calls in asyncio.to_thread() across main.py, erc8004.py, and provenance/anchor.py
  • Convert post_reputation_feedback, register_onchain_agent, get_onchain_reputation from sync to async
  • Replace music VC anchoring via cast CLI subprocess with web3.py (eliminates private key in process env)

Problem

All Web3 calls were synchronous, blocking the FastAPI async event loop. This caused latency spikes for all concurrent API users whenever an on-chain operation ran. Additionally, concurrent transactions from the same wallet fetched the same nonce, causing one to fail silently.

How it works

Nonce manager: Uses an asyncio.Lock per wallet address. First call fetches the pending nonce from chain, subsequent calls increment locally. On failure, the cache resets to force a re-fetch.

Async wrapping: Every w3.eth.* call now runs in asyncio.to_thread() instead of blocking the event loop.

Test plan

  • Verify API starts without errors
  • Test /reputation/rate with ERC-8004 agent (triggers async post_reputation_feedback)
  • Test concurrent on-chain operations produce no nonce collisions
  • Verify music credential anchoring works without cast CLI dependency

Generated with Claude Code

All synchronous Web3 RPC calls (get_transaction_count, gas_price,
send_raw_transaction, wait_for_transaction_receipt, is_connected) were
blocking the FastAPI async event loop, causing latency spikes for all
concurrent users.

Changes:
- Add app/nonce_manager.py with per-address asyncio.Lock to prevent
  nonce collisions when multiple async handlers submit transactions
  concurrently from the same wallet
- Wrap all blocking Web3 calls in asyncio.to_thread() across main.py,
  erc8004.py, and provenance/anchor.py
- Convert erc8004.py functions (post_reputation_feedback,
  register_onchain_agent, get_onchain_reputation) from sync to async
  and update all call sites with await
- Replace music VC anchoring via cast CLI subprocess with web3.py,
  eliminating private key exposure in process environment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@HaraldeRoessler
Copy link
Copy Markdown
Contributor Author

Merge order

Independent — can merge in any order relative to #4 and #7. Recommended: after #1.

Sequence: #1#3#2#4#7

@HaraldeRoessler
Copy link
Copy Markdown
Contributor Author

Closing as superseded by #3, which contains the same app/main.py / app/erc8004.py / app/nonce_manager.py / app/provenance/anchor.py changes verbatim (identical hunks at the same line ranges), plus pytest unit tests, docker-compose.yml, and a smoke test. No content from this PR is missing from #3.

@HaraldeRoessler HaraldeRoessler deleted the fix/async-web3-nonce-manager branch May 12, 2026 00:08
HaraldeRoessler referenced this pull request in HaraldeRoessler/moltrust-api May 12, 2026
… CSP

Follow-up to the previous commit on this branch (89bec1d). Adds the 13
items reviewers flagged as still-open. Bundled into one branch since
they're all defence-in-depth / mechanical, and splitting would just
multiply review cycles.

FAIL-FAST ON REQUIRED CREDENTIALS (these are BREAKING for deploys
that haven't set the env vars):

  M2  NONCE_SECRET — empty default removed; raises at startup if unset
                     (the runtime checks at the call sites caught it
                     before, but better to refuse to start than to
                     serve traffic with trivially-forgeable nonces).
  M10 MOLTSTACK_DB_PW — empty default removed; raises at startup.
  HIGH#3 MOLTBOOK_APP_KEY — "moltdev_PENDING" sentinel default
                     removed; endpoint returns 503 when unset.
  L15 GITHUB_CLIENT_ID — "PENDING" sentinel removed; endpoint
                     returns 503 when unset.

OPERATOR MIGRATION (one-time before merging into prod):
  export MOLTRUST_ADMIN_USERS="..."         # from previous commit
  export NONCE_SECRET="..."                  # any high-entropy string
  export MOLTSTACK_DB_PW="..."               # the existing DB password
  # MOLTBOOK_APP_KEY / GITHUB_CLIENT_ID: optional, leave unset to disable

LOCALIZED HARDENING (no migration impact):

  M4  ipaddress.ip_address() validation on /admin/traffic/caller/{ip}.
      Rejects malformed input before the DB LIKE-prefix lookup.
  L3  random.choice annotations — noqa: S311 with "non-security
      content selection" justification at 4 sites (cosmetic uses;
      no security-adjacent sampling).
  M9  DATABASE_URL default cleaned of the $(cat /dev/null) shell
      antipattern. Was never shelled out, but confusing.
  HIGH#5 _reg_tracker registration-rate hash: 16-char (64-bit)
      truncation → full SHA-256. Per-API-key rate limits are no
      longer collision-bypassable.
  HIGH#6 scrub_secrets pattern expansion: GitHub tokens, Stripe
      live keys (sk_live_, rk_live_, whsec_), OpenAI/Anthropic
      sk-proj-*, generic JWTs, AWS secret access keys, plus more
      PRIVATE KEY header variants. Deliberately NOT adding broad
      hex / base58 patterns — they would scrub legitimate response
      payloads (Ethereum TX hashes, IPR commitment hashes, wallet
      addresses).
  M12 KMS signer plaintext-fallback gating: DID_PRIVATE_KEY_HEX
      and ~/.moltrust_did_private_key are accepted in dev, but
      a hard error is raised when MOLTRUST_ENV=production is set
      and no KMS-encrypted blob is provided.
  M13 update_last_seen / update_last_active: bare `except: pass`
      replaced with `logger.warning(...)`. Still fire-and-forget
      (doesn't block the request) but failures are observable now.
  M1  _get_client_ip: X-Real-IP / X-Forwarded-For are honoured ONLY
      when request.client.host falls in MOLTRUST_TRUSTED_PROXIES
      (default: RFC1918 + loopback + ULA, which covers a load
      balancer with a private IP). Operators with public-IP LBs
      should override the env var; set "0.0.0.0/0,::/0" to fall
      back to the previous "trust everyone" behaviour.
  L16 Swagger UI: CSP header added; CSS link pinned to @5.17.14
      with SRI integrity hash (matches the JS pin from the
      previous commit).

EXPLICITLY STILL OUT OF SCOPE (separate PRs / your call):

  HIGH#4 SESSIONS dict in app/admin_auth.py — needs a design
         decision (TTLCache vs background sweep vs Redis).
  M11    print() → logger across 15+ sites in app/main.py — too
         large to mix into this PR.
  L2     Foundry `subprocess` for on-chain anchoring → Python
         web3 lib — touches the core anchoring path; similar
         refactors closed before (PR #2).
  L1     subprocess for openssl SSL check in /health — low value.
  C1     .env.dilithium — VERIFIED never in git history, only on
         local disk. No repo change applicable; operator should
         delete the file and rotate the key locally.

VERIFICATION
  Python 3.12 AST parse — all 8 touched files compile cleanly.
  (System Python 3.9 can't parse the project — pre-existing
  f-string syntax requires 3.10+.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HaraldeRoessler referenced this pull request in HaraldeRoessler/moltrust-api May 12, 2026
Follow-up to commit d25e70c (SSRF). After running CodeQL default-setup
on the fork, 17 additional findings surfaced. Triage outcome:

  Already closed by earlier commits this PR:   1 (SSRF)
  False positives (dismissed via CodeQL UI):   4
  Real findings fixed in this commit:          5
  Stack-trace-exposure (deferred to design):   7

FIXES IN THIS COMMIT

  #1 [LOG SANITISATION] credit_middleware exception swallows DB password
     - app/main.py (logger.error in credit_middleware)
     `logger.error("…: %s", caller_did, e)` — the raw exception `e`
     can be an asyncpg ConnectionError whose repr() includes the
     Postgres connection string (with the password). Log only
     `type(e).__name__` instead.

  #2 [DEFENSIVE URL ENCODING] /join?ref= referrer parameter
     - app/main.py /join handler
     The redirect target is HARDCODED to https://moltrust.ch — the
     host is not user-controlled. But `f"https://moltrust.ch?ref={ref}"`
     interpolates `ref` raw, and a payload like `ref="x&malparam=…"`
     could corrupt the query string. Use `urllib.parse.quote(ref)` to
     percent-encode the value before interpolation.

  MoltyCel#3 [STDOUT TOKEN LEAK] telegram_hn_remind print(r.text)
     - scripts/telegram_hn_remind.py
     `print(f'Status: {r.status_code}, Response: {r.text}')` — if
     Telegram error responses ever echo the request URL (which contains
     the bot token in the path), the body lands in stdout / CI scrollback.
     Print only the status code.

  MoltyCel#4 [ReDoS] mpp authorization header regex
     - packages/mpp/index.js
     `auth.match(/^(?:Payment|MPP)\s+(.+)$/i)` on an unbounded header
     is polynomial-quadratic. This package is published to npm, so
     consumer servers carry the risk. Cap header at 8 KiB and use
     bounded `\s{1,8}` with a non-greedy first char.

  MoltyCel#5 [ReDoS] moltrust-openclaw-v2 base URL trim
     - moltrust-openclaw-v2/src/client.ts
     `.replace(/\/+$/, "")` is polynomial on pathological inputs.
     Replace with a `while (str.endsWith("/")) str = str.slice(0, -1)`
     loop, which is linear.

DISMISSED AS FALSE POSITIVES (no code change)

  MoltyCel#14 py/clear-text-logging-sensitive-data at SPIFFE bind log
      Logs spiffe_uri, did, caller_did — none are passwords. CodeQL
      misfires on the "did" → "id" → "password" name-similarity heuristic.

  MoltyCel#13, MoltyCel#12 py/clear-text-logging-sensitive-data in scripts/threadwatch.py
      Telegram bot token flows into the request URL but never into a
      logger or print() call — only to requests.post (which doesn't
      log URLs by default).

  MoltyCel#16 py/weak-sensitive-data-hashing in _reg_tracker
      This is in-memory rate-limit bucket-key derivation, not password
      storage. bcrypt/argon2 would be wrong here (slow + salted breaks
      the lookup). SHA-256 of the full API key is the correct primitive
      for an O(1) tracker.

EXPLICITLY DEFERRED (7 stack-trace-exposure findings)

  Multiple endpoints currently return `{"error": str(e)[:100]}` to
  callers. CodeQL flags these as info disclosure. Fixing them means
  changing the API contract — clients that parse the `error` field
  would break. This is a design call for the maintainer; deferring
  to a separate PR + discussion rather than including in this hardening
  pass.

VERIFICATION

  Python 3.12 AST parse — app/main.py + scripts/telegram_hn_remind.py
  compile cleanly. `node -c packages/mpp/index.js` clean. The TS file
  change is a syntactically-trivial loop, not type-impacting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HaraldeRoessler referenced this pull request in HaraldeRoessler/moltrust-api May 18, 2026
… CSP

Follow-up to the previous commit on this branch (89bec1d). Adds the 13
items reviewers flagged as still-open. Bundled into one branch since
they're all defence-in-depth / mechanical, and splitting would just
multiply review cycles.

FAIL-FAST ON REQUIRED CREDENTIALS (these are BREAKING for deploys
that haven't set the env vars):

  M2  NONCE_SECRET — empty default removed; raises at startup if unset
                     (the runtime checks at the call sites caught it
                     before, but better to refuse to start than to
                     serve traffic with trivially-forgeable nonces).
  M10 MOLTSTACK_DB_PW — empty default removed; raises at startup.
  HIGH#3 MOLTBOOK_APP_KEY — "moltdev_PENDING" sentinel default
                     removed; endpoint returns 503 when unset.
  L15 GITHUB_CLIENT_ID — "PENDING" sentinel removed; endpoint
                     returns 503 when unset.

OPERATOR MIGRATION (one-time before merging into prod):
  export MOLTRUST_ADMIN_USERS="..."         # from previous commit
  export NONCE_SECRET="..."                  # any high-entropy string
  export MOLTSTACK_DB_PW="..."               # the existing DB password
  # MOLTBOOK_APP_KEY / GITHUB_CLIENT_ID: optional, leave unset to disable

LOCALIZED HARDENING (no migration impact):

  M4  ipaddress.ip_address() validation on /admin/traffic/caller/{ip}.
      Rejects malformed input before the DB LIKE-prefix lookup.
  L3  random.choice annotations — noqa: S311 with "non-security
      content selection" justification at 4 sites (cosmetic uses;
      no security-adjacent sampling).
  M9  DATABASE_URL default cleaned of the $(cat /dev/null) shell
      antipattern. Was never shelled out, but confusing.
  HIGH#5 _reg_tracker registration-rate hash: 16-char (64-bit)
      truncation → full SHA-256. Per-API-key rate limits are no
      longer collision-bypassable.
  HIGH#6 scrub_secrets pattern expansion: GitHub tokens, Stripe
      live keys (sk_live_, rk_live_, whsec_), OpenAI/Anthropic
      sk-proj-*, generic JWTs, AWS secret access keys, plus more
      PRIVATE KEY header variants. Deliberately NOT adding broad
      hex / base58 patterns — they would scrub legitimate response
      payloads (Ethereum TX hashes, IPR commitment hashes, wallet
      addresses).
  M12 KMS signer plaintext-fallback gating: DID_PRIVATE_KEY_HEX
      and ~/.moltrust_did_private_key are accepted in dev, but
      a hard error is raised when MOLTRUST_ENV=production is set
      and no KMS-encrypted blob is provided.
  M13 update_last_seen / update_last_active: bare `except: pass`
      replaced with `logger.warning(...)`. Still fire-and-forget
      (doesn't block the request) but failures are observable now.
  M1  _get_client_ip: X-Real-IP / X-Forwarded-For are honoured ONLY
      when request.client.host falls in MOLTRUST_TRUSTED_PROXIES
      (default: RFC1918 + loopback + ULA, which covers a load
      balancer with a private IP). Operators with public-IP LBs
      should override the env var; set "0.0.0.0/0,::/0" to fall
      back to the previous "trust everyone" behaviour.
  L16 Swagger UI: CSP header added; CSS link pinned to @5.17.14
      with SRI integrity hash (matches the JS pin from the
      previous commit).

EXPLICITLY STILL OUT OF SCOPE (separate PRs / your call):

  HIGH#4 SESSIONS dict in app/admin_auth.py — needs a design
         decision (TTLCache vs background sweep vs Redis).
  M11    print() → logger across 15+ sites in app/main.py — too
         large to mix into this PR.
  L2     Foundry `subprocess` for on-chain anchoring → Python
         web3 lib — touches the core anchoring path; similar
         refactors closed before (PR #2).
  L1     subprocess for openssl SSL check in /health — low value.
  C1     .env.dilithium — VERIFIED never in git history, only on
         local disk. No repo change applicable; operator should
         delete the file and rotate the key locally.

VERIFICATION
  Python 3.12 AST parse — all 8 touched files compile cleanly.
  (System Python 3.9 can't parse the project — pre-existing
  f-string syntax requires 3.10+.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HaraldeRoessler referenced this pull request in HaraldeRoessler/moltrust-api May 18, 2026
Follow-up to commit d25e70c (SSRF). After running CodeQL default-setup
on the fork, 17 additional findings surfaced. Triage outcome:

  Already closed by earlier commits this PR:   1 (SSRF)
  False positives (dismissed via CodeQL UI):   4
  Real findings fixed in this commit:          5
  Stack-trace-exposure (deferred to design):   7

FIXES IN THIS COMMIT

  #1 [LOG SANITISATION] credit_middleware exception swallows DB password
     - app/main.py (logger.error in credit_middleware)
     `logger.error("…: %s", caller_did, e)` — the raw exception `e`
     can be an asyncpg ConnectionError whose repr() includes the
     Postgres connection string (with the password). Log only
     `type(e).__name__` instead.

  #2 [DEFENSIVE URL ENCODING] /join?ref= referrer parameter
     - app/main.py /join handler
     The redirect target is HARDCODED to https://moltrust.ch — the
     host is not user-controlled. But `f"https://moltrust.ch?ref={ref}"`
     interpolates `ref` raw, and a payload like `ref="x&malparam=…"`
     could corrupt the query string. Use `urllib.parse.quote(ref)` to
     percent-encode the value before interpolation.

  MoltyCel#3 [STDOUT TOKEN LEAK] telegram_hn_remind print(r.text)
     - scripts/telegram_hn_remind.py
     `print(f'Status: {r.status_code}, Response: {r.text}')` — if
     Telegram error responses ever echo the request URL (which contains
     the bot token in the path), the body lands in stdout / CI scrollback.
     Print only the status code.

  MoltyCel#4 [ReDoS] mpp authorization header regex
     - packages/mpp/index.js
     `auth.match(/^(?:Payment|MPP)\s+(.+)$/i)` on an unbounded header
     is polynomial-quadratic. This package is published to npm, so
     consumer servers carry the risk. Cap header at 8 KiB and use
     bounded `\s{1,8}` with a non-greedy first char.

  MoltyCel#5 [ReDoS] moltrust-openclaw-v2 base URL trim
     - moltrust-openclaw-v2/src/client.ts
     `.replace(/\/+$/, "")` is polynomial on pathological inputs.
     Replace with a `while (str.endsWith("/")) str = str.slice(0, -1)`
     loop, which is linear.

DISMISSED AS FALSE POSITIVES (no code change)

  MoltyCel#14 py/clear-text-logging-sensitive-data at SPIFFE bind log
      Logs spiffe_uri, did, caller_did — none are passwords. CodeQL
      misfires on the "did" → "id" → "password" name-similarity heuristic.

  MoltyCel#13, MoltyCel#12 py/clear-text-logging-sensitive-data in scripts/threadwatch.py
      Telegram bot token flows into the request URL but never into a
      logger or print() call — only to requests.post (which doesn't
      log URLs by default).

  MoltyCel#16 py/weak-sensitive-data-hashing in _reg_tracker
      This is in-memory rate-limit bucket-key derivation, not password
      storage. bcrypt/argon2 would be wrong here (slow + salted breaks
      the lookup). SHA-256 of the full API key is the correct primitive
      for an O(1) tracker.

EXPLICITLY DEFERRED (7 stack-trace-exposure findings)

  Multiple endpoints currently return `{"error": str(e)[:100]}` to
  callers. CodeQL flags these as info disclosure. Fixing them means
  changing the API contract — clients that parse the `error` field
  would break. This is a design call for the maintainer; deferring
  to a separate PR + discussion rather than including in this hardening
  pass.

VERIFICATION

  Python 3.12 AST parse — app/main.py + scripts/telegram_hn_remind.py
  compile cleanly. `node -c packages/mpp/index.js` clean. The TS file
  change is a syntactically-trivial loop, not type-impacting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant