
staging <- dev #137

Merged
ducnmm merged 23 commits into staging from dev
May 7, 2026

Conversation

ducnmm (Collaborator) commented May 7, 2026

staging <- dev

hien-p and others added 22 commits (May 4, 2026 11:33), including:

* fix(sdk): ENG-1725 `recallManual` broken after LOW-24
* ENG-1409: Add benchmark scripts for Walrus, sidecar and recall latency
* …nd-bulk-remember
* ENG-1406 + ENG-1408: Async remember pipeline and bulk remember
* …ization
* ENG-1405: Optimize recall with LRU blob cache and batched SEAL decrypt
Lifts the practical /api/remember ceiling from ~8 KiB (text-embedding-3-small
context window) to 1 MiB by summarizing the plaintext via gpt-4o-mini before
embedding. The full original text is still SEAL-encrypted and stored on
Walrus — only the embedding input is summarized, so recall returns the
unmodified plaintext.

How it works
------------
text ≤ 8 KiB    →  embed(text) || encrypt(text)              [unchanged]
text > 8 KiB    →  summarize(text) → embed(summary) || encrypt(text)
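The routing above can be sketched as follows. This is a minimal illustration, not the server's actual code: `summarize`, `embed`, and `encrypt` are hypothetical stand-ins for the real pipeline stages.

```typescript
type Embedding = number[];

const EMBED_DIRECT_LIMIT_BYTES = 8 * 1024; // ~8 KiB embedder context ceiling

async function prepareRemember(
  text: string,
  summarize: (t: string) => Promise<string>,
  embed: (t: string) => Promise<Embedding>,
  encrypt: (t: string) => Promise<Uint8Array>,
): Promise<{ embedding: Embedding; ciphertext: Uint8Array }> {
  const bytes = Buffer.byteLength(text, "utf8");
  // ≤ 8 KiB: embed the text directly; > 8 KiB: embed a summary of it.
  const embedInput =
    bytes <= EMBED_DIRECT_LIMIT_BYTES ? text : await summarize(text);
  // The full original text is always what gets encrypted, so recall
  // returns the unmodified plaintext.
  const [embedding, ciphertext] = await Promise.all([
    embed(embedInput),
    encrypt(text),
  ]);
  return { embedding, ciphertext };
}
```

The key invariant is that summarization only ever changes the embedding input, never the stored bytes.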

Map-reduce summarization handles arbitrarily large input within a single
embedder call:

  1. split into ≤ 64 KiB chunks
  2. summarize each chunk in parallel (gpt-4o-mini, bounded concurrency)
  3. reduce the chunk summaries into one final summary (≤ ~500 words)
  4. embed the final summary

Falls back to the direct embed path when OPENAI_API_KEY is unset (mock /
dev mode), so this doesn't introduce a hard dependency on OpenAI.
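The four map-reduce steps can be sketched like this. `summarizeChunk` and `reduceSummaries` stand in for the gpt-4o-mini calls; the real code also bounds concurrency and would split on byte boundaries rather than characters.

```typescript
const CHUNK_SIZE = 64 * 1024; // step 1: split into ≤ 64 KiB chunks

function splitIntoChunks(text: string, size = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

async function mapReduceSummarize(
  text: string,
  summarizeChunk: (chunk: string) => Promise<string>,
  reduceSummaries: (partials: string[]) => Promise<string>,
): Promise<string> {
  const chunks = splitIntoChunks(text);
  // Step 2: summarize each chunk in parallel.
  const partials = await Promise.all(chunks.map((c) => summarizeChunk(c)));
  // Step 3: reduce the chunk summaries into one final summary (≤ ~500 words).
  // Step 4 (embedding the result) happens in the caller.
  return partials.length === 1 ? partials[0] : reduceSummaries(partials);
}
```

Because the reduce step runs over already-small partial summaries, input size is bounded only by how many chunk summaries fit in one reduce call.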

Boundary handling
-----------------
* MAX_REMEMBER_TEXT_BYTES = 1 MiB enforced inside the remember handler.
* MAX_ANALYZE_TEXT_BYTES = 64 KiB enforced inside the analyze handler
  (analyze does a single LLM call with no chunking, so it has a tighter
  cap independent of remember).
* Auth middleware caps protected JSON bodies at 2 MiB —
  PROTECTED_BODY_LIMIT_BYTES — covering both single 1 MiB remember
  payloads and bulk-remember batches.
* Sidecar /seal/encrypt cap raised to 2 MiB to accept the SEAL request
  for the full original. The previous global app.use(json({limit:256kb}))
  in scripts/sidecar-server.ts was masking per-route overrides on
  /seal/decrypt-batch and /walrus/upload — those are now per-route only,
  using named JSON_LIMIT_* constants.
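The per-route pattern can be sketched as a named-constant lookup. Only the /seal/encrypt value (2 MiB) is stated in this PR; the other values and the helper itself are illustrative assumptions, not the sidecar's actual code.

```typescript
const JSON_LIMIT_SEAL_ENCRYPT = 2 * 1024 * 1024;       // from the PR
const JSON_LIMIT_SEAL_DECRYPT_BATCH = 2 * 1024 * 1024; // assumed for illustration
const JSON_LIMIT_WALRUS_UPLOAD = 2 * 1024 * 1024;      // assumed for illustration

// Each route carries its own named limit; with no global json() parser left,
// a small global limit can no longer mask a per-route override.
const ROUTE_JSON_LIMITS: Record<string, number> = {
  "/seal/encrypt": JSON_LIMIT_SEAL_ENCRYPT,
  "/seal/decrypt-batch": JSON_LIMIT_SEAL_DECRYPT_BATCH,
  "/walrus/upload": JSON_LIMIT_WALRUS_UPLOAD,
};

function jsonLimitFor(route: string): number {
  const limit = ROUTE_JSON_LIMITS[route];
  if (limit === undefined) {
    throw new Error(`no JSON limit configured for ${route}`);
  }
  return limit;
}
```

Failing loudly on an unconfigured route is the point: every body limit is an explicit, named decision rather than an inherited default.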

Integration with PR #121 (async remember)
------------------------------------------
After the rebase onto dev, /api/remember is async (returns 202 + job_id).
Summarization runs inside spawn_prepare_remember_job and the bulk variant
spawn_prepare_bulk_remember_job, before the embed/encrypt fan-out. The
encrypt fork still uses the original text bytes — only the embedding
input is summarized.

Tests
-----
* services/server/tests/e2e_test.py — adds three parametric size cases:
  64 KiB (asserts 200 + summarize log), 512 KiB (asserts 200), and
  MAX_REMEMBER_TEXT_BYTES + 1 (asserts 400). Mirrors the Rust constant
  to catch drift.
* 135/135 unit tests pass, including new assertions on
  MAX_ANALYZE_TEXT_BYTES, PROTECTED_BODY_LIMIT_BYTES, and the bench-
  bypass default.

Benchmark harness
-----------------
services/server/scripts/bench-remember-sizes.ts drives 14 hand-curated
fixtures (Wikipedia prose, Project Gutenberg / Journey to the West for
CJK, science-dense prose, structured JSON, mixed markdown+code) at sizes
4 KiB → 1 MiB through the full async lifecycle (POST → poll → recall),
asserting 202 / job done / 400 boundary as appropriate.
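The POST → poll → recall lifecycle the harness drives looks roughly like this. Apart from /api/remember, the endpoint paths and response field names are assumptions for illustration; `api` stands in for the harness's HTTP client.

```typescript
interface Api {
  post(path: string, body: unknown): Promise<{ status: number; body: any }>;
  get(path: string): Promise<{ status: number; body: any }>;
}

async function rememberAndRecall(
  api: Api,
  text: string,
  pollMs = 1000,
  maxPolls = 120,
) {
  const res = await api.post("/api/remember", { text });
  if (res.status === 400) return { status: 400 }; // over the 1 MiB boundary
  if (res.status !== 202) throw new Error(`unexpected status ${res.status}`);
  for (let i = 0; i < maxPolls; i++) {
    const job = await api.get(`/api/jobs/${res.body.job_id}`); // path assumed
    if (job.body.state === "done") {
      const recall = await api.get(`/api/recall/${job.body.memory_id}`); // path assumed
      return { status: 202, recall: recall.body };
    }
    await new Promise((r) => setTimeout(r, pollMs));
  }
  throw new Error("job did not reach done in time");
}
```

Each fixture run is one of three outcomes: a 400 at the boundary, a completed job with a successful recall, or a timeout failure.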

Bench results against testnet (RATE_LIMIT_DISABLED=1):

  Size      Worker       Recall    Status
  4 KiB     ~25 s        ~3 s      done
  64 KiB    ~22-26 s     ~1-3 s    done
  256 KiB   ~36-42 s     ~1-3 s    done
  512 KiB   ~42 s        ~1 s      done
  1 MiB     ~60 s        ~3 s      done
  1 MiB+1   —            —         400 ✅

14/14 fixtures pass end-to-end.

Benchmark-only escape hatch
---------------------------
The async remember pipeline turns one user request into 1 POST + N status
polls + 1 recall, which exceeds the 30-weighted-req/min per-delegate-key
budget on the second fixture. Adds RATE_LIMIT_DISABLED=1 (default off,
asserted in tests, loud tracing::warn at startup, surfaced in /config)
that skips request-rate buckets only — storage quota and auth still
apply. Intended for localhost benchmarks.
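The bypass semantics can be sketched as follows: the env flag skips only the request-rate buckets, while quota (and auth) checks still run. The function names and shapes here are illustrative, not the actual rate_limit.rs API.

```typescript
function benchBypassEnabled(env: Record<string, string | undefined>): boolean {
  return env.RATE_LIMIT_DISABLED === "1"; // default off
}

function checkRequest(
  env: Record<string, string | undefined>,
  rateBucketOk: () => boolean,
  quotaOk: () => boolean,
): { allowed: boolean; reason?: string } {
  // Storage quota applies unconditionally, bypass or not.
  if (!quotaOk()) return { allowed: false, reason: "quota" };
  // Only the request-rate buckets are skipped under the bench bypass.
  if (!benchBypassEnabled(env) && !rateBucketOk()) {
    return { allowed: false, reason: "rate" };
  }
  return { allowed: true };
}
```

Keeping quota enforcement outside the bypass is what makes the flag safe to leave compiled in: even a misconfigured deployment can't use it to exceed storage limits.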

Files changed
-------------
services/server/src/routes.rs       — summarize_for_embedding + map-reduce,
                                       MAX_ANALYZE_TEXT_BYTES guard,
                                       summarize wired into async/bulk paths
services/server/src/auth.rs         — PROTECTED_BODY_LIMIT_BYTES = 2 MiB
services/server/src/main.rs         — DefaultBodyLimit + bench-bypass startup warn
services/server/src/rate_limit.rs   — bench_bypass_enabled flag + bypass
services/server/src/types.rs        — ConfigResponse.rate_limit_disabled
services/server/scripts/sidecar-server.ts
                                     — per-route json() limits with named constants
services/server/scripts/bench-remember-sizes.ts
                                     — async-aware harness with poll loop
services/server/scripts/bench-fixtures.json
                                     — 14 hand-curated realistic fixtures
services/server/tests/e2e_test.py   — parametric size cases

Co-authored-by: ducnmm <mauduckiengiang@gmail.com>
hungtranphamminh self-requested a review May 7, 2026 09:05
ducnmm temporarily deployed to benchmark-dev May 7, 2026 09:18 (GitHub Actions)
railway-app (bot) temporarily deployed to MemWal / dev May 7, 2026 09:18
ducnmm merged commit 1fc103a into staging May 7, 2026
21 of 22 checks passed

3 participants