ENG-1407: Summarize large text before embedding in /api/remember by ducnmm · Pull Request #122 · MystenLabs/MemWal

ducnmm · 2026-04-27T00:42:07Z

Summary

When text > 8 KiB, server calls gpt-4o-mini to summarize before embedding — the full original is still encrypted and stored on Walrus
Raises MAX_REMEMBER_TEXT_BYTES to 1 MiB (from no explicit limit) and adds DefaultBodyLimit of 1.5 MiB on protected routes
Skips summarization gracefully when OPENAI_API_KEY is not set (dev/mock mode)
Includes a benchmark script (scripts/bench-remember-sizes.ts) for latency testing across payload sizes

How it works

text ≤ 8 KiB  →  embed(text) + encrypt(text)           // existing path
text > 8 KiB  →  summarize(text) → embed(summary) + encrypt(text)  // new path

Recall finds the memory via the summary's embedding, then returns the decrypted full original text.

Latency analysis

Before (main): text > 8 KiB was rejected or sent full text to embedding API. The text-embedding-3-small model has an ~8K token limit, so large payloads would either fail or produce degraded embeddings from truncation.

After (this PR): Large text is summarized to ~500 words via gpt-4o-mini first, then the summary is embedded. This adds one LLM call (~1-2s for gpt-4o-mini) but:

Prevents embedding API token limit errors for payloads up to 1 MiB
Produces better semantic embeddings (summary is optimized for retrievability)
The summarization + embed pipeline runs sequentially, while encrypt runs in parallel with embed

Size	Before	After	Notes
≤ 8 KiB	~2-3s (embed + encrypt)	~2-3s (same path)	No change
16 KiB	❌ rejected / truncated	~3-5s	+1-2s for summarization
64 KiB	❌ rejected	~3-5s	Same — summary is always ~500 words
1 MiB	❌ rejected	~4-6s	Slightly longer summarization

Estimated latencies. Use scripts/bench-remember-sizes.ts for live benchmarking.

Benchmark script

BENCH_PRIVATE_KEY=<ed25519-hex> BENCH_ACCOUNT_ID=<account-object-id> \
  pnpm exec tsx scripts/bench-remember-sizes.ts [server_url]

Files changed

services/server/src/routes.rs — summarize_for_embedding(), constants, remember handler branching
services/server/src/main.rs — DefaultBodyLimit::max(1.5 MiB) on protected routes
scripts/bench-remember-sizes.ts — Benchmark script for latency testing (NEW)

Test plan

cargo check passes (0 errors, 0 warnings)
Text < 8 KiB takes the direct embed path (no summarization)
Text > 8 KiB triggers summarization, logs show byte reduction
Full original text is stored on Walrus (not the summary)
Works in mock mode (no OPENAI_API_KEY) — summarization skipped
Run scripts/bench-remember-sizes.ts on staging to validate latency numbers

🤖 Generated with Claude Code

hungtranphamminh

Tested locally against testnet (PR vs dev worktrees, shared infra, registered testnet delegate key). The summarize path itself works — at 64 KiB, dev errors on the embedding API token limit while this PR succeeds. Good.

But the 1 MiB ceiling in the description is not reachable with the changes in this PR — two independent caps the patch doesn't touch — and /api/analyze picks up an unintended size regression.

Blocking

🔴 1. Auth middleware caps body at 1 MiB — `services/server/src/auth.rs:165`

let body_bytes = axum::body::to_bytes(body, 1024 * 1024)
    .await
    .map_err(|_| StatusCode::BAD_REQUEST)?;

Auth runs before the handler, buffering the body for the SHA-256 sig check. This 1 MiB cap is independent of DefaultBodyLimit::max(1536 * 1024). Any body > 1 MiB → 400 with empty body, before MAX_REMEMBER_TEXT_BYTES is ever checked. With the JSON envelope overhead (~30 B), the largest plaintext that can pass auth is ~1,048,540 B.

Confirmed by adding a temporary tracing::warn! on the to_bytes error path and reproducing at the boundary:

body=1 MiB     → "auth.rs body buffered: 1048576 bytes" → handler runs (then 500 from sidecar — see blocker 2 below)
body=1 MiB+1   → "auth.rs to_bytes(1 MiB) failed: length limit exceeded" → 400

Fix: raise the cap to match DefaultBodyLimit, ideally factored into a single const shared between main.rs and auth.rs so they can't drift again:

// shared, e.g. in main.rs or a new constants module
pub const MAX_AUTHENTICATED_BODY_BYTES: usize = 1536 * 1024;

// auth.rs:165
let body_bytes = axum::body::to_bytes(body, MAX_AUTHENTICATED_BODY_BYTES)
    .await
    .map_err(|_| StatusCode::BAD_REQUEST)?;

// main.rs:169
.layer(DefaultBodyLimit::max(MAX_AUTHENTICATED_BODY_BYTES));

🔴 2. Sidecar caps `/seal/encrypt` at 256 KB — `services/server/scripts/sidecar-server.ts:298`

app.use(express.json({ limit: "256kb" }));

The PR raises MAX_REMEMBER_TEXT_BYTES and DefaultBodyLimit but doesn't update the sidecar. Anything 256 KB–1 MiB hits the handler, summarizes successfully, then 500s from /seal/encrypt — and the sidecar's HTML stack-trace gets concatenated into tracing::error! (visible in logs). The sidecar already has per-route overrides for /seal/decrypt-batch (8mb, line 490) and /walrus/upload (10mb, line 595); /seal/encrypt (line 343) needs the same.

app.post("/seal/encrypt", express.json({ limit: "2mb" }), async (req, res) => { ... });

🔴 3. `/api/analyze` is now unguarded at 1.5 MiB — `services/server/src/routes.rs`

MAX_REMEMBER_TEXT_BYTES was only ever enforced inside remember(); analyze() had no per-handler size check and relied on the route layer's old 256 KiB cap. Raising that to 1.5 MiB lets a hostile client feed ~1 MiB of text into extract_facts_llm (one gpt-4o-mini call, no upstream guard) for the same rate-limit weight (5) as a tiny request.

(No existing-data concern — the previous 256 KiB body cap meant nothing larger ever made it past the wire, so there's no migration to worry about.)

Fix: add an explicit per-handler check (analyze doesn't benefit from being any larger than remember's old ceiling):

// routes.rs
const MAX_ANALYZE_TEXT_BYTES: usize = 64 * 1024;

// at the top of analyze() handler, mirroring remember():
if body.text.is_empty() {
    return Err(AppError::BadRequest("Text cannot be empty".into()));
}
if body.text.len() > MAX_ANALYZE_TEXT_BYTES {
    return Err(AppError::BadRequest(format!(
        "Text exceeds maximum length of {} bytes",
        MAX_ANALYZE_TEXT_BYTES
    )));
}

Test results (testnet, PR vs dev)

Size	dev	PR
4 KiB	✅ 200	✅ 200
9 KiB	✅ 200 (no summary)	✅ 200 (summarized 9216→316 B)
64 KiB	❌ 500 (embed token-limit)	✅ 200 (summarized 65536→394 B)
256 KiB	❌ 413 (axum body)	❌ 500 (sidecar 256 KB)
1 MiB	❌ 400 (auth)	❌ 400 (auth)

Non-blocking

🟡 Latency table is significantly off on testnet — PR claims 2–6s; I measured 18–22s for every successful size, with ~17s of that being Walrus testnet upload. Mainnet Walrus may be much faster, so the table might still hold there, but it's worth either re-measuring on mainnet or marking the numbers as estimates so reviewers know the source.
🟡 scripts/bench-remember-sizes.ts is missing from the diff despite being mentioned in the description and the test plan. Either commit it or trim the description.
🟡 CI didn't catch any of this because the e2e suite only exercises tiny payloads (tests/e2e_test.py uses "hello", "evil", "The capital of France is Paris." — all < 50 B). Worth adding two parametric cases: one at 64 KiB asserting 200 + summarize log, and one at MAX_REMEMBER_TEXT_BYTES + 1 asserting 400. Then any future limit drift fails CI.
🟡 Audit-trail comments were removed (LOW-6 on routes.rs:19,146, HIGH-13 on main.rs:150). The dev branch is heavily annotated with security-audit markers; one short replacement justifying the new ceilings would keep the trail intact (e.g. // summarization keeps embedding input under text-embedding-3-small's 8k token limit; sidecar /seal/encrypt cap must match).

Suggested checklist before merge

Raise auth.rs:165 to_bytes cap to 1536 * 1024 (or factor into a const shared with main.rs)
Raise sidecar /seal/encrypt body limit
Add explicit size guard on /api/analyze
Empirically verify 1 MiB end-to-end after the two fixes (or lower MAX_REMEMBER_TEXT_BYTES to a verified number)
Add parametric size cases to e2e (200 at 64 KiB, 400 at MAX + 1)
Update or remove the latency table; commit the bench script if you'd like

ducnmm · 2026-05-05T15:11:03Z

Hi @hungtranphamminh please fix it and test carefully in local

hungtranphamminh · 2026-05-06T12:31:18Z

Tested 6cde4da locally against testnet — your map-reduce summarization is working as designed. 1 MiB plaintext now chunks into 16 × 64 KiB, summarizes them in parallel, reduces in 1 round to a 399-byte embedding input. The LLM context limit we were about to flag is no longer reachable. Nice fix.

Committed scripts/bench-remember-sizes.ts (6b27cdf) so the size cases are reproducible. Sweeps 4 KiB → 1 MiB + boundary-bracketing sizes (256/384/480/512/768 KiB), recalls each on a unique marker to confirm the original round-trips. Run it with:

BENCH_DELEGATE_KEY=<hex> BENCH_ACCOUNT_ID=0x... \
  bun run services/server/scripts/bench-remember-sizes.ts

Results against `6cde4da`

Size	Status	Notes
4 KiB	✅ 200	direct embed (no summarize)
9 KiB	✅ 200	single-call summarize
64 KiB	✅ 200	single-call summarize
256 KiB	❌ 500	map-reduce works, sidecar `/seal/encrypt` rejects
384 KiB	❌ 500	6-chunk summarize succeeds, sidecar rejects
480 KiB	❌ 500	8-chunk summarize succeeds, sidecar rejects
512 KiB	❌ 500	sidecar
768 KiB	❌ 500	sidecar
1 MiB	❌ 500	16-chunk summarize → 399 B summary ✅, sidecar rejects
1 MiB + 1	✅ 400	route-layer reject (correct)

One remaining 🔴 blocker — sidecar `/seal/encrypt` 256 KB cap

Same finding as before. From server logs at the 1 MiB run:

remember: text="..." len=1048576 summarize=true
  -> summarizing 1048576 bytes in 16 chunks of up to 65536 bytes
  -> reducing 16 summaries in 1 batches (round 1)
  → summarized 1048576 bytes → 399 bytes for embedding
ERROR ... seal encrypt failed: <!DOCTYPE html>
<body><pre>PayloadTooLargeError: request entity too large
  at readStream (.../node_modules/raw-body/index.js:163:17)
  at jsonParser (.../node_modules/body-parser/lib/types/json.js:88:5)
  ...

scripts/sidecar-server.ts:298 has a global app.use(express.json({ limit: "256kb" })) that runs before any per-route json() and rejects oversize bodies first. Per-route overrides on /seal/decrypt-batch (8mb, line 490) and /walrus/upload (10mb, line 595) are dead code for the same reason — they've never actually taken effect. Suggested fix is to drop the global and apply explicit per-route json() everywhere:

// /seal/encrypt receives the full plaintext (up to PROTECTED_BODY_LIMIT_BYTES)
app.post("/seal/encrypt", express.json({ limit: "2mb" }), async (req, res) => { ... });

This is the only thing standing between this PR and a verified 1 MiB end-to-end round-trip.

Other items from the prior review still open

🔴 /api/analyze is now unguarded at 1.5 MiB — MAX_REMEMBER_TEXT_BYTES was only ever enforced inside remember(). Raising the route layer to 1.5 MiB lets a hostile client feed ~1 MiB of text into extract_facts_llm (single LLM call, no chunking) for the same rate-limit weight as a tiny request. Suggested per-handler check (mirror remember()):
```
const MAX_ANALYZE_TEXT_BYTES: usize = 64 * 1024;
if body.text.len() > MAX_ANALYZE_TEXT_BYTES {
    return Err(AppError::BadRequest(format!(
        "Text exceeds maximum length of {} bytes",
        MAX_ANALYZE_TEXT_BYTES
    )));
}
```
🟡 Audit-trail comments (LOW-6 on routes.rs:19,146, HIGH-13 on main.rs) — still removed without replacement. One short comment justifying the new ceilings would keep the trail intact.
🟡 Parametric e2e tests — tests/e2e_test.py still only exercises ≤ 50 B payloads. Two cases would catch future drift: one at 64 KiB asserting 200 + summarize log, one at MAX + 1 asserting 400.

Latency snapshot

For the cases that succeed, ~17 s of each request is Walrus testnet upload — so the absolute numbers are dominated by infra rather than the new path. The summarize phase itself was ~16 s for the 16-chunk parallel pass on 1 MiB and ~1.7 s for the reduce. The PR description's 2–6 s table will look very different on mainnet; worth either re-measuring there or marking as estimates.

Next steps

I'll take it from here — pushing the sidecar + /api/analyze fixes on top of 6cde4da, re-running the bench to confirm 1 MiB end-to-end, and re-requesting review once green.

…(ENG-1407)

The PR description referenced this script for validating /api/remember across payload sizes; it was missing from the diff. Adds it now so the size boundary cases are reproducible. Sweeps prose-like payloads from 4 KiB to 1 MiB (boundary-bracketing sizes at 256/384/480/512/768 KiB), signing each remember with a delegate key and recalling on a unique marker to confirm the original full text round-trips (not the summary). Each case asserts an expected HTTP status; any deviation points at where the chain (auth cap, route layer, summarize, sidecar, SEAL, Walrus) changed. Usage: BENCH_DELEGATE_KEY=<hex> BENCH_ACCOUNT_ID=0x... \ bun run scripts/bench-remember-sizes.ts

Three follow-ups to 6cde4da based on local boundary-probe testing: - sidecar /seal/encrypt was capped at 256 KiB by a global app.use(json()) that runs before any per-route json() middleware. Express middleware fires in declaration order — the global parser consumed the body and rejected oversize payloads first, leaving the per-route limits on /seal/decrypt-batch (8mb) and /walrus/upload (10mb) as dead code. Replaced the global with explicit per-route json() on every endpoint, with named limit constants. /seal/encrypt now accepts up to 2 MiB, unblocking the 1 MiB plaintext path end-to-end. - /api/analyze had no per-handler size guard and relied on the route layer's old 256 KiB DefaultBodyLimit. Raising that to 1.5 MiB silently widened analyze's surface to ~1 MiB of LLM input for the same rate-limit weight as a tiny request. Added an explicit check at the top of analyze() with MAX_ANALYZE_TEXT_BYTES = 64 KiB so a hostile client cannot burn 1 MiB of LLM tokens for the price of a small request. Analyze does fact extraction in a single LLM call without chunking, so it does not need the larger remember ceiling. - Restored the LOW-6 / HIGH-13 audit-trail comments that were removed in the original PR. These document why the body-cap layers were chosen and reference the security-audit work tracked under those markers across dev. Updated to reflect the new ceilings.

The existing e2e suite only exercised tiny payloads (<50 B); none of the size limits affected by ENG-1407 were under coverage. Adds three opt-in parametric checks (gated on TEST_DELEGATE_KEY like the rest of the happy-path suite): - size_64kb_summarized: 64 KiB plaintext succeeds via the new summarize-before-embed path. On the pre-PR baseline this size errors out at the embedding API token limit; this is the regression test that proves the path works end-to-end against real Walrus + SEAL. - size_large_accepted: 512 KiB plaintext succeeds end-to-end. Exercises the chunked map-reduce summarize path (8 chunks at this size), the auth body cap, the sidecar /seal/encrypt body limit, SEAL encrypt on the full plaintext, and the Walrus upload. Smaller than MAX_REMEMBER_TEXT_BYTES intentionally — at 1 MiB the path fans out to 16 parallel LLM calls, which multiplies any upstream flake into a per-request failure. - size_over_limit_rejected: MAX_REMEMBER_TEXT_BYTES + 1 returns 400 with the handler-level "exceeds maximum length" message — confirming the rejection comes from the size guard, not from upstream auth or body-limit middleware (which would return an empty 400).

Refactor the harness to consume bench-fixtures.json (14 fixtures, 5 MB) spanning realistic content categories rather than hardcoded synthetic strings. Categories cover the size x tokenization-density curve that ENG-1407's chunked summarization path actually has to handle: - prose-english Wikipedia articles (CC-BY-SA-4.0), 64 KiB to 1 MiB - science-dense Tokenization-dense math/physics prose - cjk-classic Journey to the West (Public Domain, ~1.5 chars/token) - structured Synthetic JSON arrays - mixed Markdown + code + JSON - boundary 1 MiB + 1 byte (must reject with 400) Each fixture declares expect_status; the harness asserts the HTTP response matches and follows successful remembers with a single /api/recall sanity hit (recall *quality* is intentionally out of scope -- that's deferred to Locomo / longmemeval). A --fixtures flag lets the same harness consume alternate fixture sets without code changes.

Adds an opt-in flag (default: off) that skips the request-rate buckets in both the protected-route middleware and the public sponsor-route middleware. Storage quota and auth still apply. Why this exists: as of ENG-1406 v3 the /api/remember pipeline is async, so a single user remember translates to one POST + N status polls + one recall. Even modest benchmarks blow past the 30 weighted-req/min per-delegate-key budget on the second fixture. Without an escape hatch, end-to-end ingestion benches can't run at all. Guardrails: * Off unless RATE_LIMIT_DISABLED=1 (or 'true', case-insensitive) * Default impl forces bench_bypass_enabled=false, asserted in tests * Loud tracing::warn! at every server start when the flag is on * Storage quota (1 GB/user) is NOT bypassed — only request-rate * Surfaced in /config as rateLimitDisabled so benchmarks can pre-flight check before running This is intended only for localhost benchmark runs; production deployments must leave the env var unset.

PR #121 turned /api/remember into a 202-Accepted async pipeline; the bench was still asserting 200 and reading {id} from the response, so every fixture would have failed against rebased dev. Refactors the harness around the new lifecycle: POST /api/remember -> expect 202 with { job_id } GET /api/remember/:job_id -> poll until done | failed (500 ms cadence, 120 s timeout) POST /api/recall -> sanity hit on the read path Three timings are tracked instead of one: enqueueMs (request -> 202), workerMs (202 -> done, the real ENG-1407 work), and recallMs. Pass criteria tightened: a fixture only counts as passing when the HTTP status matches AND the worker reaches 'done' for 202 cases. Worker 'failed' or 'timeout' now show up as failures even though the request itself returned 202. Adds a /config pre-flight check at boot: aborts immediately with a clear message if the server doesn't have rateLimitDisabled set, instead of silently 429-ing partway through a 10-minute run. Drops the artificial inter-fixture sleep that was a workaround for the rate limiter.

13 success-path fixtures had expect_status: 200; the new async /api/remember returns 202. Fixtures themselves are unchanged — same content, same byte sizes, same source attribution. Boundary case (expect_status: 400) is unaffected since size validation runs synchronously before enqueue. The file also picks up a formatting change (single-line-per-fixture -> indented JSON) from a Python rewrite during the rebase work; semantic content is byte-identical to before.

hungtranphamminh · 2026-05-07T06:01:10Z

Pushed the remaining work and rebased onto latest dev. Branch is now at d642c83 — 9 commits over dev. Worth flagging upfront: SHAs of your earlier commits changed during the rebase (1a290fb → 8a7b452, 6cde4da → e75927f); the message bodies are unchanged.

What I added on top of your work

Audit-trail + handler-side guards (5707474) — the items still open from comment 2:

MAX_ANALYZE_TEXT_BYTES = 64 * 1024 enforced inside analyze() so the unguarded 1.5 MiB → ~1 MiB → single-LLM-call vector is closed.
Restored the LOW-6 / HIGH-13 audit comments (routes.rs, main.rs) explaining the size ceilings.
Sidecar app.use(express.json({ limit: "256kb" })) removed; per-route json() applied everywhere with named JSON_LIMIT_* constants. /seal/encrypt now actually accepts the 2 MiB it advertised.

Parametric e2e tests (ed91a6d) — tests/e2e_test.py now has three size cases: 64 KiB (asserts 200 + summarize path), 512 KiB (asserts 200 — picked under 1 MiB to avoid OpenRouter response flake on the largest fixture in CI), and over-limit (asserts 400). Mirrors MAX_REMEMBER_TEXT_BYTES from Rust to catch drift.

Realistic bench fixtures (d31bc87) — replaced the synthetic strings with a hand-curated bench-fixtures.json covering 5 content categories (Wikipedia prose, Project Gutenberg / Journey to the West for CJK, science-dense prose, structured JSON, mixed markdown+code) at sizes 4 KiB → 1 MiB. Idea is the size × tokenization-density curve — CJK at ~1.5 chars/token vs ASCII at ~4 tells you very different things at 256 KiB.

Bench harness adapted to async remember (0a2ed98, d642c83) — /api/remember returns 202 now, so the harness:

POSTs and asserts 202,
polls GET /api/remember/:job_id every 500 ms until done / failed (120 s cap),
recalls once as a read-path sanity check.

Tracks three timings: enqueueMs, workerMs (the meaningful ENG-1407 number), recallMs. Pass criteria require both HTTP 202 and worker reaching done.

Rate-limiter escape hatch (b55aaaf) — RATE_LIMIT_DISABLED=1 skips the request-rate buckets only (storage quota stays enforced). Background: with the new async pipeline, one user remember = 1 POST + N status polls + 1 recall, which exceeds 30 weighted req/min on the second fixture. Without an escape hatch, end-to-end ingestion benches can't run at all. Off by default, asserted false in unit tests, loud tracing::warn! at startup when on, surfaced in /config so the bench can pre-flight check.

Latest bench results

Server started with RATE_LIMIT_DISABLED=1, run against d642c83 (post-rebase, includes #120 + #121):

Fixture	Size	Worker time	Recall
prose-en-4kb	4 KiB	24.9 s	200 (3.6 s)
prose-en-64kb	64 KiB	25.5 s	200 (1.0 s)
prose-en-256kb	256 KiB	42.5 s	200 (2.5 s)
prose-en-1mib	1 MiB	60.4 s	200 (3.6 s)
dense-sci-64kb	64 KiB	22.9 s	200 (0.9 s)
dense-sci-256kb	256 KiB	35.9 s	200 (2.9 s)
dense-sci-1mib	1 MiB	61.6 s	200 (2.7 s)
cjk-zh-64kb	64 KiB	21.0 s	200 (2.7 s)
cjk-zh-256kb	256 KiB	37.1 s	200 (1.1 s)
json-64kb	64 KiB	24.0 s	200 (2.7 s)
json-256kb	256 KiB	36.7 s	200 (2.8 s)
json-512kb	512 KiB	41.9 s	200 (1.1 s)
mixed-256kb	256 KiB	38.0 s	200 (1.0 s)
boundary-over-max	1 MiB + 1	—	rejected 400 ✅

14/14 passed. No worker failures, no recall failures. The 1 MiB cases land at ~60 s (well under the 120 s timeout). Worker-time scales roughly linearly with size up to the chunk-count threshold, then map-reduce dominates.

Conflict resolutions during rebase

Two rounds of rebase against dev along the way (PR #120 + #121 both landed under us). Resolutions were all clean unions — both feature sets coexist without behavioral conflict:

routes.rs — kept your summarize_for_embedding / summarize_with_prompt map-reduce stack alongside ENG-1405: Optimize recall with LRU blob cache and batched SEAL decrypt #120's recall_embedding_cache_key / generate_recall_embedding_cached recall-side embedding cache.
auth.rs — PROTECTED_BODY_LIMIT_BYTES raised from 1.5 MiB → 2 MiB so the auth middleware doesn't reject bulk-remember requests (which use a 2 MiB per-route cap from ENG-1406 + ENG-1408: Async remember pipeline and bulk remember #121).
main.rs — audit comment updated to reflect the new 2 MiB ceiling rationale (single 1 MiB + bulk ~1.5 MB).

ENG-1407 summarization wired into both async paths from #121:

spawn_prepare_remember_job calls summarize_for_embedding before the embed/encrypt fan-out for texts > 8 KiB.
spawn_prepare_bulk_remember_job does the same per item (bulk allows up to MAX_REMEMBER_TEXT_BYTES per item, so the rule applies there too).

Verification

cargo test --bin memwal-server — 135/135 unit tests pass
cargo check --tests — clean
14/14 fixtures pass the bench end-to-end against the rebased server
/config returns rateLimitDisabled: true only when the env var is set

I think this is ready for re-review. I'm leaving the formal review state as-is rather than self-dismissing — happy to walk through any of the above on request.

ducnmm force-pushed the enhance/ENG-1407-summarize-large-remember branch from fdf2783 to 57145c8 Compare April 27, 2026 03:13

ducnmm force-pushed the enhance/ENG-1407-summarize-large-remember branch 3 times, most recently from 3d30a3f to 1a290fb Compare May 5, 2026 03:51

hungtranphamminh requested review from hungtranphamminh May 5, 2026 14:35

hungtranphamminh requested changes May 5, 2026

View reviewed changes

hungtranphamminh force-pushed the enhance/ENG-1407-summarize-large-remember branch from c344768 to b5b07e4 Compare May 7, 2026 05:12

ducnmm and others added 9 commits May 7, 2026 12:14

feat(server): summarize large text before embedding in /api/remember …

8a7b452

…(ENG-1407)

fix(server): bound large remember summarization requests

e75927f

hungtranphamminh force-pushed the enhance/ENG-1407-summarize-large-remember branch from b5b07e4 to d642c83 Compare May 7, 2026 05:49

hungtranphamminh approved these changes May 7, 2026

View reviewed changes

hungtranphamminh merged commit 3b5a7a8 into dev May 7, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENG-1407: Summarize large text before embedding in /api/remember#122

ENG-1407: Summarize large text before embedding in /api/remember#122
hungtranphamminh merged 9 commits intodevfrom
enhance/ENG-1407-summarize-large-remember

ducnmm commented Apr 27, 2026 •

edited

Loading

Uh oh!

hungtranphamminh left a comment

Uh oh!

ducnmm commented May 5, 2026

Uh oh!

hungtranphamminh commented May 6, 2026

Uh oh!

hungtranphamminh commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ducnmm commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it works

Latency analysis

Benchmark script

Files changed

Test plan

Uh oh!

hungtranphamminh left a comment

Choose a reason for hiding this comment

Blocking

🔴 1. Auth middleware caps body at 1 MiB — services/server/src/auth.rs:165

🔴 2. Sidecar caps /seal/encrypt at 256 KB — services/server/scripts/sidecar-server.ts:298

🔴 3. /api/analyze is now unguarded at 1.5 MiB — services/server/src/routes.rs

Test results (testnet, PR vs dev)

Non-blocking

Suggested checklist before merge

Uh oh!

ducnmm commented May 5, 2026

Uh oh!

hungtranphamminh commented May 6, 2026

Results against 6cde4da

One remaining 🔴 blocker — sidecar /seal/encrypt 256 KB cap

Other items from the prior review still open

Latency snapshot

Next steps

Uh oh!

hungtranphamminh commented May 7, 2026

What I added on top of your work

Latest bench results

Conflict resolutions during rebase

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ducnmm commented Apr 27, 2026 •

edited

Loading

🔴 1. Auth middleware caps body at 1 MiB — `services/server/src/auth.rs:165`

🔴 2. Sidecar caps `/seal/encrypt` at 256 KB — `services/server/scripts/sidecar-server.ts:298`

🔴 3. `/api/analyze` is now unguarded at 1.5 MiB — `services/server/src/routes.rs`

Results against `6cde4da`

One remaining 🔴 blocker — sidecar `/seal/encrypt` 256 KB cap