
Add proxy_search_session_bodies for full-text body search in sessions #10

Merged
yfe404 merged 1 commit into main from feat/search-session-bodies on Mar 23, 2026

yfe404 commented Mar 23, 2026

Summary

  • New proxy_search_session_bodies tool — decompresses and searches actual HTTP request/response bodies stored in persistent sessions, returning grep-like context snippets around matches
  • Adds responseContentType to SessionIndexEntry for efficient content-type pre-filtering without loading full records from disk
  • Updates proxy_query_session description to clarify it's metadata-only and point users to the new body search tool
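The grep-like context snippets mentioned in the summary could work roughly as follows. This is a minimal TypeScript sketch, not the code from this PR: extractSnippets, its options (contextChars, caseSensitive, maxSnippets) and their defaults are hypothetical.

```typescript
// Hypothetical helper: the name, options, and defaults are illustrative only.
interface Snippet {
  offset: number; // index of the match within the body
  text: string; // the match plus surrounding context
}

function extractSnippets(
  body: string,
  query: string,
  opts: { contextChars?: number; caseSensitive?: boolean; maxSnippets?: number } = {},
): Snippet[] {
  const { contextChars = 40, caseSensitive = false, maxSnippets = 5 } = opts;
  if (query.length === 0) return [];
  // Lower-case both sides for case-insensitive search. Offsets stay aligned for
  // ASCII text; a few Unicode case mappings change string length.
  const haystack = caseSensitive ? body : body.toLowerCase();
  const needle = caseSensitive ? query : query.toLowerCase();
  const snippets: Snippet[] = [];
  let from = 0;
  while (snippets.length < maxSnippets) {
    const idx = haystack.indexOf(needle, from);
    if (idx === -1) break;
    // Clamp the context window to the body boundaries.
    const start = Math.max(0, idx - contextChars);
    const end = Math.min(body.length, idx + query.length + contextChars);
    // Mark truncated context with "..." on either side.
    const prefix = start > 0 ? "..." : "";
    const suffix = end < body.length ? "..." : "";
    snippets.push({ offset: idx, text: prefix + body.slice(start, end) + suffix });
    from = idx + needle.length; // continue after this match (non-overlapping)
  }
  return snippets;
}
```

For example, searching "<span>Price: 299,- Kč</span>" for "299,-" with contextChars: 7 yields one snippet at offset 13 whose text contains "Price: 299,- Kč", with a "..." marker on each side.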

Motivation

proxy_query_session text search only matches against index metadata (URL, hostname, path, exchangeId, matchedRuleId). Users searching for text inside response bodies (e.g., a price like "299,-" in HTML) get 0 results even though the content exists in the decompressed body. The in-memory proxy_search_traffic tool does search bodyPreview, but only over the live ring buffer (1000 entries, lost on restart).

Key design decisions

  • Separate tool: proxy_query_session stays fast (index-only), while body search is explicitly heavier
  • Single file handle for bulk record reads (vs. open/close per record)
  • Preview fallback: works with both full and preview capture profiles; the source field tells the agent which one it got
  • Binary safety: null-byte detection plus a skip list of known binary MIME prefixes
  • Backward compatible: old sessions without responseContentType in the index still work
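The two binary-safety checks can be sketched as below, assuming the content type and the raw body bytes are both available. isLikelyBinary, the prefix list, and the 8 KiB sample size are hypothetical choices, not values taken from this PR.

```typescript
// Hypothetical skip list; the PR's actual MIME prefixes are not listed.
const BINARY_MIME_PREFIXES = [
  "image/",
  "audio/",
  "video/",
  "font/",
  "application/octet-stream",
  "application/zip",
  "application/pdf",
];

function isLikelyBinary(contentType: string | undefined, body: Uint8Array, sampleBytes = 8192): boolean {
  // 1. Cheap check first: skip known binary MIME types without touching the body.
  if (contentType) {
    const ct = contentType.toLowerCase();
    if (BINARY_MIME_PREFIXES.some((p) => ct.startsWith(p))) return true;
  }
  // 2. Fallback: a NUL byte near the start is a strong hint the payload is not
  //    ASCII/UTF-8 text (UTF-16 text would also trip this check).
  const limit = Math.min(body.length, sampleBytes);
  for (let i = 0; i < limit; i++) {
    if (body[i] === 0) return true;
  }
  return false;
}
```

Checking the MIME prefix first avoids scanning the body at all for obvious binaries, while the null-byte scan catches mislabeled payloads. Note that a prefix such as image/ also matches image/svg+xml, which is text, so a real list may need exceptions.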

Test plan

  • 30 new unit tests covering core search, case sensitivity, preview fallback, all pre-filters, limits, edge cases, backward compat, snippet extraction
  • tsc --noEmit passes
  • npm run build succeeds
  • Updated integration test tool count (76 → 77)

proxy_query_session only searches index metadata (URL, hostname, path),
returning 0 results for text present in actual response/request bodies.
This new tool decompresses and searches stored bodies with grep-like
context snippets around matches.
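The PR does not spell out which compression the stored bodies use. Assuming they are kept with their original HTTP Content-Encoding, decoding before searching could look like this minimal sketch built on Node's built-in zlib module; decodeBody and bodyContains are hypothetical names.

```typescript
import * as zlib from "node:zlib";

// Hypothetical: decode a stored body according to its Content-Encoding value.
// Unknown or missing encodings are returned unchanged (an assumed policy).
function decodeBody(raw: Buffer, contentEncoding?: string): Buffer {
  const enc = (contentEncoding ?? "identity").trim().toLowerCase();
  switch (enc) {
    case "gzip":
    case "x-gzip":
      return zlib.gunzipSync(raw);
    case "deflate":
      return zlib.inflateSync(raw); // zlib-wrapped deflate, as HTTP specifies
    case "br":
      return zlib.brotliDecompressSync(raw);
    default:
      return raw;
  }
}

// Hypothetical: decode, interpret as UTF-8, and test for a substring.
function bodyContains(raw: Buffer, contentEncoding: string | undefined, query: string): boolean {
  return decodeBody(raw, contentEncoding).toString("utf8").includes(query);
}
```

This handles a single encoding only; stacked encodings (e.g. "gzip, br") and servers that send raw deflate data (which would need zlib.inflateRawSync) are not covered.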

Features:
- Pre-filters by hostname, URL, method, status code, content-type
- Searches response bodies, request bodies, or both
- Case-sensitive/insensitive search
- Falls back to bodyPreview for preview-profile sessions
- Skips binary content (null-byte detection + MIME prefix list)
- Single file handle for bulk record reads (not per-record)
- max_scan and limit caps for bounded resource usage
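To illustrate the single-file-handle approach together with the max_scan and limit caps, here is a sketch that assumes, hypothetically, that records are concatenated in one data file and that the index stores each record's byte offset and length. RecordRef, scanRecords, and that layout are assumptions rather than the PR's storage format; decompression and binary checks are omitted.

```typescript
import * as fs from "node:fs";

// Hypothetical index reference to one record's byte range in a data file.
interface RecordRef {
  exchangeId: string;
  offset: number;
  length: number;
}

interface ScanResult {
  matches: string[]; // exchangeIds whose record contains the query
  scanned: number; // records actually read from disk
  truncated: boolean; // true if maxScan or limit stopped the scan early
}

function scanRecords(path: string, refs: RecordRef[], query: string, maxScan: number, limit: number): ScanResult {
  const matches: string[] = [];
  let scanned = 0;
  // Open the file once and reuse the descriptor for every positioned read,
  // instead of opening and closing it per record.
  const fd = fs.openSync(path, "r");
  try {
    for (const ref of refs) {
      // Stop before reading more once either resource cap is reached.
      if (scanned >= maxScan || matches.length >= limit) {
        return { matches, scanned, truncated: true };
      }
      const buf = Buffer.alloc(ref.length);
      const bytesRead = fs.readSync(fd, buf, 0, ref.length, ref.offset);
      scanned++;
      if (buf.subarray(0, bytesRead).toString("utf8").includes(query)) {
        matches.push(ref.exchangeId);
      }
    }
  } finally {
    fs.closeSync(fd); // runs on both the early return and normal completion
  }
  return { matches, scanned, truncated: false };
}
```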

Also adds responseContentType to SessionIndexEntry for efficient
content-type pre-filtering without loading full records.
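Index-level pre-filtering that tolerates entries written before responseContentType existed might look like the sketch below. Only responseContentType comes from this PR; the other SessionIndexEntry fields, the PreFilter options, and the choice to keep (rather than drop) entries with no content type are assumptions.

```typescript
// Hypothetical index entry shape; only responseContentType is named in the PR.
interface SessionIndexEntry {
  exchangeId: string;
  hostname: string;
  url: string;
  method: string;
  statusCode?: number;
  responseContentType?: string; // absent in sessions recorded before this change
}

// Hypothetical filter options mirroring the listed pre-filters.
interface PreFilter {
  hostname?: string;
  urlContains?: string;
  method?: string;
  statusCode?: number;
  contentType?: string; // case-insensitive substring, e.g. "html"
}

function passesPreFilter(e: SessionIndexEntry, f: PreFilter): boolean {
  if (f.hostname && e.hostname !== f.hostname) return false;
  if (f.urlContains && !e.url.includes(f.urlContains)) return false;
  if (f.method && e.method.toUpperCase() !== f.method.toUpperCase()) return false;
  if (f.statusCode !== undefined && e.statusCode !== f.statusCode) return false;
  if (f.contentType) {
    // Old index entries carry no content type: keep them so older sessions
    // still return results, and let body-level checks decide later.
    if (e.responseContentType === undefined) return true;
    if (!e.responseContentType.toLowerCase().includes(f.contentType.toLowerCase())) return false;
  }
  return true;
}
```

Keeping untyped entries trades some extra record reads on old sessions for not silently losing their matches.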

30 new unit tests covering core search, case sensitivity, preview
fallback, all pre-filters, limits, edge cases, backward compat,
and snippet extraction.
yfe404 merged commit 1593dca into main on Mar 23, 2026 (1 check passed)