perf(agent_server)!: trim conversation skills by default (include_skills=false)#3316
Conversation
… responses
When ``AgentContext`` is constructed with ``load_user_skills=True`` or
``load_public_skills=True``, its model_validator resolves the entire
skill catalog (~40 entries in stock setups) and persists them inline.
Every ``GET /api/conversations`` / ``/search`` / ``/{id}`` therefore
carried ~260 KB of skill content that no API consumer reads — skill
bodies are only consumed server-side at prompt-render time, never
client-side.
Drop ``agent.agent_context.skills`` from the four read endpoints'
response payloads (``GET /search``, ``GET /{id}``, ``GET ""``,
``POST ""`` — plus the deprecated ACP equivalents and ``fork``). The
persisted ``ConversationState`` on disk and the in-memory copy held
by the agent's runtime are untouched. Only the HTTP-response bytes
shrink.
This is the simpler alternative to PR #3302, which attempted the same
goal via a ``@field_serializer`` on ``AgentContext`` itself. See
the route-trim PR description for the comparison; the short version
is that touching the model has to coordinate across persistence,
``model_copy``, ``round_trip``, equality checks, snapshot drift
detection, ``ValidationInfo`` contexts, and a ``state.agent =
agent`` overwrite — each of which the reviewer found edge cases in.
Trimming at the route boundary avoids all of that: the model stays
honest (the wire matches the in-memory shape minus exactly one leaf
field), and the optimization lives in one ~10-line helper that's
called from 7 route handlers.
Measured impact (agent-canvas cold-open of a 40-skill conversation,
same browser, same agent-server port, only the SDK swapped):
- ``GET /api/conversations?ids=…``: 258 KB → 4.6 KB (-98%)
- Cold-open "first event painted": 2226 ms → 1283 ms (-42%)
Tests:
- ``test_conversation_router_skill_trim.py``: 10 cases covering the
helper (no-mutation, identity return on empty, preserve other
agent_context fields, agent without agent_context) and FastAPI
integration through all four read endpoints, plus a concrete
byte-count comparison (trimmed vs ``model_dump_json`` of the same
ConversationInfo with skills intact — ~1500+ bytes saved per
5-skill conversation in the test fixture).
- 351 tests pass on the affected suites (router tests + sdk/context).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED Behavioral default changes detectedThese public
|
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the changed API behavior with real HTTP requests: heavy conversation skills are removed from route responses while the server-side conversation object stays intact.
Does this PR achieve its stated goal?
Yes. The PR set out to trim agent.agent_context.skills at the FastAPI route boundary to shrink ConversationInfo HTTP payloads without changing in-memory/persisted conversation state. Against main, the same heavy conversation returned 40 skills and ~178.5 KB per response; against this PR, standard and deprecated ACP endpoints returned skills: [] and ~2.5 KB responses, while the in-memory object still reported 40 skills before and after API calls.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; no tests/linters were run. |
| CI Status | ⏳ 20 checks successful, 2 skipped, 8 still in progress; merge state is blocked by pending checks. |
| Functional Verification | ✅ Real FastAPI server + curl requests verified standard and ACP read/start/fork route responses. |
Functional Verification
Test 1: Standard conversation read endpoints shrink payloads
Step 1 — Establish baseline without the fix:
Checked out origin/main, started a temporary FastAPI app using the actual conversation router, and made real HTTP requests with curl against a stubbed conversation containing 40 skills × 4096-byte bodies.
Observed:
BASE untrimmed: {"skill_count":40,"model_dump_json_bytes":178553,"page_model_dump_json_bytes":178585}
BASE search: bytes=178585; skills_len=40; first_content_len=4105
BASE get: bytes=178553; skills_len=40; first_content_len=4105
BASE batch: bytes=178555; skills_len=40; first_content_len=4105
This confirms the pre-PR API response carried the full skill list/content over HTTP.
Step 2 — Apply the PR's changes:
Checked out commit 7b67420b66286e2173754c9573762ed58257a1b8.
Step 3 — Re-run with the fix in place:
Started the same FastAPI route app and called the same endpoints.
Observed:
PR untrimmed before: {"skill_count":40,"model_dump_json_bytes":178553,"page_model_dump_json_bytes":178585}
PR search: bytes=2506; skills_len=0; suffix='qa-suffix'
PR get: bytes=2474; skills_len=0; suffix='qa-suffix'
PR batch: bytes=2476; skills_len=0; suffix='qa-suffix'
PR untrimmed after: {"skill_count":40,"model_dump_json_bytes":178553,"page_model_dump_json_bytes":178585}
This shows the HTTP payload drops by ~176 KB and skills is empty in responses, while the server-side conversation still has all 40 skills before and after requests.
Test 2: Deprecated ACP and write-returning routes also trim responses
Using the PR checkout and same running FastAPI app, I exercised the deprecated ACP equivalents plus the standard/ACP start and standard fork endpoints with valid JSON request bodies.
Observed:
POST statuses: standard_start_status=201; standard_fork_status=201; acp_start_status=201
PR acp_search: bytes=2506; skills_len=0; suffix='qa-suffix'
PR acp_get: bytes=2474; skills_len=0; suffix='qa-suffix'
PR acp_batch: bytes=2476; skills_len=0; suffix='qa-suffix'
PR start: bytes=2474; skills_len=0; suffix='qa-suffix'
PR fork: bytes=2474; skills_len=0; suffix='qa-suffix'
PR acp_start: bytes=2474; skills_len=0; suffix='qa-suffix'
This confirms the route-boundary behavior works across the endpoints called out in the PR, including deprecated ACP routes and POST routes that return ConversationInfo.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the requester.
all-hands-bot
left a comment
There was a problem hiding this comment.
I found one API compatibility concern; details inline. Risk: 🟡 Medium because this changes a public REST/SDK-observable response by default.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26149800184
Default trim changes the public ``ConversationInfo`` shape that external SDK consumers like ``RemoteConversation.agent.agent_context.skills`` already round-trip through. Reviewer flagged the silent-regression risk; agreed. Make it opt-in: - ``include_skills: bool = True`` query param on every read endpoint that emits ``ConversationInfo`` (search, get, batch-get, start, fork, plus the deprecated ACP equivalents). - Default behaviour unchanged — existing callers see the full payload. - ``?include_skills=false`` opts into ``trim_conversation_response_skills``; the helper drops ``agent.agent_context.skills`` from the response while leaving the persisted state and the in-memory runtime copy untouched. Documented in the param title so it surfaces in OpenAPI. Tests cover both branches at every endpoint: - Default (no param) → full skills (5 in the fixture). - ``include_skills=true`` (explicit) → full skills (same). - ``include_skills=false`` → empty list. - Concrete byte-count comparison (default vs opt-in) confirms the ~2500 B saved per 5-skill conversation in the test fixture. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the actual agent-server HTTP API before and after the PR; include_skills=false now trims response skills while default behavior stays full.
Does this PR achieve its stated goal?
Yes. The PR set out to add a backward-compatible, opt-in include_skills=false query parameter for conversation responses. Running real agent-server instances showed main ignored the parameter and returned 2 skills/~9.1 KB responses, while the PR returned 0 skills/~2.5 KB for opt-in calls and still returned full skills for omitted/true calls. A trimmed POST /api/conversations?include_skills=false was also followed by GET default returning 2 skills, confirming the persisted/in-memory conversation was not trimmed.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and actual agent-server processes started successfully |
| CI Status | Review Thread Gate/unresolved-review-threads, 2 skipped |
| Functional Verification | ✅ Real HTTP requests verified default, explicit true, opt-in false, batch/search/fork, and ACP routes |
Functional Verification
Test 1: Public conversation API before/after behavior
Step 1 — Establish baseline without the fix:
Checked out origin/main, started the real server with OPENHANDS_SUPPRESS_BANNER=1 uv run agent-server --host 127.0.0.1 --port 18081, then ran uv run python /tmp/qa_skill_trim_probe.py http://127.0.0.1:18081 main:
LABEL main
BASE http://127.0.0.1:18081
conversation_id 40556d80-c8ec-48ec-ac58-fe1149ff7ec5
post_false_id 30316fbc-6cfe-4cea-86ae-195b2f9c8fab
endpoint | status | bytes | skills
POST include_skills=false | 201 | 9111 | 2
POST default | 201 | 9111 | 2
GET default | 200 | 9111 | 2
GET include_skills=true | 200 | 9111 | 2
GET include_skills=false | 200 | 9111 | 2
BATCH default | 200 | 9113 | 2
BATCH include_skills=false | 200 | 9113 | 2
SEARCH default | 200 | 21542 | 2
SEARCH include_skills=false | 200 | 21542 | 2
FORK include_skills=false | 201 | 9111 | 2
GET default after trimmed start | 200 | 9111 | 2
This confirms the baseline had no opt-in trimming behavior: even when include_skills=false was present, responses still carried both inline skills and the same byte size as default responses.
Step 2 — Apply the PR's changes:
Checked out perf/api-route-skill-trim at ef1598c1c9532fc302d052f22d4a19de800de405 and started the real server with OPENHANDS_SUPPRESS_BANNER=1 uv run agent-server --host 127.0.0.1 --port 18082.
Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_skill_trim_probe.py http://127.0.0.1:18082 pr:
LABEL pr
BASE http://127.0.0.1:18082
conversation_id 578273c2-44a3-484c-b5ec-3803e828e06d
post_false_id 3fc4e1d6-71d6-4ca9-8b75-278fa2118878
endpoint | status | bytes | skills
POST include_skills=false | 201 | 2492 | 0
POST default | 201 | 9101 | 2
GET default | 200 | 9101 | 2
GET include_skills=true | 200 | 9101 | 2
GET include_skills=false | 200 | 2492 | 0
BATCH default | 200 | 9103 | 2
BATCH include_skills=false | 200 | 2494 | 0
SEARCH default | 200 | 48858 | 2
SEARCH include_skills=false | 200 | 14992 | 0
FORK include_skills=false | 201 | 2492 | 0
GET default after trimmed start | 200 | 9101 | 2
This shows the PR behavior works end-to-end: default and explicit true remain backward-compatible with full skills, opt-in false trims skills to [] and substantially reduces response bytes, and the default GET after a trimmed POST still returns full skills, so the stored conversation was not mutated by response trimming.
Test 2: Deprecated ACP conversation API opt-in behavior
With the PR server still running, ran uv run python /tmp/qa_acp_trim_probe.py http://127.0.0.1:18082 pr-acp:
LABEL pr-acp
conversation_id 18c8d812-80f9-453f-afa3-a2aeae48bc8f
endpoint | status | bytes | skills
ACP POST default | 201 | 7129 | 2
ACP GET include_skills=false | 200 | 2500 | 0
ACP BATCH include_skills=false | 200 | 2502 | 0
ACP SEARCH include_skills=false | 200 | 19986 | 0
This verifies the deprecated ACP equivalents expose the same opt-in trimming behavior over real HTTP requests.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
LGTM — no blocking issues found. Taste Rating: 🟢 Good taste. The route-level opt-in keeps the public default response backward-compatible while trimming only for callers that request it.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26151427932
…lse) BREAKING CHANGE: ``GET /conversations*`` and the deprecated ACP equivalents now return ``agent.agent_context.skills == []`` by default. Callers that still need the legacy full-payload shape can opt back in with ``?include_skills=true``. The trim was previously opt-in (``?include_skills=false``) to keep the default response backward-compatible. Audit of known consumers (agent-canvas, OpenHands app-server, SDK examples) showed none read ``agent.agent_context.skills`` from HTTP responses — they either use in-process ``LocalConversation`` directly or only read fields like ``agent.llm.model``. Default-trim is therefore safe to ship as the steady-state shape, and reverses the cost/benefit: ~260 KB shaved off every conversation fetch unless a caller explicitly asks for it. The persisted ``ConversationState`` on disk and the in-memory copy held by the agent's runtime are unaffected — only the bytes leaving over HTTP shrink. Also makes both ``search`` route handlers non-destructive: they build a fresh ``ConversationPage`` via ``model_copy`` instead of mutating ``page.items`` in place, so an upstream service caching its return value (e.g. ``AsyncMock`` in tests) sees stable behaviour across calls. Drive-by: migrates ``ConversationSortOrder`` and ``EventSortOrder`` to ``StrEnum`` to clear pre-existing ``UP042`` lint errors in ``models.py`` that the pre-commit hook now blocks any further edit to this file on.
Closes #3301. Companion to the closed PR #3302 (in-model field_serializer); this PR takes a simpler route-boundary approach.
TL;DR
The conversation read endpoints now strip
agent.agent_context.skillsfrom the response by default. Callers that still need the legacy full-payload shape can opt back in with?include_skills=true.ConversationStateon disk: untouched.AgentContext.skills(what the prompt renderer reads): untouched.agent.agent_context.skills == [](BREAKING).?include_skills=true): the prior shape, with the full ~260 KB skills array.Breaking change
GET /conversations*and the deprecated ACP equivalents previously returned the full skills list inline; they now return[]unless the caller passes?include_skills=true.Audit of consumers of
agent.agent_context.skillsfrom HTTP responses:agent.kindandagent.llm.modelfromConversationInfo).LocalConversation(in-process), notRemoteConversation.No known caller is affected. Anyone with a custom integration that does read
conversation.agent.agent_context.skillsfrom aGET /conversations*response can append?include_skills=trueto preserve the prior shape — documented as a legacy escape hatch, not the steady-state path.Why default-on (revised from the opt-in revision)
An earlier revision of this PR defaulted to
include_skills=trueand required callers to opt into the trim. Per maintainer feedback, since no client actually reads the field from HTTP, the opt-in shape was carrying ~260 KB of payload that nothing consumes. Flipping the default is the right cost/benefit: every consumer gets the slimmer payload for free, and the rare legacy caller pays a documented opt-in.Empirical measurements (honest)
The wire-level reduction is rock-solid. The user-perceived cold-open improvement I claimed in earlier revisions (
-42%) does not reproduce under careful measurement — it was a Vite cold-start artifact in the baseline, not the trim's effect.Wire-level (reproducible, 5 GETs each)
?include_skills=true(legacy)GET /api/conversations?ids=…(40-skill conv)Browser cold-cold-cold (3 trials; servers killed, Vite cache wiped between trials)
Run-to-run variance is ~200 ms; the trim's effect is ~100 ms. That's at the edge of statistical significance from three trials — directionally consistent (2/3 trials favor the trim) but small enough that it's close to the noise floor.
Order-of-magnitude reconciliation: network transfer (~3 ms saved) + JSON.parse (~7 ms) + React Query structural-share (~7 ms) + server serialization (~1 ms) = ~20 ms expected. Observed ~100 ms — larger than back-of-envelope math, possibly amplified through React reconciliation / GC pressure.
Where the trim actually wins
Where it doesn't win
Cold-open wall-clock on a fast localhost dev box. End-users on localhost won't feel a difference. The dev experience is the same.
Why this exists alongside #3302
#3302 attempted the same win via a
@field_serializer("skills")onAgentContext. It works, has 20 tests, and addresses every edge case the bot review caught — but the patch grew to 5 commits / ~150 lines / multiplePrivateAttrsnapshots / three context flags (preserve_full_skills,loading_from_snapshot,round_trip) / config-drift detection / deep copies / amodel_copy-merge dance inConversationStatefactory. Each guard fixes a real edge case; the underlying reason there are so many is that the model is trying to keep three different truths in sync (in-memory full, wire trimmed, persisted full).This PR moves the optimization to where the bytes actually leave the server. Side-by-side:
RemoteConversation.agent.agent_context.skillsreturns[]from a server that has 40 skills?include_skills=trueopt-in to restore_save_base_state) keeps full snapshotcontext={"preserve_full_skills": True}model_dump(round_trip=True)round-trips losslesslyinfo.round_tripcheckmodel_copy(update={"skills": merged})keeps replacementb)loading_from_snapshotcontext flagagent_context.py+ ~30 instate.pymodels.py+conversation_router*.pyWhat changes
openhands-agent-server/openhands/agent_server/models.pyINCLUDE_SKILLS_PARAM_TITLEconstant (used by both routers for the OpenAPI param docstring).trim_conversation_response_skills(info) -> ConversationInfohelper.ConversationSortOrder/EventSortOrdertoStrEnumto clear pre-existingUP042lint that the pre-commit hook blocks on.openhands-agent-server/openhands/agent_server/conversation_router.py— addsinclude_skills: bool = Falsequery param atsearch,get_conversation,batch_get_conversations,start_conversation,fork_conversation. DefaultFalse(trimmed);Truerestores the legacy shape.searchroute handler is non-destructive: builds a freshConversationPageviamodel_copyinstead of mutatingpage.items.openhands-agent-server/openhands/agent_server/conversation_router_acp.py— same on the deprecated ACP equivalents.tests/agent_server/test_conversation_router_skill_trim.py— 14 tests covering the helper, default-trim branch,include_skills=trueopt-in branch, null-item handling, byte-count comparison.Test plan
pytest tests/agent_server/test_conversation_router_skill_trim.py— 14 passed.pytest tests/agent_server/ -q --deselect TestAutoTitle::test_autotitle_sets_title_on_first_user_message— 1038 passed (1 pre-existing failure unrelated).ruff format+ruff checkon the changed files — clean.?include_skills=truerestores 265 KB.🤖 Generated with Claude Code
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:23f0d9f-pythonRun
All tags pushed for this build
About Multi-Architecture Support
23f0d9f-python) is a multi-arch manifest supporting both amd64 and arm6423f0d9f-python-amd64) are also available if needed