perf(agent_context): drop auto-loaded skills from serialized output (-263 KB/conversation, -48% cold open)#3302
perf(agent_context): drop auto-loaded skills from serialized output (-263 KB/conversation, -48% cold open)#3302simonrosenberg wants to merge 9 commits into
Conversation
When ``load_user_skills`` / ``load_public_skills`` are enabled,
``_load_auto_skills`` mutates ``self.skills`` at model-validation
time with the fully resolved skill catalog. Pydantic then persists
those skills onto the conversation's stored state and serializes
them back on every ``GET /api/conversations``. For a stock setup
that bundles ~40 skills under ``~/.openhands/skills``, that's
~263 KB of duplicated payload per stored conversation per fetch —
content the client never reads directly, because the skills are
only consumed server-side when rendering the system prompt.
Track which skill names ``_load_auto_skills`` added in a
``PrivateAttr`` and drop them at serialize time via a
``@field_serializer("skills")``. The validator is ``mode="after"``,
so it re-runs on every deserialization and repopulates the same
auto-loaded subset deterministically from the
``load_user_skills`` / ``load_public_skills`` / ``marketplace_path``
configuration. Runtime callers of ``self.skills`` (prompt rendering,
``to_acp_prompt_context``, ``get_dynamic_context``) see the full
resolved list as today — only the wire payload changes.
Measured impact (agent-canvas at ``/conversations/<id>``, cold-cold
full page reload, agent-server 1.22.1, same browser):
Conversation Before After (this PR)
40 skills + 161 events 2226 ms ~1159 ms (~48% faster)
0 skills + 209 events 1159 ms 1159 ms (unchanged)
0 skills + 0 events 979 ms 979 ms (unchanged)
Tests pin both halves of the contract:
- In-memory: every ``self.skills`` lookup keeps returning the full
resolved list, including auto-loaded entries.
- On the wire: ``model_dump`` returns only explicit skills (caller-
supplied via the constructor); auto-loaded ones are absent.
- Round-trip: dumping then ``model_validate``-ing repopulates the
auto-loaded subset via the same validator, so a roundtripped
``AgentContext`` has the same in-memory shape as the source.
- An explicit skill colliding with an auto-loaded name wins (it was
never marked auto-loaded), in-memory AND on the wire.
- A concrete byte-count assertion: 40-skill auto-load + 1 explicit
skill serializes to <1 KB, not the ~40 KB the auto-loaded set
would occupy.
Closes software-agent-sdk#3301.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED Behavioral default changes detectedThese public
|
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
…d skills Stored conversations created before PR #3302 carry the resolved auto- loaded skill list inlined on ``agent_context.skills``. The validator's "skip if name in existing_names" branch hit those entries without marking them as auto-loaded, so the new serializer kept them on the wire — the bloat would persist until each conversation was recreated. Detect them: when an ``existing`` skill matches a loader-produced one by equality (full Pydantic model compare), mark it auto-loaded too. That makes the wire payload shrink on the very next ``GET`` after the SDK is bumped, with no conversation re-creation needed. A persisted skill that no longer matches the loader's current output (user edited the file, marketplace updated) stays treated as explicit — on-disk content wins. Covered by ``test_migration_stored_skills_diverged_from_loader_stay_explicit``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable, but there are two serialization-scope issues to address before I’d be comfortable approving.
Risk: 🟡 MEDIUM — this changes persisted AgentContext skill snapshots and nested serialization semantics, which can affect resumed conversations.
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Direct SDK execution confirms the PR removes auto-loaded public skills from serialized AgentContext payloads while preserving runtime skills, explicit skills, and round-trip behavior.
Does this PR achieve its stated goal?
Yes. On origin/main, constructing an AgentContext with 40 auto-loaded public skills plus one explicit skill serialized all 41 skills (model_dump_json_bytes 259897). On the PR head, the same user-facing SDK operation still had 41 runtime skills, but serialized only the explicit skill (model_dump_json_bytes 543), and AgentContext.model_validate(...) rehydrated the runtime skill list back to 41.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed the uv environment. |
| CI Status | ⏳ At last check: 12 success, 3 skipped, 8 in progress, 5 queued, no failures observed. |
| Functional Verification | ✅ Base-to-PR comparison verified payload shrink, explicit preservation, round-trip rehydration, and old-payload migration on current PR head. |
Functional Verification
Test 1: New serialized output drops auto-loaded skills but keeps runtime behavior
Step 1 — Establish baseline without the fix:
Ran an SDK script on origin/main (87ff9bd9) that constructed AgentContext(load_public_skills=True, load_user_skills=False, skills=[explicit_skill]) and printed runtime/dump metrics.
Output:
runtime_skill_count 41
dumped_skill_count 41
model_dump_json_bytes 259897
This confirms the baseline problem: the runtime loaded 40 public skills plus the explicit skill, and the full resolved skill catalog was serialized.
Step 2 — Apply the PR's changes:
Checked out current PR head 15002de8 (the PR advanced during QA; the originally supplied 5e1aca8 showed the same new-payload behavior).
Step 3 — Re-run with the fix in place:
Ran the same SDK construction, then called ctx.model_dump(), ctx.model_dump_json(), and AgentContext.model_validate(dumped).
Output:
runtime_skill_count 41
dumped_skill_count 1
dumped_skill_names ['qa-explicit-skill']
roundtrip_skill_count 41
roundtrip_has_public_skill True
model_dump_json_bytes 543
This shows the PR achieves the payload goal: runtime still sees all auto-loaded skills, serialized output keeps only the explicit skill, and deserializing the trimmed payload restores public skills.
Test 2: Existing old serialized payloads shrink after loading on current PR head
Step 1 — Establish old payload baseline:
On origin/main, I serialized the same context to /tmp/agent_context_old_payload.json.
Output:
old_payload_skill_count 41
old_payload_json_bytes 263469
This represents an existing conversation payload with auto-loaded skills already inlined.
Step 2 — Apply the PR's changes:
Checked out current PR head 15002de8.
Step 3 — Load and re-serialize the old payload:
Loaded /tmp/agent_context_old_payload.json with AgentContext.model_validate(payload) and re-serialized it with ctx.model_dump(mode='json').
Output:
payload_skill_count 41
loaded_runtime_skill_count 41
loaded_dumped_skill_count 1
loaded_dumped_skill_names ['qa-explicit-skill']
loaded_dumped_json_bytes 587
This confirms the current PR head also handles old inlined payloads: loading preserves runtime skills, and re-serializing drops the auto-loaded catalog.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
…s survive
OpenHands' ``_create_agent_with_skills`` (and any caller that does
``agent_context.model_copy(update={'skills': merged})``) replaces
auto-loaded skills in place when its merged list carries the same
name with different content. A name-only ``_auto_loaded_skill_names``
filter would silently drop those replacements at serialize time and
the receiver would auto-reload the stock version on the next
deserialization — losing the customisation.
Switch the PrivateAttr to a ``dict[str, Skill]`` snapshot. Serializer
drops a skill iff ``_auto_loaded_skills.get(name) == s`` — equality,
not name match. A replacement has different field values, fails the
equality check, and survives on the wire as today.
Verified safe across known consumers:
- ``openhands-sdk/workspace/remote/base.py:920`` — explicit skills with
``load_public_skills=False``; auto-load short-circuits, serializer
is a no-op.
- ``openhands-sdk/workspace/remote/base.py:923`` — empty explicit with
``load_public_skills=True``; serializer drops the auto-loaded
skills (the win we want).
- ``OpenHands/openhands/app_server/.../live_status_app_conversation_service.py:1394``
— constructs AgentContext WITHOUT either ``load_*_skills`` flag.
Auto-load is a no-op, ``_auto_loaded_skills`` stays empty.
- ``OpenHands/openhands/app_server/.../app_conversation_service_base.py:159``
— ``_create_agent_with_skills`` does the model_copy + skill-replace
flow. Covered by the new regression test.
New test:
``test_replacing_auto_skill_via_model_copy_survives_serialization``
— pins the exact OpenHands flow: auto-load a "git-skill", merge with
a different "git-skill" (same name, different content), model_copy,
serialize. Asserts the replacement survives on the wire instead of
being dropped as auto-loaded.
296/296 tests pass on the affected suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptions Two issues from the bot review of PR #3302: 1. The serializer fired on EVERY ``model_dump``, including ``ConversationState._save_base_state`` (the persistence path). Without auto-loaded skills on the persisted snapshot, a resumed conversation would silently pick up *new* skill content from ``~/.openhands/skills`` / the public marketplace via ``_load_auto_skills`` — losing the freeze-at-create-time guarantee. Fix: opt-out via ``context={"preserve_full_skills": True}``. ``_save_base_state`` passes it; the API-response path doesn't — so persistence keeps the snapshot and the wire still gets the win. 2. The manual ``s.model_dump(mode=..., context=...)`` inside the serializer dropped the caller's ``exclude_none`` / ``exclude_defaults`` / nested ``include`` options. ``AgentContext(skills=[...]).model_dump( exclude_none=True)`` would have leaked ``trigger: None``, ``source: None`` etc. — which also bloats the very payloads this PR optimises. Fix: switch to ``@field_serializer("skills", mode="wrap")`` and delegate the surviving list to ``handler``. Pydantic's default nested machinery then respects every caller option. Tests: - ``test_preserve_full_skills_context_flag_keeps_snapshot`` — flag reverts to full serialization for the snapshot path. - ``test_wrap_serializer_propagates_exclude_none`` — pins the comment-2 regression with a concrete ``Skill.trigger`` canary. - ``test_save_base_state_persists_full_skill_snapshot`` — end-to-end through ``ConversationState`` to ``InMemoryFileStore``; asserts the persisted JSON keeps all auto-loaded skill names. 299/299 tests pass on the affected suites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable, but two serialization edge cases can lose caller-modified skills.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration semantics for persisted and remote conversation payloads.
VERDICT:
❌ Needs follow-up before approval.
This review was created by an AI agent (OpenHands) on behalf of the user.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.
| if info.context and info.context.get("preserve_full_skills"): | ||
| return handler(value) | ||
| auto = self._auto_loaded_skills | ||
| kept = [s for s in value if auto.get(s.name) != s] |
There was a problem hiding this comment.
🟠 Important: This filter still uses the copied private snapshot even if the auto-load config was changed after validation. For example, ctx.model_copy(update={"load_public_skills": False}) keeps the runtime auto-loaded skills but serializes them as []; on AgentContext.model_validate(...) they are not reloaded because the flag is now false. Changing marketplace_path has the same stale-snapshot problem, except the receiver may reload a different catalog. Please either tie the snapshot to the (load_user_skills, load_public_skills, marketplace_path) config and no-op when it no longer matches, or clear/recompute the snapshot when those fields are updated.
| existing = existing_by_name.get(name) | ||
| if existing is None: | ||
| self.skills.append(skill) | ||
| self._auto_loaded_skills[name] = skill |
There was a problem hiding this comment.
🟠 Important: This stores the same mutable Skill object that is appended to self.skills, so it is not actually a snapshot. Since Skill is mutable today, an in-place caller edit like ctx.skills[0].content = "custom" mutates the private copy too; equality still succeeds and model_dump() drops the customized skill instead of preserving it. Store an independent snapshot (deep copy or stable serialized fingerprint) and add a regression test for in-place modification.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The SDK serialization behavior was exercised directly on main vs this PR, and the PR trims auto-loaded skills from default wire payloads while preserving runtime behavior and persisted snapshots.
Does this PR achieve its stated goal?
Yes. The goal was to drop auto-loaded skills from serialized AgentContext output without changing the in-memory resolved skills list or losing persistence safety. In direct SDK usage, main serialized 40 public skills into a 263,125-byte context, while this PR kept 40 runtime skills but serialized 0 auto-loaded skills by default in a 245-byte context; round-trip validation rebuilt the 40 skills, explicit/replacement skills stayed serialized, and ConversationState._save_base_state still persisted all 40 skills.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully with uv-managed dependencies. |
| CI Status | Build & Push (python-amd64) and unresolved-review-threads were failing; several checks were still pending. |
| Functional Verification | ✅ Direct SDK before/after probes verified payload shrink, runtime round-trip, explicit/replacement preservation, migration trimming, and full persistence snapshots. |
Functional Verification
Test 1: Default AgentContext serialization trims auto-loaded public skills while runtime behavior stays intact
Step 1 — Establish baseline without the fix:
Checked out origin/main (42c49c83) and ran:
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_probe.pyRelevant output:
{
"runtime_skill_count": 40,
"default_dump_skill_count": 40,
"default_context_bytes": 263125,
"roundtrip_runtime_skill_count": 40,
"persisted_skill_count": 40,
"preserve_full_dump_skill_count": 40
}This confirms the old behavior: the runtime correctly loaded 40 public skills, but the default serialized context also carried all 40 skills (~263 KB).
Step 2 — Apply the PR's changes:
Checked out PR commit 98f2c62375c692a565859579f9b2ec6d619e0554.
Step 3 — Re-run with the fix in place:
Ran the same command:
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_probe.pyObserved:
{
"runtime_skill_count": 40,
"default_dump_skill_count": 0,
"default_context_bytes": 245,
"roundtrip_runtime_skill_count": 40,
"explicit_dump_skill_names": ["qa-explicit-skill"],
"replacement_dump_skill_count": 1,
"replacement_dump_first_content": "QA replacement body should survive serialization",
"persisted_skill_count": 40,
"preserve_full_dump_skill_count": 40,
"preserve_full_context_bytes": 263125
}This shows the PR delivers the key payload optimization: auto-loaded public skills stay available at runtime and after model_validate(...), but default serialization drops them. It also confirms explicit caller-supplied skills and same-name replacement skills remain on the wire, and persistence paths still preserve the full snapshot.
Test 2: Old full payloads migrate to trimmed default output on the PR
Step 1 — Establish baseline without the fix:
On origin/main, saved a real full AgentContext(load_public_skills=True) payload:
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_save_main_payload.pyOutput:
{
"runtime_skill_count": 40,
"saved_payload_bytes": 263125,
"saved_payload_skill_count": 40
}This represents the stored/pre-PR payload shape with all resolved public skills inlined.
Step 2 — Apply the PR's changes:
Checked out PR commit 98f2c62375c692a565859579f9b2ec6d619e0554 and loaded that main-branch payload through AgentContext.model_validate(...).
Step 3 — Re-run with the fix in place:
Ran:
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_migration_probe.pyObserved:
{
"old_payload_skill_count": 40,
"old_payload_bytes": 263125,
"runtime_skill_count_after_validate": 40,
"default_dump_skill_count_after_validate": 0,
"default_dump_bytes_after_validate": 245,
"preserve_full_skill_count_after_validate": 40
}This confirms the migration behavior claimed by the PR: old bloated payloads still load into the full in-memory skill list, but the next default serialization trims the auto-loaded skills while the explicit full-snapshot mode still retains them.
Issues Found
None from functional QA. CI still had failing/pending checks at the time of review, but the changed SDK behavior itself worked in the manual probes above.
This review was created by an AI agent (OpenHands) on behalf of the user.
…ntegrity
Two more edge cases from the bot review:
1. ``_auto_loaded_skills`` stored the same ``Skill`` reference that
was appended to ``self.skills``. ``Skill`` is mutable, so an
in-place edit (``ctx.skills[0].content = "custom"``) mutated the
snapshot too; the equality check still succeeded and the
customised skill silently disappeared from ``model_dump``.
Fix: deep-copy the snapshot via ``skill.model_copy(deep=True)``.
Pinned by ``test_in_place_skill_mutation_preserves_customisation_on_wire``.
2. The serializer used the snapshot even when the auto-load config
changed after validation. ``ctx.model_copy(update={
"load_public_skills": False})`` keeps the runtime auto-loaded
skills (validators don't re-run on model_copy) but used to
serialize them as ``[]``. On ``model_validate`` the receiver
wouldn't auto-reload them (flag now False) — skills lost.
Same shape for ``marketplace_path`` changes (receiver re-loads
from a *different* catalog).
Fix: snapshot ``(load_user_skills, load_public_skills,
marketplace_path)`` alongside the skills. If the current config
drifted, the serializer no-ops the trim and emits the full list,
preserving the round-trip. Pinned by:
- ``test_model_copy_toggling_load_flag_off_keeps_skills_on_wire``
- ``test_model_copy_changing_marketplace_path_keeps_skills_on_wire``
302/302 tests pass on the affected suites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The SDK serialization change works as described: auto-loaded public skills stay available at runtime but are removed from the default serialized payload, while explicit/replaced/persisted skills remain preserved.
Does this PR achieve its stated goal?
Yes. On main, a real AgentContext(load_public_skills=True) serialized all 40 public skills into a 259,583-byte payload; on this PR, the same SDK usage serialized 0 auto-loaded skills into a 230-byte payload while ctx.skills and a dump→validate round trip still had all 40 skills. I also exercised explicit skills, replacement-via-model_copy, load-flag drift, and conversation base-state persistence, and those paths preserved skills as the PR promises.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv development environment |
| CI Status | Review Thread Gate/unresolved-review-threads |
| Functional Verification | ✅ Real SDK calls confirmed payload reduction and preservation behavior |
Functional Verification
Test 1: Default serialized payload drops auto-loaded public skills, but runtime and round-trip keep them
I used a real SDK script that constructs AgentContext(load_public_skills=True), calls model_dump() / model_dump_json(), validates the dumped payload back into AgentContext, and also measures model_dump(context={"preserve_full_skills": True}).
Step 1 — Establish baseline on origin/main:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_payload.py.
Relevant output:
{
"runtime_skill_count": 40,
"serialized_skill_count": 40,
"serialized_payload_bytes": 259583,
"roundtrip_skill_count": 40,
"preserve_full_skill_count": 40,
"preserve_full_payload_bytes": 259583
}This confirms the old behavior: the resolved public skill catalog was duplicated into the default serialized output.
Step 2 — Apply the PR's changes:
Checked out perf/agent-context-skill-payload at 713148b1475053eea1fc7df7eed03533dd6e6893.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_payload.py.
Relevant output:
{
"runtime_skill_count": 40,
"serialized_skill_count": 0,
"serialized_payload_bytes": 230,
"roundtrip_skill_count": 40,
"preserve_full_skill_count": 40,
"preserve_full_payload_bytes": 259583
}This confirms the PR's headline goal: default wire serialization shrank dramatically, but runtime callers and deserialization still see the full resolved skill list. The persistence-style preserve_full_skills context still emits the complete snapshot.
Test 2: Explicit/replaced/config-drift/persistence paths preserve skills
Step 1 — Baseline expectation:
The PR explicitly claims only unmodified auto-loaded skills should be trimmed; explicit skills, replaced skills under the same name, load-flag-drifted copies, and conversation persistence snapshots should remain serialized.
Step 2 — Apply the PR's changes:
Used the PR branch from Test 1.
Step 3 — Exercise those SDK paths:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_edges.py.
Relevant output:
{
"explicit_runtime_skill_count": 41,
"explicit_serialized_skill_names": ["qa-explicit"],
"explicit_roundtrip_has_explicit": true,
"replacement_target_name": "uv",
"replacement_serialized_skills": [{"name": "uv", "content": "QA replacement body"}],
"replacement_roundtrip_content": "QA replacement body",
"load_flag_flipped_serialized_count": 40,
"load_flag_flipped_roundtrip_count": 40,
"persisted_base_state_skill_count": 40
}This confirms the non-headline safety behavior: explicit caller skills survive default serialization, a consumer replacement under an auto-loaded name survives serialization and round-trip, config drift serializes the full list instead of losing it, and the actual conversation base-state save path persists the full 40-skill snapshot.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable — good target, but two edge cases still break the explicit/preserved skill guarantees.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration semantics and can alter skill prompts after a round-trip.
VERDICT:
❌ Needs follow-up before approval.
This review was created by an AI agent (OpenHands) on behalf of the user.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.
| self.load_public_skills, | ||
| self.marketplace_path, | ||
| ) | ||
|
|
There was a problem hiding this comment.
🟠 Important: preserve_full_skills still doesn't freeze a saved snapshot on load. If the persisted payload contains auto skill a with load_public_skills=True, and the loader now returns a plus new b, this branch appends b during AgentContext.model_validate(...). The next persisted/full dump includes b, so the supposedly preserved snapshot can pick up newly published skills. The restore path needs a way to know a full snapshot is authoritative and skip appending missing auto skills (or disable auto-load for preserved snapshots).
| if existing is None: | ||
| self.skills.append(skill) | ||
| # Deep-copy so an in-place caller mutation | ||
| # (``ctx.skills[0].content = "custom"``) doesn't also |
There was a problem hiding this comment.
🟠 Important: This migration branch also catches caller-supplied explicit skills that happen to be equal to the current auto-loaded skill. In that case AgentContext(load_public_skills=True, skills=[explicit]) records the explicit skill in _auto_loaded_skills, model_dump() emits skills: [], and a later round-trip after the marketplace/user skill changes reloads the new content instead of the caller's pinned explicit skill. That contradicts the PR guarantee that explicit skills are unaffected. Please distinguish legacy persisted snapshots from direct caller-supplied skills, or keep equal existing skills explicit by default.
…pshot loads
Two more edge cases from the third bot review:
1. ``preserve_full_skills`` only prevented the trim on *save*. The
load path still ran ``_load_auto_skills`` unconditionally and
*appended* any newly-published skills the loader returned. A
snapshot frozen as {a} would silently grow to {a, b} after `b`
landed in the marketplace — and the next save would persist the
drifted set.
Fix: ``_load_auto_skills`` now accepts ``ValidationInfo``. When
``info.context["loading_from_snapshot"]`` is true (set by
``ConversationState`` factory on the resume path), the validator
skips the append-new branch entirely — the persisted ``skills``
list is treated as authoritative.
2. The migration branch (mark existing skills as auto-loaded when
they equal the loader's output) also fired for fresh
``AgentContext(load_public_skills=True, skills=[explicit])``
construction. A caller-pinned explicit skill that happened to
equal the marketplace version would be snapshotted as auto-loaded,
dropped from the wire, and silently swapped to a *new*
marketplace version on the next round-trip.
Fix: migration only fires under ``loading_from_snapshot``. Fresh
construction keeps the "equal existing" branch as a no-op — the
caller-supplied skill stays explicit on the wire.
The ``ConversationState`` resume path at ``state.py:354`` now passes
``context={"loading_from_snapshot": True}`` alongside the cipher.
API-response paths and direct caller construction don't pass the
flag, so they keep fresh-construction semantics — which is what we
want for the shrink-the-wire win.
New tests (5 → 8):
- ``test_fresh_construction_does_not_conflate_explicit_with_auto``
— the comment-6 regression, fresh construction with a name
collision.
- ``test_loading_from_snapshot_does_not_pick_up_new_auto_loaded_skills``
— the comment-5 regression, snapshot {a} + loader returning
{a, b} stays {a} in memory and on the wire.
- ``test_fresh_construction_still_appends_new_auto_skills`` — sanity
that the default branch keeps growing for first-launch users.
- Existing migration tests updated to invoke the load path via
``model_validate(context={"loading_from_snapshot": True})``.
305/305 tests pass on the affected suites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the SDK behavior with real temporary user skills: the PR trims auto-loaded skills from normal serialization while preserving runtime skills, explicit skills, round-trip reloading, and full persisted snapshots.
Does this PR achieve its stated goal?
Yes. On base main, an AgentContext(load_user_skills=True) with 8 auto-loaded skills plus 1 explicit skill serialized all 9 skills (19,782 bytes); on this PR, the same real SDK usage kept 9 runtime skills but serialized only the explicit skill (534 bytes). I also verified the snapshot path: preserve_full_skills keeps the full skill list, and loading that snapshot after the skill source changes keeps the original content instead of picking up newly added/changed skills.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully with uv-managed dependencies. |
| CI Status | unresolved-review-threads was failing and several review/build-publish jobs were still pending/in progress at QA time. |
| Functional Verification | ✅ Real SDK imports/calls exercised serialization, re-validation, and snapshot semantics. |
Functional Verification
Test 1: Normal serialization trims only auto-loaded skills, not runtime or explicit skills
Step 1 — Establish baseline without the fix:
Checked out origin/main (64b36cf8) and ran a real SDK script that:
- created 8 markdown skills under a temporary
~/.openhands/skills/, - constructed
AgentContext(load_user_skills=True, skills=[qa-explicit]), - called
model_dump(),model_dump(context={"preserve_full_skills": True}), andAgentContext.model_validate(...).
Observed:
{
"commit": "64b36cf8",
"runtime_skill_count": 9,
"default_dump_skill_count": 9,
"default_dump_skill_names": ["qa-explicit", "qa-auto-2", "qa-auto-1", "qa-auto-5", "qa-auto-4", "qa-auto-0", "qa-auto-6", "qa-auto-3", "qa-auto-7"],
"default_dump_json_bytes": 19782,
"preserve_full_dump_skill_count": 9,
"roundtrip_runtime_skill_count": 9,
"roundtrip_has_explicit": true
}This confirms the pre-PR behavior carried the full auto-loaded skill payload in normal serialized output.
Step 2 — Apply the PR's changes:
Checked out perf/agent-context-skill-payload at c55f95db.
Step 3 — Re-run with the fix in place:
Ran the same SDK script and observed:
{
"commit": "c55f95db",
"runtime_skill_count": 9,
"default_dump_skill_count": 1,
"default_dump_skill_names": ["qa-explicit"],
"default_dump_json_bytes": 534,
"preserve_full_dump_skill_count": 9,
"roundtrip_runtime_skill_count": 9,
"roundtrip_has_explicit": true
}This shows the headline behavior works: runtime still sees all 9 skills, normal serialization drops the 8 auto-loaded skills, the explicit caller-supplied skill remains on the wire, the preserved/full path still emits all skills, and re-validation restores the full runtime shape.
Test 2: Full snapshots stay frozen when skill sources change
On the PR branch, I created one real user skill, captured both normal API-style output and preserve_full_skills output, then changed that skill's file contents and added a new skill before re-validating.
Observed:
{
"created_runtime_names": ["qa-stable"],
"snapshot_skill_names": ["qa-stable"],
"api_dump_skill_names": [],
"loaded_snapshot_names": ["qa-stable"],
"loaded_snapshot_contents": {
"qa-stable": "# stable
original stable content"
},
"loaded_trimmed_names": ["qa-stable", "qa-new"]
}This confirms the two intended modes: persisted/full snapshots keep the original skill content even after the source changes, while trimmed API-style payloads re-resolve the current auto-loaded skill set on load.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable — the payload win is real, but two serialization/persistence edge cases still need attention.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization and persisted resume semantics for skill catalogs.
VERDICT:
❌ Needs follow-up before approval.
This review was created by an AI agent (OpenHands) on behalf of the user.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26148387350
| nested ``include`` / ``exclude`` for the surviving skills — | ||
| manual ``s.model_dump(...)`` would have ignored them. | ||
| """ | ||
| if info.context and info.context.get("preserve_full_skills"): |
There was a problem hiding this comment.
🟠 Important: model_dump(round_trip=True) still goes through the lossy API-wire path here and emits skills: [] for auto-loaded skills. round_trip=True is Pydantic's standard signal for a dump that callers can persist/reload without semantic loss; with this serializer, a caller using it for an AgentContext snapshot will reload whatever the current external skill source returns instead of the in-memory catalog they dumped. Please treat info.round_trip like preserve_full_skills and add a regression test for AgentContext(load_public_skills=True).model_dump(round_trip=True) preserving the full skill list.
| # {a} would silently grow to {a, b} after `b` lands in the | ||
| # public marketplace, breaking the freeze-at-create guarantee | ||
| # ``_save_base_state`` relied on. See software-agent-sdk#3301. | ||
| context: dict[str, Any] = {"loading_from_snapshot": True} |
There was a problem hiding this comment.
🟠 Important: this loading_from_snapshot context is not enough to preserve the frozen snapshot, because the loaded state.agent is immediately replaced by the runtime agent below. That runtime agent is normally validated without this context; if the loader changed from {a} to {a,b} between save and resume, it already contains b, and autosave rewrites base_state.json with both skills. I reproduced this through ConversationState.create(..., persistence_dir=...): initial persisted skills ['a'], resumed/runtime skills ['a', 'b'], and the rewritten base state becomes ['a', 'b']. Please preserve/merge the loaded snapshot before autosaving the runtime agent (or adjust the stated freeze-at-create contract), and cover the real create→resume path rather than only calling _save_base_state directly.
…sume Two edge cases from the fourth bot review: 1. ``model_dump(round_trip=True)`` is Pydantic's canonical signal for "this dump must rehydrate without semantic loss". The trim ran under it too, so a caller using it for an ``AgentContext`` snapshot would silently reload whatever the loader returns now instead of the in-memory catalog they dumped. Fix: ``info.round_trip`` opts out of the trim, same as the explicit ``preserve_full_skills`` context flag. Pinned by ``test_round_trip_dump_preserves_full_skill_list``. 2. The persistence-resume guard (``loading_from_snapshot``) correctly populated ``state.agent.agent_context.skills`` from the snapshot — but the next line, ``state.agent = agent``, unconditionally overwrote the loaded agent with the runtime one. The runtime agent's ``agent_context`` carried a freshly- resolved (and potentially drifted) skill list from its own ``_load_auto_skills`` run. Autosave then rewrote ``base_state.json`` with the drifted set. The reviewer reproduced this end-to-end: initial persisted ``['a']`` → resumed runtime ``['a', 'b']`` → autosave persisted ``['a', 'b']``. Fix: before ``state.agent = agent``, ``model_copy`` the runtime agent's ``agent_context`` with the loaded snapshot's skills. The snapshot stays authoritative for skills; runtime-only fields (LLM client, tool wiring, etc.) come from the runtime agent as today. Pinned by ``test_conversationstate_resume_preserves_snapshot_skills`` — full create → save → resume cycle via ``ConversationState.create`` and ``LocalFileStore``, not just ``_save_base_state`` in isolation. 793/793 tests pass on ``tests/sdk/context/`` + ``tests/sdk/conversation/`` + ``tests/cross/test_agent_secrets_integration.py``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
all-hands-bot
left a comment
There was a problem hiding this comment.
⚠️ QA Report: PASS WITH ISSUES
Functional verification passes: the PR trims auto-loaded skill payloads from serialized API-style output while preserving runtime skills, explicit skills, round-trip snapshots, and persisted conversation snapshots; CI is not fully green.
Does this PR achieve its stated goal?
Yes. Exercising the SDK as a user would showed AgentContext(load_public_skills=True) still loads the full 40 public skills plus an explicit skill at runtime, but default serialization on the PR drops the payload from 263,464 bytes / 41 skills to 582 bytes / 1 explicit skill. ConversationState API-style serialization likewise dropped from 265,711 bytes / 41 skills to 2,829 bytes / 1 skill, while base_state.json persistence stayed full at 41 skills so resume snapshots are not lost.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; no tests, linters, formatters, or pre-commit hooks were run by QA. |
| CI Status | pre-commit and unresolved-review-threads failing, with several image/build jobs still pending; SDK/cross/workspace/tools/windows tests are green in CI. |
| Functional Verification | ✅ Before/after SDK scripts confirmed the payload reduction, explicit-skill preservation, full round_trip dumps, full persistence snapshots, and old-conversation migration behavior. |
Functional Verification
Test 1: AgentContext default serialization trims auto-loaded skills but preserves runtime behavior
Step 1 — Establish baseline without the fix:
Checked out origin/main and ran uv run python /tmp/qa_agent_context_payload.py, which constructs a real AgentContext(load_public_skills=True, load_user_skills=False, skills=[explicit]), calls model_dump(), model_dump(round_trip=True), and reloads the dumped payload:
{
"default_payload_bytes": 263464,
"default_serialized_has_explicit": true,
"default_serialized_skill_count": 41,
"restored_has_explicit": true,
"restored_runtime_skill_count": 41,
"round_trip_payload_bytes": 263464,
"round_trip_serialized_skill_count": 41,
"runtime_has_explicit": true,
"runtime_skill_count": 41
}This confirms the old behavior carried all 40 auto-loaded public skills plus the explicit skill in default serialized output.
Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.
Step 3 — Re-run with the fix in place:
Ran the same command:
{
"default_payload_bytes": 582,
"default_serialized_has_explicit": true,
"default_serialized_skill_count": 1,
"restored_has_explicit": true,
"restored_runtime_skill_count": 41,
"round_trip_payload_bytes": 263464,
"round_trip_serialized_skill_count": 41,
"runtime_has_explicit": true,
"runtime_skill_count": 41
}This shows the PR achieves the headline behavior: default wire serialization keeps only the explicit skill, runtime still has all 41 skills, reloading the trimmed payload re-resolves the auto skills, and round_trip=True keeps the full snapshot.
Test 2: ConversationState API-style serialization shrinks while persistence stays full
Step 1 — Establish baseline without the fix:
On origin/main, ran uv run python /tmp/qa_conversation_state_payload.py, which creates a real Agent, AgentContext, LocalWorkspace, and ConversationState with persistence enabled:
{
"base_state_existed_after_create": true,
"persisted_has_explicit": true,
"persisted_payload_bytes": 258728,
"persisted_skill_count": 41,
"runtime_skill_count": 41,
"state_default_payload_bytes": 265711,
"state_default_serialized_skill_count": 41
}This confirms old API-style state serialization included the full auto-loaded catalog.
Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.
Step 3 — Re-run with the fix in place:
Ran the same command:
{
"base_state_existed_after_create": true,
"persisted_has_explicit": true,
"persisted_payload_bytes": 258728,
"persisted_skill_count": 41,
"runtime_skill_count": 41,
"state_default_payload_bytes": 2829,
"state_default_serialized_skill_count": 1
}This confirms API-style ConversationState.model_dump() gets the payload win while persisted base_state.json still freezes the full skill snapshot.
Test 3: Existing full persisted conversations migrate to trimmed API output
Step 1 — Reproduce old persisted shape:
On origin/main, ran uv run python /tmp/qa_migration_create_base.py to create a persisted conversation with the old full serialized skill list:
{
"created_with_base_main": true,
"persisted_payload_bytes": 258728,
"persisted_skill_count": 41,
"runtime_skill_count": 41
}This establishes the migration baseline: old conversations have all resolved skills in base_state.json.
Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.
Step 3 — Resume the old conversation with the fix in place:
Ran uv run python /tmp/qa_migration_resume_pr.py against the persisted conversation created on main:
{
"api_style_has_explicit": true,
"api_style_payload_bytes": 582,
"api_style_serialized_skill_count": 1,
"persisted_has_explicit_after_resume": true,
"persisted_payload_bytes_after_resume": 258728,
"persisted_skill_count_after_resume": 41,
"resumed_runtime_skill_count": 41,
"runtime_agent_initial_skill_count": 41
}This confirms an old full persisted conversation resumes under the PR with the full runtime/persisted skill snapshot intact, while API-style output trims to explicit-only.
Issues Found
- 🟠 CI status: GitHub checks currently report
pre-commitandunresolved-review-threadsas failing, with some build jobs still pending. Functional QA did not investigate those CI failures because this run intentionally avoided tests, linters, formatters, and pre-commit.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
Final verdict: PASS WITH ISSUES.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable — the payload reduction is useful, but a few serialization edge cases can still lose or misrepresent the frozen skill snapshot.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration and persisted resume semantics for skills.
VERDICT:
❌ Needs follow-up before approval.
This review was created by an AI agent (OpenHands) on behalf of the user.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26149123203
| if loaded_agent_context is not None and agent.agent_context is not None: | ||
| agent = agent.model_copy( | ||
| update={ | ||
| "agent_context": agent.agent_context.model_copy( |
There was a problem hiding this comment.
🟠 Important: this graft only replaces the public skills field; the copied context keeps the runtime context's private _auto_loaded_skills snapshot. If the persisted snapshot is {a} and the current loader now returns {a,b}, resume correctly sets ctx.skills == {a}, but default model_dump() emits skills: [] because it compares against the runtime {a,b} snapshot. A receiver revalidating that payload then auto-loads {a,b}, so the wire payload no longer represents the resumed conversation. Copy/reset the auto-load bookkeeping from the loaded context, or otherwise force these loaded skills to serialize, when preserving the snapshot.
| # freezes the auto-loaded set against marketplace | ||
| # / user-skill churn between save and load. | ||
| continue | ||
| self.skills.append(skill) |
There was a problem hiding this comment.
🟠 Important: appending auto-loaded skills in the after-validator doesn't mark skills as set. As a result, AgentContext(load_public_skills=True).model_dump(round_trip=True, exclude_unset=True) (and the preserve_full_skills variant) omits skills entirely before the serializer can preserve the snapshot, so reloading can pick up a different current skill catalog. If round-trip/preserve dumps are meant to be lossless, mark skills as set when the validator mutates it or avoid relying on a field serializer for this path.
| if self._auto_load_config != current_config: | ||
| return handler(value) | ||
| auto = self._auto_loaded_skills | ||
| kept = [s for s in value if auto.get(s.name) != s] |
There was a problem hiding this comment.
🟠 Important: filtering to kept before delegating to handler changes Pydantic's index-based nested include/exclude semantics for skills. Once auto-loaded entries are removed, the caller's indices are applied to the compacted list, so an include/exclude targeting an auto-loaded index can incorrectly serialize or drop an explicit skill. Skip trimming when nested include/exclude targets skills, or apply the trim after serialization while preserving the original index mapping.
|
Closing in favour of #3316, which solves the same problem at the route boundary instead of inside the WhyBoth PRs deliver the same measured win (~260 KB → ~5 KB on the wire, ~42% faster cold-open for heavy conversations). But:
The work in this PR isn't lost — the test suite alone covers a lot of useful AgentContext edge cases that future model-side changes could lean on. The branch stays on the remote ( Tracking the simpler approach
Same end-to-end measured impact, much smaller surface area, fully backward-compatible. |
Closes #3301. Full motivation, profile scripts, and measured impact live there.
What this PR changes
AgentContext._load_auto_skillskeeps mutatingself.skillsexactly as today — runtime callers see the full resolved skill list. New: aPrivateAttrsnapshots which skills came from the auto-load step, and a@field_serializer("skills")drops them frommodel_dumpoutput using equality, not name match. Explicit caller-supplied skills are unaffected. Skills a consumer replaces in place viamodel_copy(update={'skills': merged})are also unaffected — see below.Safe because:
_load_auto_skillsis amodel_validator(mode="after"), so it re-runs on every deserialization and rebuilds the same auto-loaded subset from the persistedload_user_skills/load_public_skills/marketplace_pathconfiguration.test_round_trip_via_serialized_output_re_resolves_auto_skills.model_copysurvives serialization unchanged because the replacement's field values don't equal the snapshot. Verified against OpenHands'_create_agent_with_skillsflow.Migration path for existing conversations
Conversations stored under previous SDK versions have the resolved skill list inlined on
agent_context.skills. On reload through the new SDK, the validator detects skills that equal what the loader produces and marks them as auto-loaded — so the nextGETdrops them from the wire even without conversation re-creation. A persisted skill that diverged (user edited the file, marketplace updated) keeps its on-disk content and stays on the wire. Teststest_migration_stored_skills_matching_loader_are_marked_auto_loaded+test_migration_stored_skills_diverged_from_loader_stay_explicitpin both halves.Safety review across known consumers
Verified all callers that touch
AgentContext.skillsoutside this PR:openhands-sdk/workspace/remote/base.py:920—AgentContext(skills=loaded_skills, load_public_skills=False). Auto-load disabled, serializer is a no-op. Unaffected.openhands-sdk/workspace/remote/base.py:923—AgentContext(skills=[], load_public_skills=True). The exact case this PR optimises. Unaffected functionally, smaller on the wire.OpenHands/openhands/app_server/.../live_status_app_conversation_service.py:1394— constructsAgentContext(system_message_suffix=..., secrets=secrets)with NEITHERload_*_skillsflag. Auto-load short-circuits,_auto_loaded_skillsstays empty, serializer is a no-op.OpenHands/openhands/app_server/.../app_conversation_service_base.py:159-185(_create_agent_with_skills) — merges sandbox + repo skills withagent_context.skills, thenmodel_copy(update={'skills': merged}). If the merge replaces an auto-loaded skill with a different version under the same name, the equality-check serializer keeps the replacement. Covered bytest_replacing_auto_skill_via_model_copy_survives_serialization.createAgentFromSettings— seedsagent_context: {load_public_skills: true, load_user_skills: true}on every conversation create. Triggers the optimisation path.grepconfirms no other consumer in the SDK or inOpenHands/setsload_user_skills/load_public_skills.Why
Per-conversation
GETwas carrying the entire resolved skill catalog (~263 KB in stock setups). That bloat was paid by every consumer that does a conversation fetch — agent-canvas's cold-open profile attributes 1190 ms of perceived delay to the JSON.parse + structural-share + commit of that payload on the browser main thread. See #3301 for the empirical breakdown.Measured impact
End-to-end against the live agent-server on
127.0.0.1:18000, same browser, same conversations, swapping only the SDK binary (uvx --with-editableon a worktree of this branch):GET /api/conversationsfor 40-skill convMigration auto-fired on a conversation created under the old SDK — no re-creation needed.
Test plan
tests/sdk/context/test_agent_context_skills_serialization.py:model_dumpmodel_dumpmodel_validateround-trip rebuilds the auto-loaded subsettest_replacing_auto_skill_via_model_copy_survives_serializationpytest tests/sdk/context/ tests/cross/test_agent_secrets_integration.py— 296 passed (full regression for the changed module + adjacent integration).ruff format+ruff check+pyrighton the changed files — clean.258 KB → 4.6 KBpayload,2226 ms → 1283 msfirst-event-paint on the heavy conversation, exactly matching the prediction.🤖 Generated with Claude Code
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:d81d7c0-pythonRun
All tags pushed for this build
About Multi-Architecture Support
d81d7c0-python) is a multi-arch manifest supporting both amd64 and arm64d81d7c0-python-amd64) are also available if needed