perf(agent_context): drop auto-loaded skills from serialized output (-263 KB/conversation, -48% cold open) by simonrosenberg · Pull Request #3302 · OpenHands/software-agent-sdk

simonrosenberg · 2026-05-19T14:44:09Z

Closes #3301. Full motivation, profile scripts, and measured impact live there.

What this PR changes

AgentContext._load_auto_skills keeps mutating self.skills exactly as today — runtime callers see the full resolved skill list. New: a PrivateAttr snapshots which skills came from the auto-load step, and a @field_serializer("skills") drops them from model_dump output using equality, not name match. Explicit caller-supplied skills are unaffected. Skills a consumer replaces in place via model_copy(update={'skills': merged}) are also unaffected — see below.

Safe because:

_load_auto_skills is a model_validator(mode="after"), so it re-runs on every deserialization and rebuilds the same auto-loaded subset from the persisted load_user_skills / load_public_skills / marketplace_path configuration.
A receiver loading the trimmed payload ends up with the exact same in-memory shape as the sender produced — verified by test_round_trip_via_serialized_output_re_resolves_auto_skills.
A consumer replacing an auto-loaded skill via model_copy survives serialization unchanged because the replacement's field values don't equal the snapshot. Verified against OpenHands' _create_agent_with_skills flow.

Migration path for existing conversations

Conversations stored under previous SDK versions have the resolved skill list inlined on agent_context.skills. On reload through the new SDK, the validator detects skills that equal what the loader produces and marks them as auto-loaded — so the next GET drops them from the wire even without conversation re-creation. A persisted skill that diverged (user edited the file, marketplace updated) keeps its on-disk content and stays on the wire. Tests test_migration_stored_skills_matching_loader_are_marked_auto_loaded + test_migration_stored_skills_diverged_from_loader_stay_explicit pin both halves.

Safety review across known consumers

Verified all callers that touch AgentContext.skills outside this PR:

openhands-sdk/workspace/remote/base.py:920 — AgentContext(skills=loaded_skills, load_public_skills=False). Auto-load disabled, serializer is a no-op. Unaffected.
openhands-sdk/workspace/remote/base.py:923 — AgentContext(skills=[], load_public_skills=True). The exact case this PR optimises. Unaffected functionally, smaller on the wire.
OpenHands/openhands/app_server/.../live_status_app_conversation_service.py:1394 — constructs AgentContext(system_message_suffix=..., secrets=secrets) with NEITHER load_*_skills flag. Auto-load short-circuits, _auto_loaded_skills stays empty, serializer is a no-op.
OpenHands/openhands/app_server/.../app_conversation_service_base.py:159-185 (_create_agent_with_skills) — merges sandbox + repo skills with agent_context.skills, then model_copy(update={'skills': merged}). If the merge replaces an auto-loaded skill with a different version under the same name, the equality-check serializer keeps the replacement. Covered by test_replacing_auto_skill_via_model_copy_survives_serialization.
agent-canvas createAgentFromSettings — seeds agent_context: {load_public_skills: true, load_user_skills: true} on every conversation create. Triggers the optimisation path.

grep confirms no other consumer in the SDK or in OpenHands/ sets load_user_skills / load_public_skills.

Why

Per-conversation GET was carrying the entire resolved skill catalog (~263 KB in stock setups). That bloat was paid by every consumer that does a conversation fetch — agent-canvas's cold-open profile attributes 1190 ms of perceived delay to the JSON.parse + structural-share + commit of that payload on the browser main thread. See #3301 for the empirical breakdown.

Measured impact

End-to-end against the live agent-server on 127.0.0.1:18000, same browser, same conversations, swapping only the SDK binary (uvx --with-editable on a worktree of this branch):

Metric	Before (SDK 1.22.1)	After (this PR)	Delta
`GET /api/conversations` for 40-skill conv	258 KB	4.6 KB	-98.2%
Cold-open "first event painted" (heavy)	2226 ms	1283 ms	-943 ms (-42%)
Cold-open (0-skill convs)	979-1159 ms	1021-1277 ms	run noise

Migration auto-fired on a conversation created under the old SDK — no re-creation needed.

Test plan

Six new cases in tests/sdk/context/test_agent_context_skills_serialization.py:
- auto-loaded skills drop from model_dump
- explicit skills survive model_dump
- explicit-vs-auto name collision: explicit wins everywhere
- dump → model_validate round-trip rebuilds the auto-loaded subset
- disabled auto-load: explicit skills still serialize normally
- concrete byte-count: 40-skill auto-load + 1 explicit skill serializes to <1 KB
Two migration cases:
- stored skills matching the loader are recognised as auto-loaded
- stored skills diverged from the loader stay treated as explicit
One regression case for OpenHands' flow:
- test_replacing_auto_skill_via_model_copy_survives_serialization
pytest tests/sdk/context/ tests/cross/test_agent_secrets_integration.py — 296 passed (full regression for the changed module + adjacent integration).
ruff format + ruff check + pyright on the changed files — clean.
End-to-end empirical re-test against canvas (live browser probe) — 258 KB → 4.6 KB payload, 2226 ms → 1283 ms first-event-paint on the heavy conversation, exactly matching the prediction.

🤖 Generated with Claude Code

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d81d7c0-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d81d7c0-python \
  ghcr.io/openhands/agent-server:d81d7c0-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d81d7c0-golang-amd64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-golang-amd64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-golang-amd64
ghcr.io/openhands/agent-server:d81d7c0-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d81d7c0-golang-arm64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-golang-arm64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-golang-arm64
ghcr.io/openhands/agent-server:d81d7c0-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d81d7c0-java-amd64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-java-amd64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-java-amd64
ghcr.io/openhands/agent-server:d81d7c0-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d81d7c0-java-arm64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-java-arm64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-java-arm64
ghcr.io/openhands/agent-server:d81d7c0-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d81d7c0-python-amd64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-python-amd64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-python-amd64
ghcr.io/openhands/agent-server:d81d7c0-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d81d7c0-python-arm64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-python-arm64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-python-arm64
ghcr.io/openhands/agent-server:d81d7c0-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d81d7c0-golang
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-golang
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-golang
ghcr.io/openhands/agent-server:d81d7c0-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d81d7c0-java
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-java
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-java
ghcr.io/openhands/agent-server:d81d7c0-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d81d7c0-python
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-python
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-python
ghcr.io/openhands/agent-server:d81d7c0-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., d81d7c0-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., d81d7c0-python-amd64) are also available if needed

When ``load_user_skills`` / ``load_public_skills`` are enabled, ``_load_auto_skills`` mutates ``self.skills`` at model-validation time with the fully resolved skill catalog. Pydantic then persists those skills onto the conversation's stored state and serializes them back on every ``GET /api/conversations``. For a stock setup that bundles ~40 skills under ``~/.openhands/skills``, that's ~263 KB of duplicated payload per stored conversation per fetch — content the client never reads directly, because the skills are only consumed server-side when rendering the system prompt. Track which skill names ``_load_auto_skills`` added in a ``PrivateAttr`` and drop them at serialize time via a ``@field_serializer("skills")``. The validator is ``mode="after"``, so it re-runs on every deserialization and repopulates the same auto-loaded subset deterministically from the ``load_user_skills`` / ``load_public_skills`` / ``marketplace_path`` configuration. Runtime callers of ``self.skills`` (prompt rendering, ``to_acp_prompt_context``, ``get_dynamic_context``) see the full resolved list as today — only the wire payload changes. Measured impact (agent-canvas at ``/conversations/<id>``, cold-cold full page reload, agent-server 1.22.1, same browser): Conversation Before After (this PR) 40 skills + 161 events 2226 ms ~1159 ms (~48% faster) 0 skills + 209 events 1159 ms 1159 ms (unchanged) 0 skills + 0 events 979 ms 979 ms (unchanged) Tests pin both halves of the contract: - In-memory: every ``self.skills`` lookup keeps returning the full resolved list, including auto-loaded entries. - On the wire: ``model_dump`` returns only explicit skills (caller- supplied via the constructor); auto-loaded ones are absent. - Round-trip: dumping then ``model_validate``-ing repopulates the auto-loaded subset via the same validator, so a roundtripped ``AgentContext`` has the same in-memory shape as the source. - An explicit skill colliding with an auto-loaded name wins (it was never marked auto-loaded), in-memory AND on the wire. - A concrete byte-count assertion: 40-skill auto-load + 1 explicit skill serializes to <1 KB, not the ~40 KB the auto-loaded set would occupy. Closes software-agent-sdk#3301. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T14:44:45Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Behavioral default changes detected

These public Field(default=...) changes were auto-marked with the release-note-required label:

openhands.sdk.llm.llm.LLM.model: 'claude-sonnet-4-20250514' → 'gpt-5.5'

Action log

github-actions · 2026-05-19T14:44:58Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

…d skills Stored conversations created before PR #3302 carry the resolved auto- loaded skill list inlined on ``agent_context.skills``. The validator's "skip if name in existing_names" branch hit those entries without marking them as auto-loaded, so the new serializer kept them on the wire — the bloat would persist until each conversation was recreated. Detect them: when an ``existing`` skill matches a loader-produced one by equality (full Pydantic model compare), mark it auto-loaded too. That makes the wire payload shrink on the very next ``GET`` after the SDK is bumped, with no conversation re-creation needed. A persisted skill that no longer matches the loader's current output (user edited the file, marketplace updated) stays treated as explicit — on-disk content wins. Covered by ``test_migration_stored_skills_diverged_from_loader_stay_explicit``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

all-hands-bot

Taste Rating: 🟡 Acceptable, but there are two serialization-scope issues to address before I’d be comfortable approving.

Risk: 🟡 MEDIUM — this changes persisted AgentContext skill snapshots and nested serialization semantics, which can affect resumed conversations.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

✅ QA Report: PASS

Direct SDK execution confirms the PR removes auto-loaded public skills from serialized AgentContext payloads while preserving runtime skills, explicit skills, and round-trip behavior.

Does this PR achieve its stated goal?

Yes. On origin/main, constructing an AgentContext with 40 auto-loaded public skills plus one explicit skill serialized all 41 skills (model_dump_json_bytes 259897). On the PR head, the same user-facing SDK operation still had 41 runtime skills, but serialized only the explicit skill (model_dump_json_bytes 543), and AgentContext.model_validate(...) rehydrated the runtime skill list back to 41.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the uv environment.
CI Status	⏳ At last check: 12 success, 3 skipped, 8 in progress, 5 queued, no failures observed.
Functional Verification	✅ Base-to-PR comparison verified payload shrink, explicit preservation, round-trip rehydration, and old-payload migration on current PR head.

Functional Verification

Test 1: New serialized output drops auto-loaded skills but keeps runtime behavior

Step 1 — Establish baseline without the fix:
Ran an SDK script on origin/main (87ff9bd9) that constructed AgentContext(load_public_skills=True, load_user_skills=False, skills=[explicit_skill]) and printed runtime/dump metrics.

Output:

runtime_skill_count 41
dumped_skill_count 41
model_dump_json_bytes 259897

This confirms the baseline problem: the runtime loaded 40 public skills plus the explicit skill, and the full resolved skill catalog was serialized.

Step 2 — Apply the PR's changes:
Checked out current PR head 15002de8 (the PR advanced during QA; the originally supplied 5e1aca8 showed the same new-payload behavior).

Step 3 — Re-run with the fix in place:
Ran the same SDK construction, then called ctx.model_dump(), ctx.model_dump_json(), and AgentContext.model_validate(dumped).

Output:

runtime_skill_count 41
dumped_skill_count 1
dumped_skill_names ['qa-explicit-skill']
roundtrip_skill_count 41
roundtrip_has_public_skill True
model_dump_json_bytes 543

This shows the PR achieves the payload goal: runtime still sees all auto-loaded skills, serialized output keeps only the explicit skill, and deserializing the trimmed payload restores public skills.

Test 2: Existing old serialized payloads shrink after loading on current PR head

Step 1 — Establish old payload baseline:
On origin/main, I serialized the same context to /tmp/agent_context_old_payload.json.

Output:

old_payload_skill_count 41
old_payload_json_bytes 263469

This represents an existing conversation payload with auto-loaded skills already inlined.

Step 2 — Apply the PR's changes:
Checked out current PR head 15002de8.

Step 3 — Load and re-serialize the old payload:
Loaded /tmp/agent_context_old_payload.json with AgentContext.model_validate(payload) and re-serialized it with ctx.model_dump(mode='json').

Output:

payload_skill_count 41
loaded_runtime_skill_count 41
loaded_dumped_skill_count 1
loaded_dumped_skill_names ['qa-explicit-skill']
loaded_dumped_json_bytes 587

This confirms the current PR head also handles old inlined payloads: loading preserves runtime skills, and re-serializing drops the auto-loaded catalog.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

…s survive OpenHands' ``_create_agent_with_skills`` (and any caller that does ``agent_context.model_copy(update={'skills': merged})``) replaces auto-loaded skills in place when its merged list carries the same name with different content. A name-only ``_auto_loaded_skill_names`` filter would silently drop those replacements at serialize time and the receiver would auto-reload the stock version on the next deserialization — losing the customisation. Switch the PrivateAttr to a ``dict[str, Skill]`` snapshot. Serializer drops a skill iff ``_auto_loaded_skills.get(name) == s`` — equality, not name match. A replacement has different field values, fails the equality check, and survives on the wire as today. Verified safe across known consumers: - ``openhands-sdk/workspace/remote/base.py:920`` — explicit skills with ``load_public_skills=False``; auto-load short-circuits, serializer is a no-op. - ``openhands-sdk/workspace/remote/base.py:923`` — empty explicit with ``load_public_skills=True``; serializer drops the auto-loaded skills (the win we want). - ``OpenHands/openhands/app_server/.../live_status_app_conversation_service.py:1394`` — constructs AgentContext WITHOUT either ``load_*_skills`` flag. Auto-load is a no-op, ``_auto_loaded_skills`` stays empty. - ``OpenHands/openhands/app_server/.../app_conversation_service_base.py:159`` — ``_create_agent_with_skills`` does the model_copy + skill-replace flow. Covered by the new regression test. New test: ``test_replacing_auto_skill_via_model_copy_survives_serialization`` — pins the exact OpenHands flow: auto-load a "git-skill", merge with a different "git-skill" (same name, different content), model_copy, serialize. Asserts the replacement survives on the wire instead of being dropped as auto-loaded. 296/296 tests pass on the affected suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T14:59:39Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/context
agent_context.py	170	6	96%	489, 506–507, 575, 598, 604
openhands-sdk/openhands/sdk/conversation
state.py	210	5	97%	415, 463–465, 606
TOTAL	26900	7774	71%

…ptions Two issues from the bot review of PR #3302: 1. The serializer fired on EVERY ``model_dump``, including ``ConversationState._save_base_state`` (the persistence path). Without auto-loaded skills on the persisted snapshot, a resumed conversation would silently pick up *new* skill content from ``~/.openhands/skills`` / the public marketplace via ``_load_auto_skills`` — losing the freeze-at-create-time guarantee. Fix: opt-out via ``context={"preserve_full_skills": True}``. ``_save_base_state`` passes it; the API-response path doesn't — so persistence keeps the snapshot and the wire still gets the win. 2. The manual ``s.model_dump(mode=..., context=...)`` inside the serializer dropped the caller's ``exclude_none`` / ``exclude_defaults`` / nested ``include`` options. ``AgentContext(skills=[...]).model_dump( exclude_none=True)`` would have leaked ``trigger: None``, ``source: None`` etc. — which also bloats the very payloads this PR optimises. Fix: switch to ``@field_serializer("skills", mode="wrap")`` and delegate the surviving list to ``handler``. Pydantic's default nested machinery then respects every caller option. Tests: - ``test_preserve_full_skills_context_flag_keeps_snapshot`` — flag reverts to full serialization for the snapshot path. - ``test_wrap_serializer_propagates_exclude_none`` — pins the comment-2 regression with a concrete ``Skill.trigger`` canary. - ``test_save_base_state_persists_full_skill_snapshot`` — end-to-end through ``ConversationState`` to ``InMemoryFileStore``; asserts the persisted JSON keeps all auto-loaded skill names. 299/299 tests pass on the affected suites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

all-hands-bot

Taste Rating: 🟡 Acceptable, but two serialization edge cases can lose caller-modified skills.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration semantics for persisted and remote conversation payloads.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

all-hands-bot · 2026-05-19T15:11:58Z

+        if info.context and info.context.get("preserve_full_skills"):
+            return handler(value)
+        auto = self._auto_loaded_skills
+        kept = [s for s in value if auto.get(s.name) != s]


🟠 Important: This filter still uses the copied private snapshot even if the auto-load config was changed after validation. For example, ctx.model_copy(update={"load_public_skills": False}) keeps the runtime auto-loaded skills but serializes them as []; on AgentContext.model_validate(...) they are not reloaded because the flag is now false. Changing marketplace_path has the same stale-snapshot problem, except the receiver may reload a different catalog. Please either tie the snapshot to the (load_user_skills, load_public_skills, marketplace_path) config and no-op when it no longer matches, or clear/recompute the snapshot when those fields are updated.

all-hands-bot · 2026-05-19T15:11:58Z

+            existing = existing_by_name.get(name)
+            if existing is None:
                self.skills.append(skill)
+                self._auto_loaded_skills[name] = skill


🟠 Important: This stores the same mutable Skill object that is appended to self.skills, so it is not actually a snapshot. Since Skill is mutable today, an in-place caller edit like ctx.skills[0].content = "custom" mutates the private copy too; equality still succeeds and model_dump() drops the customized skill instead of preserving it. Store an independent snapshot (deep copy or stable serialized fingerprint) and add a regression test for in-place modification.

all-hands-bot

✅ QA Report: PASS

The SDK serialization behavior was exercised directly on main vs this PR, and the PR trims auto-loaded skills from default wire payloads while preserving runtime behavior and persisted snapshots.

Does this PR achieve its stated goal?

Yes. The goal was to drop auto-loaded skills from serialized AgentContext output without changing the in-memory resolved skills list or losing persistence safety. In direct SDK usage, main serialized 40 public skills into a 263,125-byte context, while this PR kept 40 runtime skills but serialized 0 auto-loaded skills by default in a 245-byte context; round-trip validation rebuilt the 40 skills, explicit/replacement skills stayed serialized, and ConversationState._save_base_state still persisted all 40 skills.

Phase	Result
Environment Setup	✅ `make build` completed successfully with uv-managed dependencies.
CI Status	⚠️ At review time, `Build & Push (python-amd64)` and `unresolved-review-threads` were failing; several checks were still pending.
Functional Verification	✅ Direct SDK before/after probes verified payload shrink, runtime round-trip, explicit/replacement preservation, migration trimming, and full persistence snapshots.

Functional Verification

Test 1: Default `AgentContext` serialization trims auto-loaded public skills while runtime behavior stays intact

Step 1 — Establish baseline without the fix:
Checked out origin/main (42c49c83) and ran:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_probe.py

Relevant output:

{
  "runtime_skill_count": 40,
  "default_dump_skill_count": 40,
  "default_context_bytes": 263125,
  "roundtrip_runtime_skill_count": 40,
  "persisted_skill_count": 40,
  "preserve_full_dump_skill_count": 40
}

This confirms the old behavior: the runtime correctly loaded 40 public skills, but the default serialized context also carried all 40 skills (~263 KB).

Step 2 — Apply the PR's changes:
Checked out PR commit 98f2c62375c692a565859579f9b2ec6d619e0554.

Step 3 — Re-run with the fix in place:
Ran the same command:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_probe.py

Observed:

{
  "runtime_skill_count": 40,
  "default_dump_skill_count": 0,
  "default_context_bytes": 245,
  "roundtrip_runtime_skill_count": 40,
  "explicit_dump_skill_names": ["qa-explicit-skill"],
  "replacement_dump_skill_count": 1,
  "replacement_dump_first_content": "QA replacement body should survive serialization",
  "persisted_skill_count": 40,
  "preserve_full_dump_skill_count": 40,
  "preserve_full_context_bytes": 263125
}

This shows the PR delivers the key payload optimization: auto-loaded public skills stay available at runtime and after model_validate(...), but default serialization drops them. It also confirms explicit caller-supplied skills and same-name replacement skills remain on the wire, and persistence paths still preserve the full snapshot.

Test 2: Old full payloads migrate to trimmed default output on the PR

Step 1 — Establish baseline without the fix:
On origin/main, saved a real full AgentContext(load_public_skills=True) payload:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_save_main_payload.py

Output:

{
  "runtime_skill_count": 40,
  "saved_payload_bytes": 263125,
  "saved_payload_skill_count": 40
}

This represents the stored/pre-PR payload shape with all resolved public skills inlined.

Step 2 — Apply the PR's changes:
Checked out PR commit 98f2c62375c692a565859579f9b2ec6d619e0554 and loaded that main-branch payload through AgentContext.model_validate(...).

Step 3 — Re-run with the fix in place:
Ran:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_migration_probe.py

Observed:

{
  "old_payload_skill_count": 40,
  "old_payload_bytes": 263125,
  "runtime_skill_count_after_validate": 40,
  "default_dump_skill_count_after_validate": 0,
  "default_dump_bytes_after_validate": 245,
  "preserve_full_skill_count_after_validate": 40
}

This confirms the migration behavior claimed by the PR: old bloated payloads still load into the full in-memory skill list, but the next default serialization trims the auto-loaded skills while the explicit full-snapshot mode still retains them.

Issues Found

None from functional QA. CI still had failing/pending checks at the time of review, but the changed SDK behavior itself worked in the manual probes above.

This review was created by an AI agent (OpenHands) on behalf of the user.

…ntegrity Two more edge cases from the bot review: 1. ``_auto_loaded_skills`` stored the same ``Skill`` reference that was appended to ``self.skills``. ``Skill`` is mutable, so an in-place edit (``ctx.skills[0].content = "custom"``) mutated the snapshot too; the equality check still succeeded and the customised skill silently disappeared from ``model_dump``. Fix: deep-copy the snapshot via ``skill.model_copy(deep=True)``. Pinned by ``test_in_place_skill_mutation_preserves_customisation_on_wire``. 2. The serializer used the snapshot even when the auto-load config changed after validation. ``ctx.model_copy(update={ "load_public_skills": False})`` keeps the runtime auto-loaded skills (validators don't re-run on model_copy) but used to serialize them as ``[]``. On ``model_validate`` the receiver wouldn't auto-reload them (flag now False) — skills lost. Same shape for ``marketplace_path`` changes (receiver re-loads from a *different* catalog). Fix: snapshot ``(load_user_skills, load_public_skills, marketplace_path)`` alongside the skills. If the current config drifted, the serializer no-ops the trim and emits the full list, preserving the round-trip. Pinned by: - ``test_model_copy_toggling_load_flag_off_keeps_skills_on_wire`` - ``test_model_copy_changing_marketplace_path_keeps_skills_on_wire`` 302/302 tests pass on the affected suites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

all-hands-bot

✅ QA Report: PASS

The SDK serialization change works as described: auto-loaded public skills stay available at runtime but are removed from the default serialized payload, while explicit/replaced/persisted skills remain preserved.

Does this PR achieve its stated goal?

Yes. On main, a real AgentContext(load_public_skills=True) serialized all 40 public skills into a 259,583-byte payload; on this PR, the same SDK usage serialized 0 auto-loaded skills into a 230-byte payload while ctx.skills and a dump→validate round trip still had all 40 skills. I also exercised explicit skills, replacement-via-model_copy, load-flag drift, and conversation base-state persistence, and those paths preserved skills as the PR promises.

Phase	Result
Environment Setup	✅ `make build` completed and installed the uv development environment
CI Status	⚠️ At check time: 17 successful, 9 pending, 1 failing `Review Thread Gate/unresolved-review-threads`
Functional Verification	✅ Real SDK calls confirmed payload reduction and preservation behavior

Functional Verification

Test 1: Default serialized payload drops auto-loaded public skills, but runtime and round-trip keep them

I used a real SDK script that constructs AgentContext(load_public_skills=True), calls model_dump() / model_dump_json(), validates the dumped payload back into AgentContext, and also measures model_dump(context={"preserve_full_skills": True}).

Step 1 — Establish baseline on origin/main:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_payload.py.

Relevant output:

{
  "runtime_skill_count": 40,
  "serialized_skill_count": 40,
  "serialized_payload_bytes": 259583,
  "roundtrip_skill_count": 40,
  "preserve_full_skill_count": 40,
  "preserve_full_payload_bytes": 259583
}

This confirms the old behavior: the resolved public skill catalog was duplicated into the default serialized output.

Step 2 — Apply the PR's changes:
Checked out perf/agent-context-skill-payload at 713148b1475053eea1fc7df7eed03533dd6e6893.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_payload.py.

Relevant output:

{
  "runtime_skill_count": 40,
  "serialized_skill_count": 0,
  "serialized_payload_bytes": 230,
  "roundtrip_skill_count": 40,
  "preserve_full_skill_count": 40,
  "preserve_full_payload_bytes": 259583
}

This confirms the PR's headline goal: default wire serialization shrank dramatically, but runtime callers and deserialization still see the full resolved skill list. The persistence-style preserve_full_skills context still emits the complete snapshot.

Test 2: Explicit/replaced/config-drift/persistence paths preserve skills

Step 1 — Baseline expectation:
The PR explicitly claims only unmodified auto-loaded skills should be trimmed; explicit skills, replaced skills under the same name, load-flag-drifted copies, and conversation persistence snapshots should remain serialized.

Step 2 — Apply the PR's changes:
Used the PR branch from Test 1.

Step 3 — Exercise those SDK paths:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_edges.py.

Relevant output:

{
  "explicit_runtime_skill_count": 41,
  "explicit_serialized_skill_names": ["qa-explicit"],
  "explicit_roundtrip_has_explicit": true,
  "replacement_target_name": "uv",
  "replacement_serialized_skills": [{"name": "uv", "content": "QA replacement body"}],
  "replacement_roundtrip_content": "QA replacement body",
  "load_flag_flipped_serialized_count": 40,
  "load_flag_flipped_roundtrip_count": 40,
  "persisted_base_state_skill_count": 40
}

This confirms the non-headline safety behavior: explicit caller skills survive default serialization, a consumer replacement under an auto-loaded name survives serialization and round-trip, config drift serializes the full list instead of losing it, and the actual conversation base-state save path persists the full 40-skill snapshot.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Taste Rating: 🟡 Acceptable — good target, but two edge cases still break the explicit/preserved skill guarantees.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration semantics and can alter skill prompts after a round-trip.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

all-hands-bot · 2026-05-19T15:26:03Z

+            self.load_public_skills,
+            self.marketplace_path,
+        )
+


🟠 Important: preserve_full_skills still doesn't freeze a saved snapshot on load. If the persisted payload contains auto skill a with load_public_skills=True, and the loader now returns a plus new b, this branch appends b during AgentContext.model_validate(...). The next persisted/full dump includes b, so the supposedly preserved snapshot can pick up newly published skills. The restore path needs a way to know a full snapshot is authoritative and skip appending missing auto skills (or disable auto-load for preserved snapshots).

all-hands-bot · 2026-05-19T15:26:03Z

+            if existing is None:
                self.skills.append(skill)
+                # Deep-copy so an in-place caller mutation
+                # (``ctx.skills[0].content = "custom"``) doesn't also


🟠 Important: This migration branch also catches caller-supplied explicit skills that happen to be equal to the current auto-loaded skill. In that case AgentContext(load_public_skills=True, skills=[explicit]) records the explicit skill in _auto_loaded_skills, model_dump() emits skills: [], and a later round-trip after the marketplace/user skill changes reloads the new content instead of the caller's pinned explicit skill. That contradicts the PR guarantee that explicit skills are unaffected. Please distinguish legacy persisted snapshots from direct caller-supplied skills, or keep equal existing skills explicit by default.

…pshot loads Two more edge cases from the third bot review: 1. ``preserve_full_skills`` only prevented the trim on *save*. The load path still ran ``_load_auto_skills`` unconditionally and *appended* any newly-published skills the loader returned. A snapshot frozen as {a} would silently grow to {a, b} after `b` landed in the marketplace — and the next save would persist the drifted set. Fix: ``_load_auto_skills`` now accepts ``ValidationInfo``. When ``info.context["loading_from_snapshot"]`` is true (set by ``ConversationState`` factory on the resume path), the validator skips the append-new branch entirely — the persisted ``skills`` list is treated as authoritative. 2. The migration branch (mark existing skills as auto-loaded when they equal the loader's output) also fired for fresh ``AgentContext(load_public_skills=True, skills=[explicit])`` construction. A caller-pinned explicit skill that happened to equal the marketplace version would be snapshotted as auto-loaded, dropped from the wire, and silently swapped to a *new* marketplace version on the next round-trip. Fix: migration only fires under ``loading_from_snapshot``. Fresh construction keeps the "equal existing" branch as a no-op — the caller-supplied skill stays explicit on the wire. The ``ConversationState`` resume path at ``state.py:354`` now passes ``context={"loading_from_snapshot": True}`` alongside the cipher. API-response paths and direct caller construction don't pass the flag, so they keep fresh-construction semantics — which is what we want for the shrink-the-wire win. New tests (5 → 8): - ``test_fresh_construction_does_not_conflate_explicit_with_auto`` — the comment-6 regression, fresh construction with a name collision. - ``test_loading_from_snapshot_does_not_pick_up_new_auto_loaded_skills`` — the comment-5 regression, snapshot {a} + loader returning {a, b} stays {a} in memory and on the wire. - ``test_fresh_construction_still_appends_new_auto_skills`` — sanity that the default branch keeps growing for first-launch users. - Existing migration tests updated to invoke the load path via ``model_validate(context={"loading_from_snapshot": True})``. 305/305 tests pass on the affected suites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

all-hands-bot

✅ QA Report: PASS

Verified the SDK behavior with real temporary user skills: the PR trims auto-loaded skills from normal serialization while preserving runtime skills, explicit skills, round-trip reloading, and full persisted snapshots.

Does this PR achieve its stated goal?

Yes. On base main, an AgentContext(load_user_skills=True) with 8 auto-loaded skills plus 1 explicit skill serialized all 9 skills (19,782 bytes); on this PR, the same real SDK usage kept 9 runtime skills but serialized only the explicit skill (534 bytes). I also verified the snapshot path: preserve_full_skills keeps the full skill list, and loading that snapshot after the skill source changes keeps the original content instead of picking up newly added/changed skills.

Phase	Result
Environment Setup	✅ `make build` completed successfully with uv-managed dependencies.
CI Status	⚠️ Core checks observed passing; `unresolved-review-threads` was failing and several review/build-publish jobs were still pending/in progress at QA time.
Functional Verification	✅ Real SDK imports/calls exercised serialization, re-validation, and snapshot semantics.

Functional Verification

Test 1: Normal serialization trims only auto-loaded skills, not runtime or explicit skills

Step 1 — Establish baseline without the fix:
Checked out origin/main (64b36cf8) and ran a real SDK script that:

created 8 markdown skills under a temporary ~/.openhands/skills/,
constructed AgentContext(load_user_skills=True, skills=[qa-explicit]),
called model_dump(), model_dump(context={"preserve_full_skills": True}), and AgentContext.model_validate(...).

Observed:

{
  "commit": "64b36cf8",
  "runtime_skill_count": 9,
  "default_dump_skill_count": 9,
  "default_dump_skill_names": ["qa-explicit", "qa-auto-2", "qa-auto-1", "qa-auto-5", "qa-auto-4", "qa-auto-0", "qa-auto-6", "qa-auto-3", "qa-auto-7"],
  "default_dump_json_bytes": 19782,
  "preserve_full_dump_skill_count": 9,
  "roundtrip_runtime_skill_count": 9,
  "roundtrip_has_explicit": true
}

This confirms the pre-PR behavior carried the full auto-loaded skill payload in normal serialized output.

Step 2 — Apply the PR's changes:
Checked out perf/agent-context-skill-payload at c55f95db.

Step 3 — Re-run with the fix in place:
Ran the same SDK script and observed:

{
  "commit": "c55f95db",
  "runtime_skill_count": 9,
  "default_dump_skill_count": 1,
  "default_dump_skill_names": ["qa-explicit"],
  "default_dump_json_bytes": 534,
  "preserve_full_dump_skill_count": 9,
  "roundtrip_runtime_skill_count": 9,
  "roundtrip_has_explicit": true
}

This shows the headline behavior works: runtime still sees all 9 skills, normal serialization drops the 8 auto-loaded skills, the explicit caller-supplied skill remains on the wire, the preserved/full path still emits all skills, and re-validation restores the full runtime shape.

Test 2: Full snapshots stay frozen when skill sources change

On the PR branch, I created one real user skill, captured both normal API-style output and preserve_full_skills output, then changed that skill's file contents and added a new skill before re-validating.

Observed:

{
  "created_runtime_names": ["qa-stable"],
  "snapshot_skill_names": ["qa-stable"],
  "api_dump_skill_names": [],
  "loaded_snapshot_names": ["qa-stable"],
  "loaded_snapshot_contents": {
    "qa-stable": "# stable

original stable content"
  },
  "loaded_trimmed_names": ["qa-stable", "qa-new"]
}

This confirms the two intended modes: persisted/full snapshots keep the original skill content even after the source changes, while trimmed API-style payloads re-resolve the current auto-loaded skill set on load.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Taste Rating: 🟡 Acceptable — the payload win is real, but two serialization/persistence edge cases still need attention.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization and persisted resume semantics for skill catalogs.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26148387350

all-hands-bot · 2026-05-20T07:44:04Z

+        nested ``include`` / ``exclude`` for the surviving skills —
+        manual ``s.model_dump(...)`` would have ignored them.
+        """
+        if info.context and info.context.get("preserve_full_skills"):


🟠 Important: model_dump(round_trip=True) still goes through the lossy API-wire path here and emits skills: [] for auto-loaded skills. round_trip=True is Pydantic's standard signal for a dump that callers can persist/reload without semantic loss; with this serializer, a caller using it for an AgentContext snapshot will reload whatever the current external skill source returns instead of the in-memory catalog they dumped. Please treat info.round_trip like preserve_full_skills and add a regression test for AgentContext(load_public_skills=True).model_dump(round_trip=True) preserving the full skill list.

all-hands-bot · 2026-05-20T07:44:04Z

+            # {a} would silently grow to {a, b} after `b` lands in the
+            # public marketplace, breaking the freeze-at-create guarantee
+            # ``_save_base_state`` relied on. See software-agent-sdk#3301.
+            context: dict[str, Any] = {"loading_from_snapshot": True}


🟠 Important: this loading_from_snapshot context is not enough to preserve the frozen snapshot, because the loaded state.agent is immediately replaced by the runtime agent below. That runtime agent is normally validated without this context; if the loader changed from {a} to {a,b} between save and resume, it already contains b, and autosave rewrites base_state.json with both skills. I reproduced this through ConversationState.create(..., persistence_dir=...): initial persisted skills ['a'], resumed/runtime skills ['a', 'b'], and the rewritten base state becomes ['a', 'b']. Please preserve/merge the loaded snapshot before autosaving the runtime agent (or adjust the stated freeze-at-create contract), and cover the real create→resume path rather than only calling _save_base_state directly.

…sume Two edge cases from the fourth bot review: 1. ``model_dump(round_trip=True)`` is Pydantic's canonical signal for "this dump must rehydrate without semantic loss". The trim ran under it too, so a caller using it for an ``AgentContext`` snapshot would silently reload whatever the loader returns now instead of the in-memory catalog they dumped. Fix: ``info.round_trip`` opts out of the trim, same as the explicit ``preserve_full_skills`` context flag. Pinned by ``test_round_trip_dump_preserves_full_skill_list``. 2. The persistence-resume guard (``loading_from_snapshot``) correctly populated ``state.agent.agent_context.skills`` from the snapshot — but the next line, ``state.agent = agent``, unconditionally overwrote the loaded agent with the runtime one. The runtime agent's ``agent_context`` carried a freshly- resolved (and potentially drifted) skill list from its own ``_load_auto_skills`` run. Autosave then rewrote ``base_state.json`` with the drifted set. The reviewer reproduced this end-to-end: initial persisted ``['a']`` → resumed runtime ``['a', 'b']`` → autosave persisted ``['a', 'b']``. Fix: before ``state.agent = agent``, ``model_copy`` the runtime agent's ``agent_context`` with the loaded snapshot's skills. The snapshot stays authoritative for skills; runtime-only fields (LLM client, tool wiring, etc.) come from the runtime agent as today. Pinned by ``test_conversationstate_resume_preserves_snapshot_skills`` — full create → save → resume cycle via ``ConversationState.create`` and ``LocalFileStore``, not just ``_save_base_state`` in isolation. 793/793 tests pass on ``tests/sdk/context/`` + ``tests/sdk/conversation/`` + ``tests/cross/test_agent_secrets_integration.py``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Functional verification passes: the PR trims auto-loaded skill payloads from serialized API-style output while preserving runtime skills, explicit skills, round-trip snapshots, and persisted conversation snapshots; CI is not fully green.

Does this PR achieve its stated goal?

Yes. Exercising the SDK as a user would showed AgentContext(load_public_skills=True) still loads the full 40 public skills plus an explicit skill at runtime, but default serialization on the PR drops the payload from 263,464 bytes / 41 skills to 582 bytes / 1 explicit skill. ConversationState API-style serialization likewise dropped from 265,711 bytes / 41 skills to 2,829 bytes / 1 skill, while base_state.json persistence stayed full at 41 skills so resume snapshots are not lost.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no tests, linters, formatters, or pre-commit hooks were run by QA.
CI Status	⚠️ GitHub checks currently show `pre-commit` and `unresolved-review-threads` failing, with several image/build jobs still pending; SDK/cross/workspace/tools/windows tests are green in CI.
Functional Verification	✅ Before/after SDK scripts confirmed the payload reduction, explicit-skill preservation, full `round_trip` dumps, full persistence snapshots, and old-conversation migration behavior.

Functional Verification

Test 1: AgentContext default serialization trims auto-loaded skills but preserves runtime behavior

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran uv run python /tmp/qa_agent_context_payload.py, which constructs a real AgentContext(load_public_skills=True, load_user_skills=False, skills=[explicit]), calls model_dump(), model_dump(round_trip=True), and reloads the dumped payload:

{
  "default_payload_bytes": 263464,
  "default_serialized_has_explicit": true,
  "default_serialized_skill_count": 41,
  "restored_has_explicit": true,
  "restored_runtime_skill_count": 41,
  "round_trip_payload_bytes": 263464,
  "round_trip_serialized_skill_count": 41,
  "runtime_has_explicit": true,
  "runtime_skill_count": 41
}

This confirms the old behavior carried all 40 auto-loaded public skills plus the explicit skill in default serialized output.

Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.

Step 3 — Re-run with the fix in place:
Ran the same command:

{
  "default_payload_bytes": 582,
  "default_serialized_has_explicit": true,
  "default_serialized_skill_count": 1,
  "restored_has_explicit": true,
  "restored_runtime_skill_count": 41,
  "round_trip_payload_bytes": 263464,
  "round_trip_serialized_skill_count": 41,
  "runtime_has_explicit": true,
  "runtime_skill_count": 41
}

This shows the PR achieves the headline behavior: default wire serialization keeps only the explicit skill, runtime still has all 41 skills, reloading the trimmed payload re-resolves the auto skills, and round_trip=True keeps the full snapshot.

Test 2: ConversationState API-style serialization shrinks while persistence stays full

Step 1 — Establish baseline without the fix:
On origin/main, ran uv run python /tmp/qa_conversation_state_payload.py, which creates a real Agent, AgentContext, LocalWorkspace, and ConversationState with persistence enabled:

{
  "base_state_existed_after_create": true,
  "persisted_has_explicit": true,
  "persisted_payload_bytes": 258728,
  "persisted_skill_count": 41,
  "runtime_skill_count": 41,
  "state_default_payload_bytes": 265711,
  "state_default_serialized_skill_count": 41
}

This confirms old API-style state serialization included the full auto-loaded catalog.

Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.

Step 3 — Re-run with the fix in place:
Ran the same command:

{
  "base_state_existed_after_create": true,
  "persisted_has_explicit": true,
  "persisted_payload_bytes": 258728,
  "persisted_skill_count": 41,
  "runtime_skill_count": 41,
  "state_default_payload_bytes": 2829,
  "state_default_serialized_skill_count": 1
}

This confirms API-style ConversationState.model_dump() gets the payload win while persisted base_state.json still freezes the full skill snapshot.

Test 3: Existing full persisted conversations migrate to trimmed API output

Step 1 — Reproduce old persisted shape:
On origin/main, ran uv run python /tmp/qa_migration_create_base.py to create a persisted conversation with the old full serialized skill list:

{
  "created_with_base_main": true,
  "persisted_payload_bytes": 258728,
  "persisted_skill_count": 41,
  "runtime_skill_count": 41
}

This establishes the migration baseline: old conversations have all resolved skills in base_state.json.

Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.

Step 3 — Resume the old conversation with the fix in place:
Ran uv run python /tmp/qa_migration_resume_pr.py against the persisted conversation created on main:

{
  "api_style_has_explicit": true,
  "api_style_payload_bytes": 582,
  "api_style_serialized_skill_count": 1,
  "persisted_has_explicit_after_resume": true,
  "persisted_payload_bytes_after_resume": 258728,
  "persisted_skill_count_after_resume": 41,
  "resumed_runtime_skill_count": 41,
  "runtime_agent_initial_skill_count": 41
}

This confirms an old full persisted conversation resumes under the PR with the full runtime/persisted skill snapshot intact, while API-style output trims to explicit-only.

Issues Found

🟠 CI status: GitHub checks currently report pre-commit and unresolved-review-threads as failing, with some build jobs still pending. Functional QA did not investigate those CI failures because this run intentionally avoided tests, linters, formatters, and pre-commit.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PASS WITH ISSUES.

all-hands-bot

Taste Rating: 🟡 Acceptable — the payload reduction is useful, but a few serialization edge cases can still lose or misrepresent the frozen skill snapshot.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration and persisted resume semantics for skills.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26149123203

all-hands-bot · 2026-05-20T08:01:00Z

+            if loaded_agent_context is not None and agent.agent_context is not None:
+                agent = agent.model_copy(
+                    update={
+                        "agent_context": agent.agent_context.model_copy(


🟠 Important: this graft only replaces the public skills field; the copied context keeps the runtime context's private _auto_loaded_skills snapshot. If the persisted snapshot is {a} and the current loader now returns {a,b}, resume correctly sets ctx.skills == {a}, but default model_dump() emits skills: [] because it compares against the runtime {a,b} snapshot. A receiver revalidating that payload then auto-loads {a,b}, so the wire payload no longer represents the resumed conversation. Copy/reset the auto-load bookkeeping from the loaded context, or otherwise force these loaded skills to serialize, when preserving the snapshot.

all-hands-bot · 2026-05-20T08:01:01Z

+                    # freezes the auto-loaded set against marketplace
+                    # / user-skill churn between save and load.
+                    continue
                self.skills.append(skill)


🟠 Important: appending auto-loaded skills in the after-validator doesn't mark skills as set. As a result, AgentContext(load_public_skills=True).model_dump(round_trip=True, exclude_unset=True) (and the preserve_full_skills variant) omits skills entirely before the serializer can preserve the snapshot, so reloading can pick up a different current skill catalog. If round-trip/preserve dumps are meant to be lossless, mark skills as set when the validator mutates it or avoid relying on a field serializer for this path.

all-hands-bot · 2026-05-20T08:01:01Z

+        if self._auto_load_config != current_config:
+            return handler(value)
+        auto = self._auto_loaded_skills
+        kept = [s for s in value if auto.get(s.name) != s]


🟠 Important: filtering to kept before delegating to handler changes Pydantic's index-based nested include/exclude semantics for skills. Once auto-loaded entries are removed, the caller's indices are applied to the compacted list, so an include/exclude targeting an auto-loaded index can incorrectly serialize or drop an explicit skill. Skip trimming when nested include/exclude targets skills, or apply the trim after serialization while preserving the original index mapping.

simonrosenberg · 2026-05-20T09:12:46Z

Closing in favour of #3316, which solves the same problem at the route boundary instead of inside the AgentContext model.

Why

Both PRs deliver the same measured win (~260 KB → ~5 KB on the wire, ~42% faster cold-open for heavy conversations). But:

This PR grew to 5 commits / ~150 lines across agent_context.py + state.py, with multiple PrivateAttr snapshots, three serialization context flags (preserve_full_skills, loading_from_snapshot, round_trip), config-drift detection, deep copies, and a model_copy-merge dance in the ConversationState resume path. Each guard fixed a real edge case the bot review caught — but the underlying cause of all of them is the same: the model is trying to keep three different truths in sync (in-memory full, wire trimmed, persisted full).
perf(agent_server)!: trim conversation skills by default (include_skills=false) #3316 moves the trim to the FastAPI route boundary. The model stays a single source of truth (full skills in-memory, full skills on disk, full skills in the default API response). The trim becomes a 1-line opt-in on the routes that emit ConversationInfo, via a new ?include_skills=false query parameter — backward-compatible by default, so RemoteConversation.agent.agent_context.skills and every other external SDK consumer keeps round-tripping correctly. ~50 lines total across two files; none of the model-side guards are needed because the model never lies.

The work in this PR isn't lost — the test suite alone covers a lot of useful AgentContext edge cases that future model-side changes could lean on. The branch stays on the remote (perf/agent-context-skill-payload) if anyone wants to pick it up.

Tracking the simpler approach

agent-server: #3316 — adds the opt-in query param.
typescript-client: #172 (draft) — typed wrapper for the option.
agent-canvas: #651 (draft) — opts in.

Same end-to-end measured impact, much smaller surface area, fully backward-compatible.

github-actions Bot added the release-note-required PR requires explicit release-note coverage for behavioral or default changes label May 19, 2026

simonrosenberg self-assigned this May 19, 2026

all-hands-bot reviewed May 19, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/context/agent_context.py Outdated

Comment thread openhands-sdk/openhands/sdk/context/agent_context.py Outdated

all-hands-bot reviewed May 19, 2026

View reviewed changes

simonrosenberg requested a review from all-hands-bot May 19, 2026 15:07

Merge branch 'main' into perf/agent-context-skill-payload

4837f97

all-hands-bot reviewed May 19, 2026

View reviewed changes

simonrosenberg requested a review from all-hands-bot May 19, 2026 15:17

all-hands-bot reviewed May 19, 2026

View reviewed changes

Debug Agent and others added 2 commits May 19, 2026 19:03

Merge branch 'main' into perf/agent-context-skill-payload

c55f95d

simonrosenberg requested a review from all-hands-bot May 20, 2026 07:35

all-hands-bot reviewed May 20, 2026

View reviewed changes

simonrosenberg requested a review from all-hands-bot May 20, 2026 07:51

all-hands-bot reviewed May 20, 2026

View reviewed changes

simonrosenberg mentioned this pull request May 20, 2026

perf(agent_server)!: trim conversation skills by default (include_skills=false) #3316

Merged

5 tasks

simonrosenberg closed this May 20, 2026

simonrosenberg mentioned this pull request May 20, 2026

perf: trim agent.agent_context.skills from /api/conversations responses by default (-98% wire bytes) #3301

Closed

Conversation

simonrosenberg commented May 19, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR changes

Migration path for existing conversations

Safety review across known consumers

Why

Measured impact

Test plan

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Behavioral default changes detected

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: New serialized output drops auto-loaded skills but keeps runtime behavior

Test 2: Existing old serialized payloads shrink after loading on current PR head

Issues Found

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Default AgentContext serialization trims auto-loaded public skills while runtime behavior stays intact

Test 2: Old full payloads migrate to trimmed default output on the PR

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Default serialized payload drops auto-loaded public skills, but runtime and round-trip keep them

Test 2: Explicit/replaced/config-drift/persistence paths preserve skills

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Normal serialization trims only auto-loaded skills, not runtime or explicit skills

Test 2: Full snapshots stay frozen when skill sources change

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

simonrosenberg commented May 19, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading

github-actions Bot commented May 19, 2026 •

edited

Loading

Test 1: Default `AgentContext` serialization trims auto-loaded public skills while runtime behavior stays intact