Skip to content

perf(agent_context): drop auto-loaded skills from serialized output (-263 KB/conversation, -48% cold open)#3302

Closed
simonrosenberg wants to merge 9 commits into
mainfrom
perf/agent-context-skill-payload
Closed

perf(agent_context): drop auto-loaded skills from serialized output (-263 KB/conversation, -48% cold open)#3302
simonrosenberg wants to merge 9 commits into
mainfrom
perf/agent-context-skill-payload

Conversation

@simonrosenberg
Copy link
Copy Markdown
Collaborator

@simonrosenberg simonrosenberg commented May 19, 2026

Closes #3301. Full motivation, profile scripts, and measured impact live there.

What this PR changes

AgentContext._load_auto_skills keeps mutating self.skills exactly as today — runtime callers see the full resolved skill list. New: a PrivateAttr snapshots which skills came from the auto-load step, and a @field_serializer("skills") drops them from model_dump output using equality, not name match. Explicit caller-supplied skills are unaffected. Skills a consumer replaces in place via model_copy(update={'skills': merged}) are also unaffected — see below.

Safe because:

  • _load_auto_skills is a model_validator(mode="after"), so it re-runs on every deserialization and rebuilds the same auto-loaded subset from the persisted load_user_skills / load_public_skills / marketplace_path configuration.
  • A receiver loading the trimmed payload ends up with the exact same in-memory shape as the sender produced — verified by test_round_trip_via_serialized_output_re_resolves_auto_skills.
  • A consumer replacing an auto-loaded skill via model_copy survives serialization unchanged because the replacement's field values don't equal the snapshot. Verified against OpenHands' _create_agent_with_skills flow.

Migration path for existing conversations

Conversations stored under previous SDK versions have the resolved skill list inlined on agent_context.skills. On reload through the new SDK, the validator detects skills that equal what the loader produces and marks them as auto-loaded — so the next GET drops them from the wire even without conversation re-creation. A persisted skill that diverged (user edited the file, marketplace updated) keeps its on-disk content and stays on the wire. Tests test_migration_stored_skills_matching_loader_are_marked_auto_loaded + test_migration_stored_skills_diverged_from_loader_stay_explicit pin both halves.

Safety review across known consumers

Verified all callers that touch AgentContext.skills outside this PR:

  • openhands-sdk/workspace/remote/base.py:920AgentContext(skills=loaded_skills, load_public_skills=False). Auto-load disabled, serializer is a no-op. Unaffected.
  • openhands-sdk/workspace/remote/base.py:923AgentContext(skills=[], load_public_skills=True). The exact case this PR optimises. Unaffected functionally, smaller on the wire.
  • OpenHands/openhands/app_server/.../live_status_app_conversation_service.py:1394 — constructs AgentContext(system_message_suffix=..., secrets=secrets) with NEITHER load_*_skills flag. Auto-load short-circuits, _auto_loaded_skills stays empty, serializer is a no-op.
  • OpenHands/openhands/app_server/.../app_conversation_service_base.py:159-185 (_create_agent_with_skills) — merges sandbox + repo skills with agent_context.skills, then model_copy(update={'skills': merged}). If the merge replaces an auto-loaded skill with a different version under the same name, the equality-check serializer keeps the replacement. Covered by test_replacing_auto_skill_via_model_copy_survives_serialization.
  • agent-canvas createAgentFromSettings — seeds agent_context: {load_public_skills: true, load_user_skills: true} on every conversation create. Triggers the optimisation path.

grep confirms no other consumer in the SDK or in OpenHands/ sets load_user_skills / load_public_skills.

Why

Per-conversation GET was carrying the entire resolved skill catalog (~263 KB in stock setups). That bloat was paid by every consumer that does a conversation fetch — agent-canvas's cold-open profile attributes 1190 ms of perceived delay to the JSON.parse + structural-share + commit of that payload on the browser main thread. See #3301 for the empirical breakdown.

Measured impact

End-to-end against the live agent-server on 127.0.0.1:18000, same browser, same conversations, swapping only the SDK binary (uvx --with-editable on a worktree of this branch):

Metric Before (SDK 1.22.1) After (this PR) Delta
GET /api/conversations for 40-skill conv 258 KB 4.6 KB -98.2%
Cold-open "first event painted" (heavy) 2226 ms 1283 ms -943 ms (-42%)
Cold-open (0-skill convs) 979-1159 ms 1021-1277 ms run noise

Migration auto-fired on a conversation created under the old SDK — no re-creation needed.

Test plan

  • Six new cases in tests/sdk/context/test_agent_context_skills_serialization.py:
    • auto-loaded skills drop from model_dump
    • explicit skills survive model_dump
    • explicit-vs-auto name collision: explicit wins everywhere
    • dump → model_validate round-trip rebuilds the auto-loaded subset
    • disabled auto-load: explicit skills still serialize normally
    • concrete byte-count: 40-skill auto-load + 1 explicit skill serializes to <1 KB
  • Two migration cases:
    • stored skills matching the loader are recognised as auto-loaded
    • stored skills diverged from the loader stay treated as explicit
  • One regression case for OpenHands' flow:
    • test_replacing_auto_skill_via_model_copy_survives_serialization
  • pytest tests/sdk/context/ tests/cross/test_agent_secrets_integration.py296 passed (full regression for the changed module + adjacent integration).
  • ruff format + ruff check + pyright on the changed files — clean.
  • End-to-end empirical re-test against canvas (live browser probe) — 258 KB → 4.6 KB payload, 2226 ms → 1283 ms first-event-paint on the heavy conversation, exactly matching the prediction.

🤖 Generated with Claude Code


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d81d7c0-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d81d7c0-python \
  ghcr.io/openhands/agent-server:d81d7c0-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d81d7c0-golang-amd64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-golang-amd64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-golang-amd64
ghcr.io/openhands/agent-server:d81d7c0-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d81d7c0-golang-arm64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-golang-arm64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-golang-arm64
ghcr.io/openhands/agent-server:d81d7c0-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d81d7c0-java-amd64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-java-amd64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-java-amd64
ghcr.io/openhands/agent-server:d81d7c0-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d81d7c0-java-arm64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-java-arm64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-java-arm64
ghcr.io/openhands/agent-server:d81d7c0-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d81d7c0-python-amd64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-python-amd64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-python-amd64
ghcr.io/openhands/agent-server:d81d7c0-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d81d7c0-python-arm64
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-python-arm64
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-python-arm64
ghcr.io/openhands/agent-server:d81d7c0-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d81d7c0-golang
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-golang
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-golang
ghcr.io/openhands/agent-server:d81d7c0-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d81d7c0-java
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-java
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-java
ghcr.io/openhands/agent-server:d81d7c0-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d81d7c0-python
ghcr.io/openhands/agent-server:d81d7c05e257908256176fc6884a313945ac73a7-python
ghcr.io/openhands/agent-server:perf-agent-context-skill-payload-python
ghcr.io/openhands/agent-server:d81d7c0-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., d81d7c0-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d81d7c0-python-amd64) are also available if needed

When ``load_user_skills`` / ``load_public_skills`` are enabled,
``_load_auto_skills`` mutates ``self.skills`` at model-validation
time with the fully resolved skill catalog. Pydantic then persists
those skills onto the conversation's stored state and serializes
them back on every ``GET /api/conversations``. For a stock setup
that bundles ~40 skills under ``~/.openhands/skills``, that's
~263 KB of duplicated payload per stored conversation per fetch —
content the client never reads directly, because the skills are
only consumed server-side when rendering the system prompt.

Track which skill names ``_load_auto_skills`` added in a
``PrivateAttr`` and drop them at serialize time via a
``@field_serializer("skills")``. The validator is ``mode="after"``,
so it re-runs on every deserialization and repopulates the same
auto-loaded subset deterministically from the
``load_user_skills`` / ``load_public_skills`` / ``marketplace_path``
configuration. Runtime callers of ``self.skills`` (prompt rendering,
``to_acp_prompt_context``, ``get_dynamic_context``) see the full
resolved list as today — only the wire payload changes.

Measured impact (agent-canvas at ``/conversations/<id>``, cold-cold
full page reload, agent-server 1.22.1, same browser):

  Conversation              Before    After (this PR)
  40 skills + 161 events    2226 ms   ~1159 ms   (~48% faster)
  0 skills  + 209 events    1159 ms   1159 ms   (unchanged)
  0 skills  + 0 events       979 ms    979 ms   (unchanged)

Tests pin both halves of the contract:
- In-memory: every ``self.skills`` lookup keeps returning the full
  resolved list, including auto-loaded entries.
- On the wire: ``model_dump`` returns only explicit skills (caller-
  supplied via the constructor); auto-loaded ones are absent.
- Round-trip: dumping then ``model_validate``-ing repopulates the
  auto-loaded subset via the same validator, so a roundtripped
  ``AgentContext`` has the same in-memory shape as the source.
- An explicit skill colliding with an auto-loaded name wins (it was
  never marked auto-loaded), in-memory AND on the wire.
- A concrete byte-count assertion: 40-skill auto-load + 1 explicit
  skill serializes to <1 KB, not the ~40 KB the auto-loaded set
  would occupy.

Closes software-agent-sdk#3301.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the release-note-required PR requires explicit release-note coverage for behavioral or default changes label May 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Behavioral default changes detected

These public Field(default=...) changes were auto-marked with the release-note-required label:

  • openhands.sdk.llm.llm.LLM.model: 'claude-sonnet-4-20250514''gpt-5.5'

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

…d skills

Stored conversations created before PR #3302 carry the resolved auto-
loaded skill list inlined on ``agent_context.skills``. The validator's
"skip if name in existing_names" branch hit those entries without
marking them as auto-loaded, so the new serializer kept them on the
wire — the bloat would persist until each conversation was recreated.

Detect them: when an ``existing`` skill matches a loader-produced one
by equality (full Pydantic model compare), mark it auto-loaded too.
That makes the wire payload shrink on the very next ``GET`` after the
SDK is bumped, with no conversation re-creation needed.

A persisted skill that no longer matches the loader's current output
(user edited the file, marketplace updated) stays treated as explicit
— on-disk content wins. Covered by
``test_migration_stored_skills_diverged_from_loader_stay_explicit``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonrosenberg simonrosenberg self-assigned this May 19, 2026
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable, but there are two serialization-scope issues to address before I’d be comfortable approving.

Risk: 🟡 MEDIUM — this changes persisted AgentContext skill snapshots and nested serialization semantics, which can affect resumed conversations.

This review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread openhands-sdk/openhands/sdk/context/agent_context.py Outdated
Comment thread openhands-sdk/openhands/sdk/context/agent_context.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Direct SDK execution confirms the PR removes auto-loaded public skills from serialized AgentContext payloads while preserving runtime skills, explicit skills, and round-trip behavior.

Does this PR achieve its stated goal?

Yes. On origin/main, constructing an AgentContext with 40 auto-loaded public skills plus one explicit skill serialized all 41 skills (model_dump_json_bytes 259897). On the PR head, the same user-facing SDK operation still had 41 runtime skills, but serialized only the explicit skill (model_dump_json_bytes 543), and AgentContext.model_validate(...) rehydrated the runtime skill list back to 41.

Phase Result
Environment Setup make build completed successfully and installed the uv environment.
CI Status ⏳ At last check: 12 success, 3 skipped, 8 in progress, 5 queued, no failures observed.
Functional Verification ✅ Base-to-PR comparison verified payload shrink, explicit preservation, round-trip rehydration, and old-payload migration on current PR head.
Functional Verification

Test 1: New serialized output drops auto-loaded skills but keeps runtime behavior

Step 1 — Establish baseline without the fix:
Ran an SDK script on origin/main (87ff9bd9) that constructed AgentContext(load_public_skills=True, load_user_skills=False, skills=[explicit_skill]) and printed runtime/dump metrics.

Output:

runtime_skill_count 41
dumped_skill_count 41
model_dump_json_bytes 259897

This confirms the baseline problem: the runtime loaded 40 public skills plus the explicit skill, and the full resolved skill catalog was serialized.

Step 2 — Apply the PR's changes:
Checked out current PR head 15002de8 (the PR advanced during QA; the originally supplied 5e1aca8 showed the same new-payload behavior).

Step 3 — Re-run with the fix in place:
Ran the same SDK construction, then called ctx.model_dump(), ctx.model_dump_json(), and AgentContext.model_validate(dumped).

Output:

runtime_skill_count 41
dumped_skill_count 1
dumped_skill_names ['qa-explicit-skill']
roundtrip_skill_count 41
roundtrip_has_public_skill True
model_dump_json_bytes 543

This shows the PR achieves the payload goal: runtime still sees all auto-loaded skills, serialized output keeps only the explicit skill, and deserializing the trimmed payload restores public skills.

Test 2: Existing old serialized payloads shrink after loading on current PR head

Step 1 — Establish old payload baseline:
On origin/main, I serialized the same context to /tmp/agent_context_old_payload.json.

Output:

old_payload_skill_count 41
old_payload_json_bytes 263469

This represents an existing conversation payload with auto-loaded skills already inlined.

Step 2 — Apply the PR's changes:
Checked out current PR head 15002de8.

Step 3 — Load and re-serialize the old payload:
Loaded /tmp/agent_context_old_payload.json with AgentContext.model_validate(payload) and re-serialized it with ctx.model_dump(mode='json').

Output:

payload_skill_count 41
loaded_runtime_skill_count 41
loaded_dumped_skill_count 1
loaded_dumped_skill_names ['qa-explicit-skill']
loaded_dumped_json_bytes 587

This confirms the current PR head also handles old inlined payloads: loading preserves runtime skills, and re-serializing drops the auto-loaded catalog.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

…s survive

OpenHands' ``_create_agent_with_skills`` (and any caller that does
``agent_context.model_copy(update={'skills': merged})``) replaces
auto-loaded skills in place when its merged list carries the same
name with different content. A name-only ``_auto_loaded_skill_names``
filter would silently drop those replacements at serialize time and
the receiver would auto-reload the stock version on the next
deserialization — losing the customisation.

Switch the PrivateAttr to a ``dict[str, Skill]`` snapshot. Serializer
drops a skill iff ``_auto_loaded_skills.get(name) == s`` — equality,
not name match. A replacement has different field values, fails the
equality check, and survives on the wire as today.

Verified safe across known consumers:
- ``openhands-sdk/workspace/remote/base.py:920`` — explicit skills with
  ``load_public_skills=False``; auto-load short-circuits, serializer
  is a no-op.
- ``openhands-sdk/workspace/remote/base.py:923`` — empty explicit with
  ``load_public_skills=True``; serializer drops the auto-loaded
  skills (the win we want).
- ``OpenHands/openhands/app_server/.../live_status_app_conversation_service.py:1394``
  — constructs AgentContext WITHOUT either ``load_*_skills`` flag.
  Auto-load is a no-op, ``_auto_loaded_skills`` stays empty.
- ``OpenHands/openhands/app_server/.../app_conversation_service_base.py:159``
  — ``_create_agent_with_skills`` does the model_copy + skill-replace
  flow. Covered by the new regression test.

New test:
``test_replacing_auto_skill_via_model_copy_survives_serialization``
— pins the exact OpenHands flow: auto-load a "git-skill", merge with
a different "git-skill" (same name, different content), model_copy,
serialize. Asserts the replacement survives on the wire instead of
being dropped as auto-loaded.

296/296 tests pass on the affected suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/context
   agent_context.py170696%489, 506–507, 575, 598, 604
openhands-sdk/openhands/sdk/conversation
   state.py210597%415, 463–465, 606
TOTAL26900777471% 

…ptions

Two issues from the bot review of PR #3302:

1. The serializer fired on EVERY ``model_dump``, including
   ``ConversationState._save_base_state`` (the persistence path).
   Without auto-loaded skills on the persisted snapshot, a resumed
   conversation would silently pick up *new* skill content from
   ``~/.openhands/skills`` / the public marketplace via
   ``_load_auto_skills`` — losing the freeze-at-create-time guarantee.

   Fix: opt-out via ``context={"preserve_full_skills": True}``.
   ``_save_base_state`` passes it; the API-response path doesn't —
   so persistence keeps the snapshot and the wire still gets the win.

2. The manual ``s.model_dump(mode=..., context=...)`` inside the
   serializer dropped the caller's ``exclude_none`` / ``exclude_defaults``
   / nested ``include`` options. ``AgentContext(skills=[...]).model_dump(
   exclude_none=True)`` would have leaked ``trigger: None``,
   ``source: None`` etc. — which also bloats the very payloads this
   PR optimises.

   Fix: switch to ``@field_serializer("skills", mode="wrap")`` and
   delegate the surviving list to ``handler``. Pydantic's default
   nested machinery then respects every caller option.

Tests:
- ``test_preserve_full_skills_context_flag_keeps_snapshot`` — flag
  reverts to full serialization for the snapshot path.
- ``test_wrap_serializer_propagates_exclude_none`` — pins the
  comment-2 regression with a concrete ``Skill.trigger`` canary.
- ``test_save_base_state_persists_full_skill_snapshot`` — end-to-end
  through ``ConversationState`` to ``InMemoryFileStore``; asserts the
  persisted JSON keeps all auto-loaded skill names.

299/299 tests pass on the affected suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable, but two serialization edge cases can lose caller-modified skills.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration semantics for persisted and remote conversation payloads.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

if info.context and info.context.get("preserve_full_skills"):
return handler(value)
auto = self._auto_loaded_skills
kept = [s for s in value if auto.get(s.name) != s]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: This filter still uses the copied private snapshot even if the auto-load config was changed after validation. For example, ctx.model_copy(update={"load_public_skills": False}) keeps the runtime auto-loaded skills but serializes them as []; on AgentContext.model_validate(...) they are not reloaded because the flag is now false. Changing marketplace_path has the same stale-snapshot problem, except the receiver may reload a different catalog. Please either tie the snapshot to the (load_user_skills, load_public_skills, marketplace_path) config and no-op when it no longer matches, or clear/recompute the snapshot when those fields are updated.

existing = existing_by_name.get(name)
if existing is None:
self.skills.append(skill)
self._auto_loaded_skills[name] = skill
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: This stores the same mutable Skill object that is appended to self.skills, so it is not actually a snapshot. Since Skill is mutable today, an in-place caller edit like ctx.skills[0].content = "custom" mutates the private copy too; equality still succeeds and model_dump() drops the customized skill instead of preserving it. Store an independent snapshot (deep copy or stable serialized fingerprint) and add a regression test for in-place modification.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The SDK serialization behavior was exercised directly on main vs this PR, and the PR trims auto-loaded skills from default wire payloads while preserving runtime behavior and persisted snapshots.

Does this PR achieve its stated goal?

Yes. The goal was to drop auto-loaded skills from serialized AgentContext output without changing the in-memory resolved skills list or losing persistence safety. In direct SDK usage, main serialized 40 public skills into a 263,125-byte context, while this PR kept 40 runtime skills but serialized 0 auto-loaded skills by default in a 245-byte context; round-trip validation rebuilt the 40 skills, explicit/replacement skills stayed serialized, and ConversationState._save_base_state still persisted all 40 skills.

Phase Result
Environment Setup make build completed successfully with uv-managed dependencies.
CI Status ⚠️ At review time, Build & Push (python-amd64) and unresolved-review-threads were failing; several checks were still pending.
Functional Verification ✅ Direct SDK before/after probes verified payload shrink, runtime round-trip, explicit/replacement preservation, migration trimming, and full persistence snapshots.
Functional Verification

Test 1: Default AgentContext serialization trims auto-loaded public skills while runtime behavior stays intact

Step 1 — Establish baseline without the fix:
Checked out origin/main (42c49c83) and ran:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_probe.py

Relevant output:

{
  "runtime_skill_count": 40,
  "default_dump_skill_count": 40,
  "default_context_bytes": 263125,
  "roundtrip_runtime_skill_count": 40,
  "persisted_skill_count": 40,
  "preserve_full_dump_skill_count": 40
}

This confirms the old behavior: the runtime correctly loaded 40 public skills, but the default serialized context also carried all 40 skills (~263 KB).

Step 2 — Apply the PR's changes:
Checked out PR commit 98f2c62375c692a565859579f9b2ec6d619e0554.

Step 3 — Re-run with the fix in place:
Ran the same command:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_probe.py

Observed:

{
  "runtime_skill_count": 40,
  "default_dump_skill_count": 0,
  "default_context_bytes": 245,
  "roundtrip_runtime_skill_count": 40,
  "explicit_dump_skill_names": ["qa-explicit-skill"],
  "replacement_dump_skill_count": 1,
  "replacement_dump_first_content": "QA replacement body should survive serialization",
  "persisted_skill_count": 40,
  "preserve_full_dump_skill_count": 40,
  "preserve_full_context_bytes": 263125
}

This shows the PR delivers the key payload optimization: auto-loaded public skills stay available at runtime and after model_validate(...), but default serialization drops them. It also confirms explicit caller-supplied skills and same-name replacement skills remain on the wire, and persistence paths still preserve the full snapshot.

Test 2: Old full payloads migrate to trimmed default output on the PR

Step 1 — Establish baseline without the fix:
On origin/main, saved a real full AgentContext(load_public_skills=True) payload:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_save_main_payload.py

Output:

{
  "runtime_skill_count": 40,
  "saved_payload_bytes": 263125,
  "saved_payload_skill_count": 40
}

This represents the stored/pre-PR payload shape with all resolved public skills inlined.

Step 2 — Apply the PR's changes:
Checked out PR commit 98f2c62375c692a565859579f9b2ec6d619e0554 and loaded that main-branch payload through AgentContext.model_validate(...).

Step 3 — Re-run with the fix in place:
Ran:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_migration_probe.py

Observed:

{
  "old_payload_skill_count": 40,
  "old_payload_bytes": 263125,
  "runtime_skill_count_after_validate": 40,
  "default_dump_skill_count_after_validate": 0,
  "default_dump_bytes_after_validate": 245,
  "preserve_full_skill_count_after_validate": 40
}

This confirms the migration behavior claimed by the PR: old bloated payloads still load into the full in-memory skill list, but the next default serialization trims the auto-loaded skills while the explicit full-snapshot mode still retains them.

Issues Found

None from functional QA. CI still had failing/pending checks at the time of review, but the changed SDK behavior itself worked in the manual probes above.

This review was created by an AI agent (OpenHands) on behalf of the user.

…ntegrity

Two more edge cases from the bot review:

1. ``_auto_loaded_skills`` stored the same ``Skill`` reference that
   was appended to ``self.skills``. ``Skill`` is mutable, so an
   in-place edit (``ctx.skills[0].content = "custom"``) mutated the
   snapshot too; the equality check still succeeded and the
   customised skill silently disappeared from ``model_dump``.

   Fix: deep-copy the snapshot via ``skill.model_copy(deep=True)``.
   Pinned by ``test_in_place_skill_mutation_preserves_customisation_on_wire``.

2. The serializer used the snapshot even when the auto-load config
   changed after validation. ``ctx.model_copy(update={
   "load_public_skills": False})`` keeps the runtime auto-loaded
   skills (validators don't re-run on model_copy) but used to
   serialize them as ``[]``. On ``model_validate`` the receiver
   wouldn't auto-reload them (flag now False) — skills lost.
   Same shape for ``marketplace_path`` changes (receiver re-loads
   from a *different* catalog).

   Fix: snapshot ``(load_user_skills, load_public_skills,
   marketplace_path)`` alongside the skills. If the current config
   drifted, the serializer no-ops the trim and emits the full list,
   preserving the round-trip. Pinned by:
   - ``test_model_copy_toggling_load_flag_off_keeps_skills_on_wire``
   - ``test_model_copy_changing_marketplace_path_keeps_skills_on_wire``

302/302 tests pass on the affected suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The SDK serialization change works as described: auto-loaded public skills stay available at runtime but are removed from the default serialized payload, while explicit/replaced/persisted skills remain preserved.

Does this PR achieve its stated goal?

Yes. On main, a real AgentContext(load_public_skills=True) serialized all 40 public skills into a 259,583-byte payload; on this PR, the same SDK usage serialized 0 auto-loaded skills into a 230-byte payload while ctx.skills and a dump→validate round trip still had all 40 skills. I also exercised explicit skills, replacement-via-model_copy, load-flag drift, and conversation base-state persistence, and those paths preserved skills as the PR promises.

Phase Result
Environment Setup make build completed and installed the uv development environment
CI Status ⚠️ At check time: 17 successful, 9 pending, 1 failing Review Thread Gate/unresolved-review-threads
Functional Verification ✅ Real SDK calls confirmed payload reduction and preservation behavior
Functional Verification

Test 1: Default serialized payload drops auto-loaded public skills, but runtime and round-trip keep them

I used a real SDK script that constructs AgentContext(load_public_skills=True), calls model_dump() / model_dump_json(), validates the dumped payload back into AgentContext, and also measures model_dump(context={"preserve_full_skills": True}).

Step 1 — Establish baseline on origin/main:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_payload.py.

Relevant output:

{
  "runtime_skill_count": 40,
  "serialized_skill_count": 40,
  "serialized_payload_bytes": 259583,
  "roundtrip_skill_count": 40,
  "preserve_full_skill_count": 40,
  "preserve_full_payload_bytes": 259583
}

This confirms the old behavior: the resolved public skill catalog was duplicated into the default serialized output.

Step 2 — Apply the PR's changes:
Checked out perf/agent-context-skill-payload at 713148b1475053eea1fc7df7eed03533dd6e6893.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_payload.py.

Relevant output:

{
  "runtime_skill_count": 40,
  "serialized_skill_count": 0,
  "serialized_payload_bytes": 230,
  "roundtrip_skill_count": 40,
  "preserve_full_skill_count": 40,
  "preserve_full_payload_bytes": 259583
}

This confirms the PR's headline goal: default wire serialization shrank dramatically, but runtime callers and deserialization still see the full resolved skill list. The persistence-style preserve_full_skills context still emits the complete snapshot.

Test 2: Explicit/replaced/config-drift/persistence paths preserve skills

Step 1 — Baseline expectation:
The PR explicitly claims only unmodified auto-loaded skills should be trimmed; explicit skills, replaced skills under the same name, load-flag-drifted copies, and conversation persistence snapshots should remain serialized.

Step 2 — Apply the PR's changes:
Used the PR branch from Test 1.

Step 3 — Exercise those SDK paths:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_context_edges.py.

Relevant output:

{
  "explicit_runtime_skill_count": 41,
  "explicit_serialized_skill_names": ["qa-explicit"],
  "explicit_roundtrip_has_explicit": true,
  "replacement_target_name": "uv",
  "replacement_serialized_skills": [{"name": "uv", "content": "QA replacement body"}],
  "replacement_roundtrip_content": "QA replacement body",
  "load_flag_flipped_serialized_count": 40,
  "load_flag_flipped_roundtrip_count": 40,
  "persisted_base_state_skill_count": 40
}

This confirms the non-headline safety behavior: explicit caller skills survive default serialization, a consumer replacement under an auto-loaded name survives serialization and round-trip, config drift serializes the full list instead of losing it, and the actual conversation base-state save path persists the full 40-skill snapshot.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable — good target, but two edge cases still break the explicit/preserved skill guarantees.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration semantics and can alter skill prompts after a round-trip.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

self.load_public_skills,
self.marketplace_path,
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: preserve_full_skills still doesn't freeze a saved snapshot on load. If the persisted payload contains auto skill a with load_public_skills=True, and the loader now returns a plus new b, this branch appends b during AgentContext.model_validate(...). The next persisted/full dump includes b, so the supposedly preserved snapshot can pick up newly published skills. The restore path needs a way to know a full snapshot is authoritative and skip appending missing auto skills (or disable auto-load for preserved snapshots).

if existing is None:
self.skills.append(skill)
# Deep-copy so an in-place caller mutation
# (``ctx.skills[0].content = "custom"``) doesn't also
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: This migration branch also catches caller-supplied explicit skills that happen to be equal to the current auto-loaded skill. In that case AgentContext(load_public_skills=True, skills=[explicit]) records the explicit skill in _auto_loaded_skills, model_dump() emits skills: [], and a later round-trip after the marketplace/user skill changes reloads the new content instead of the caller's pinned explicit skill. That contradicts the PR guarantee that explicit skills are unaffected. Please distinguish legacy persisted snapshots from direct caller-supplied skills, or keep equal existing skills explicit by default.

Debug Agent and others added 2 commits May 19, 2026 19:03
…pshot loads

Two more edge cases from the third bot review:

1. ``preserve_full_skills`` only prevented the trim on *save*. The
   load path still ran ``_load_auto_skills`` unconditionally and
   *appended* any newly-published skills the loader returned. A
   snapshot frozen as {a} would silently grow to {a, b} after `b`
   landed in the marketplace — and the next save would persist the
   drifted set.

   Fix: ``_load_auto_skills`` now accepts ``ValidationInfo``. When
   ``info.context["loading_from_snapshot"]`` is true (set by
   ``ConversationState`` factory on the resume path), the validator
   skips the append-new branch entirely — the persisted ``skills``
   list is treated as authoritative.

2. The migration branch (mark existing skills as auto-loaded when
   they equal the loader's output) also fired for fresh
   ``AgentContext(load_public_skills=True, skills=[explicit])``
   construction. A caller-pinned explicit skill that happened to
   equal the marketplace version would be snapshotted as auto-loaded,
   dropped from the wire, and silently swapped to a *new*
   marketplace version on the next round-trip.

   Fix: migration only fires under ``loading_from_snapshot``. Fresh
   construction keeps the "equal existing" branch as a no-op — the
   caller-supplied skill stays explicit on the wire.

The ``ConversationState`` resume path at ``state.py:354`` now passes
``context={"loading_from_snapshot": True}`` alongside the cipher.
API-response paths and direct caller construction don't pass the
flag, so they keep fresh-construction semantics — which is what we
want for the shrink-the-wire win.

New tests (5 → 8):
- ``test_fresh_construction_does_not_conflate_explicit_with_auto``
  — the comment-6 regression, fresh construction with a name
  collision.
- ``test_loading_from_snapshot_does_not_pick_up_new_auto_loaded_skills``
  — the comment-5 regression, snapshot {a} + loader returning
  {a, b} stays {a} in memory and on the wire.
- ``test_fresh_construction_still_appends_new_auto_skills`` — sanity
  that the default branch keeps growing for first-launch users.
- Existing migration tests updated to invoke the load path via
  ``model_validate(context={"loading_from_snapshot": True})``.

305/305 tests pass on the affected suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the SDK behavior with real temporary user skills: the PR trims auto-loaded skills from normal serialization while preserving runtime skills, explicit skills, round-trip reloading, and full persisted snapshots.

Does this PR achieve its stated goal?

Yes. On base main, an AgentContext(load_user_skills=True) with 8 auto-loaded skills plus 1 explicit skill serialized all 9 skills (19,782 bytes); on this PR, the same real SDK usage kept 9 runtime skills but serialized only the explicit skill (534 bytes). I also verified the snapshot path: preserve_full_skills keeps the full skill list, and loading that snapshot after the skill source changes keeps the original content instead of picking up newly added/changed skills.

Phase Result
Environment Setup make build completed successfully with uv-managed dependencies.
CI Status ⚠️ Core checks observed passing; unresolved-review-threads was failing and several review/build-publish jobs were still pending/in progress at QA time.
Functional Verification ✅ Real SDK imports/calls exercised serialization, re-validation, and snapshot semantics.
Functional Verification

Test 1: Normal serialization trims only auto-loaded skills, not runtime or explicit skills

Step 1 — Establish baseline without the fix:
Checked out origin/main (64b36cf8) and ran a real SDK script that:

  • created 8 markdown skills under a temporary ~/.openhands/skills/,
  • constructed AgentContext(load_user_skills=True, skills=[qa-explicit]),
  • called model_dump(), model_dump(context={"preserve_full_skills": True}), and AgentContext.model_validate(...).

Observed:

{
  "commit": "64b36cf8",
  "runtime_skill_count": 9,
  "default_dump_skill_count": 9,
  "default_dump_skill_names": ["qa-explicit", "qa-auto-2", "qa-auto-1", "qa-auto-5", "qa-auto-4", "qa-auto-0", "qa-auto-6", "qa-auto-3", "qa-auto-7"],
  "default_dump_json_bytes": 19782,
  "preserve_full_dump_skill_count": 9,
  "roundtrip_runtime_skill_count": 9,
  "roundtrip_has_explicit": true
}

This confirms the pre-PR behavior carried the full auto-loaded skill payload in normal serialized output.

Step 2 — Apply the PR's changes:
Checked out perf/agent-context-skill-payload at c55f95db.

Step 3 — Re-run with the fix in place:
Ran the same SDK script and observed:

{
  "commit": "c55f95db",
  "runtime_skill_count": 9,
  "default_dump_skill_count": 1,
  "default_dump_skill_names": ["qa-explicit"],
  "default_dump_json_bytes": 534,
  "preserve_full_dump_skill_count": 9,
  "roundtrip_runtime_skill_count": 9,
  "roundtrip_has_explicit": true
}

This shows the headline behavior works: runtime still sees all 9 skills, normal serialization drops the 8 auto-loaded skills, the explicit caller-supplied skill remains on the wire, the preserved/full path still emits all skills, and re-validation restores the full runtime shape.

Test 2: Full snapshots stay frozen when skill sources change

On the PR branch, I created one real user skill, captured both normal API-style output and preserve_full_skills output, then changed that skill's file contents and added a new skill before re-validating.

Observed:

{
  "created_runtime_names": ["qa-stable"],
  "snapshot_skill_names": ["qa-stable"],
  "api_dump_skill_names": [],
  "loaded_snapshot_names": ["qa-stable"],
  "loaded_snapshot_contents": {
    "qa-stable": "# stable

original stable content"
  },
  "loaded_trimmed_names": ["qa-stable", "qa-new"]
}

This confirms the two intended modes: persisted/full snapshots keep the original skill content even after the source changes, while trimmed API-style payloads re-resolve the current auto-loaded skill set on load.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable — the payload win is real, but two serialization/persistence edge cases still need attention.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization and persisted resume semantics for skill catalogs.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26148387350

nested ``include`` / ``exclude`` for the surviving skills —
manual ``s.model_dump(...)`` would have ignored them.
"""
if info.context and info.context.get("preserve_full_skills"):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: model_dump(round_trip=True) still goes through the lossy API-wire path here and emits skills: [] for auto-loaded skills. round_trip=True is Pydantic's standard signal for a dump that callers can persist/reload without semantic loss; with this serializer, a caller using it for an AgentContext snapshot will reload whatever the current external skill source returns instead of the in-memory catalog they dumped. Please treat info.round_trip like preserve_full_skills and add a regression test for AgentContext(load_public_skills=True).model_dump(round_trip=True) preserving the full skill list.

# {a} would silently grow to {a, b} after `b` lands in the
# public marketplace, breaking the freeze-at-create guarantee
# ``_save_base_state`` relied on. See software-agent-sdk#3301.
context: dict[str, Any] = {"loading_from_snapshot": True}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: this loading_from_snapshot context is not enough to preserve the frozen snapshot, because the loaded state.agent is immediately replaced by the runtime agent below. That runtime agent is normally validated without this context; if the loader changed from {a} to {a,b} between save and resume, it already contains b, and autosave rewrites base_state.json with both skills. I reproduced this through ConversationState.create(..., persistence_dir=...): initial persisted skills ['a'], resumed/runtime skills ['a', 'b'], and the rewritten base state becomes ['a', 'b']. Please preserve/merge the loaded snapshot before autosaving the runtime agent (or adjust the stated freeze-at-create contract), and cover the real create→resume path rather than only calling _save_base_state directly.

…sume

Two edge cases from the fourth bot review:

1. ``model_dump(round_trip=True)`` is Pydantic's canonical signal
   for "this dump must rehydrate without semantic loss". The trim
   ran under it too, so a caller using it for an ``AgentContext``
   snapshot would silently reload whatever the loader returns now
   instead of the in-memory catalog they dumped.

   Fix: ``info.round_trip`` opts out of the trim, same as the
   explicit ``preserve_full_skills`` context flag. Pinned by
   ``test_round_trip_dump_preserves_full_skill_list``.

2. The persistence-resume guard (``loading_from_snapshot``)
   correctly populated ``state.agent.agent_context.skills`` from
   the snapshot — but the next line, ``state.agent = agent``,
   unconditionally overwrote the loaded agent with the runtime
   one. The runtime agent's ``agent_context`` carried a freshly-
   resolved (and potentially drifted) skill list from its own
   ``_load_auto_skills`` run. Autosave then rewrote ``base_state.json``
   with the drifted set. The reviewer reproduced this end-to-end:
   initial persisted ``['a']`` → resumed runtime ``['a', 'b']`` →
   autosave persisted ``['a', 'b']``.

   Fix: before ``state.agent = agent``, ``model_copy`` the runtime
   agent's ``agent_context`` with the loaded snapshot's skills. The
   snapshot stays authoritative for skills; runtime-only fields
   (LLM client, tool wiring, etc.) come from the runtime agent as
   today. Pinned by ``test_conversationstate_resume_preserves_snapshot_skills``
   — full create → save → resume cycle via ``ConversationState.create``
   and ``LocalFileStore``, not just ``_save_base_state`` in isolation.

793/793 tests pass on ``tests/sdk/context/`` +
``tests/sdk/conversation/`` + ``tests/cross/test_agent_secrets_integration.py``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ QA Report: PASS WITH ISSUES

Functional verification passes: the PR trims auto-loaded skill payloads from serialized API-style output while preserving runtime skills, explicit skills, round-trip snapshots, and persisted conversation snapshots; CI is not fully green.

Does this PR achieve its stated goal?

Yes. Exercising the SDK as a user would showed AgentContext(load_public_skills=True) still loads the full 40 public skills plus an explicit skill at runtime, but default serialization on the PR drops the payload from 263,464 bytes / 41 skills to 582 bytes / 1 explicit skill. ConversationState API-style serialization likewise dropped from 265,711 bytes / 41 skills to 2,829 bytes / 1 skill, while base_state.json persistence stayed full at 41 skills so resume snapshots are not lost.

Phase Result
Environment Setup make build completed successfully; no tests, linters, formatters, or pre-commit hooks were run by QA.
CI Status ⚠️ GitHub checks currently show pre-commit and unresolved-review-threads failing, with several image/build jobs still pending; SDK/cross/workspace/tools/windows tests are green in CI.
Functional Verification ✅ Before/after SDK scripts confirmed the payload reduction, explicit-skill preservation, full round_trip dumps, full persistence snapshots, and old-conversation migration behavior.
Functional Verification

Test 1: AgentContext default serialization trims auto-loaded skills but preserves runtime behavior

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran uv run python /tmp/qa_agent_context_payload.py, which constructs a real AgentContext(load_public_skills=True, load_user_skills=False, skills=[explicit]), calls model_dump(), model_dump(round_trip=True), and reloads the dumped payload:

{
  "default_payload_bytes": 263464,
  "default_serialized_has_explicit": true,
  "default_serialized_skill_count": 41,
  "restored_has_explicit": true,
  "restored_runtime_skill_count": 41,
  "round_trip_payload_bytes": 263464,
  "round_trip_serialized_skill_count": 41,
  "runtime_has_explicit": true,
  "runtime_skill_count": 41
}

This confirms the old behavior carried all 40 auto-loaded public skills plus the explicit skill in default serialized output.

Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.

Step 3 — Re-run with the fix in place:
Ran the same command:

{
  "default_payload_bytes": 582,
  "default_serialized_has_explicit": true,
  "default_serialized_skill_count": 1,
  "restored_has_explicit": true,
  "restored_runtime_skill_count": 41,
  "round_trip_payload_bytes": 263464,
  "round_trip_serialized_skill_count": 41,
  "runtime_has_explicit": true,
  "runtime_skill_count": 41
}

This shows the PR achieves the headline behavior: default wire serialization keeps only the explicit skill, runtime still has all 41 skills, reloading the trimmed payload re-resolves the auto skills, and round_trip=True keeps the full snapshot.

Test 2: ConversationState API-style serialization shrinks while persistence stays full

Step 1 — Establish baseline without the fix:
On origin/main, ran uv run python /tmp/qa_conversation_state_payload.py, which creates a real Agent, AgentContext, LocalWorkspace, and ConversationState with persistence enabled:

{
  "base_state_existed_after_create": true,
  "persisted_has_explicit": true,
  "persisted_payload_bytes": 258728,
  "persisted_skill_count": 41,
  "runtime_skill_count": 41,
  "state_default_payload_bytes": 265711,
  "state_default_serialized_skill_count": 41
}

This confirms old API-style state serialization included the full auto-loaded catalog.

Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.

Step 3 — Re-run with the fix in place:
Ran the same command:

{
  "base_state_existed_after_create": true,
  "persisted_has_explicit": true,
  "persisted_payload_bytes": 258728,
  "persisted_skill_count": 41,
  "runtime_skill_count": 41,
  "state_default_payload_bytes": 2829,
  "state_default_serialized_skill_count": 1
}

This confirms API-style ConversationState.model_dump() gets the payload win while persisted base_state.json still freezes the full skill snapshot.

Test 3: Existing full persisted conversations migrate to trimmed API output

Step 1 — Reproduce old persisted shape:
On origin/main, ran uv run python /tmp/qa_migration_create_base.py to create a persisted conversation with the old full serialized skill list:

{
  "created_with_base_main": true,
  "persisted_payload_bytes": 258728,
  "persisted_skill_count": 41,
  "runtime_skill_count": 41
}

This establishes the migration baseline: old conversations have all resolved skills in base_state.json.

Step 2 — Apply the PR's changes:
Checked out d81d7c05e257908256176fc6884a313945ac73a7.

Step 3 — Resume the old conversation with the fix in place:
Ran uv run python /tmp/qa_migration_resume_pr.py against the persisted conversation created on main:

{
  "api_style_has_explicit": true,
  "api_style_payload_bytes": 582,
  "api_style_serialized_skill_count": 1,
  "persisted_has_explicit_after_resume": true,
  "persisted_payload_bytes_after_resume": 258728,
  "persisted_skill_count_after_resume": 41,
  "resumed_runtime_skill_count": 41,
  "runtime_agent_initial_skill_count": 41
}

This confirms an old full persisted conversation resumes under the PR with the full runtime/persisted skill snapshot intact, while API-style output trims to explicit-only.

Issues Found

  • 🟠 CI status: GitHub checks currently report pre-commit and unresolved-review-threads as failing, with some build jobs still pending. Functional QA did not investigate those CI failures because this run intentionally avoided tests, linters, formatters, and pre-commit.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PASS WITH ISSUES.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable — the payload reduction is useful, but a few serialization edge cases can still lose or misrepresent the frozen skill snapshot.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM — changes AgentContext serialization/rehydration and persisted resume semantics for skills.

VERDICT:
❌ Needs follow-up before approval.

This review was created by an AI agent (OpenHands) on behalf of the user.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26149123203

if loaded_agent_context is not None and agent.agent_context is not None:
agent = agent.model_copy(
update={
"agent_context": agent.agent_context.model_copy(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: this graft only replaces the public skills field; the copied context keeps the runtime context's private _auto_loaded_skills snapshot. If the persisted snapshot is {a} and the current loader now returns {a,b}, resume correctly sets ctx.skills == {a}, but default model_dump() emits skills: [] because it compares against the runtime {a,b} snapshot. A receiver revalidating that payload then auto-loads {a,b}, so the wire payload no longer represents the resumed conversation. Copy/reset the auto-load bookkeeping from the loaded context, or otherwise force these loaded skills to serialize, when preserving the snapshot.

# freezes the auto-loaded set against marketplace
# / user-skill churn between save and load.
continue
self.skills.append(skill)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: appending auto-loaded skills in the after-validator doesn't mark skills as set. As a result, AgentContext(load_public_skills=True).model_dump(round_trip=True, exclude_unset=True) (and the preserve_full_skills variant) omits skills entirely before the serializer can preserve the snapshot, so reloading can pick up a different current skill catalog. If round-trip/preserve dumps are meant to be lossless, mark skills as set when the validator mutates it or avoid relying on a field serializer for this path.

if self._auto_load_config != current_config:
return handler(value)
auto = self._auto_loaded_skills
kept = [s for s in value if auto.get(s.name) != s]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: filtering to kept before delegating to handler changes Pydantic's index-based nested include/exclude semantics for skills. Once auto-loaded entries are removed, the caller's indices are applied to the compacted list, so an include/exclude targeting an auto-loaded index can incorrectly serialize or drop an explicit skill. Skip trimming when nested include/exclude targets skills, or apply the trim after serialization while preserving the original index mapping.

@simonrosenberg
Copy link
Copy Markdown
Collaborator Author

Closing in favour of #3316, which solves the same problem at the route boundary instead of inside the AgentContext model.

Why

Both PRs deliver the same measured win (~260 KB → ~5 KB on the wire, ~42% faster cold-open for heavy conversations). But:

  • This PR grew to 5 commits / ~150 lines across agent_context.py + state.py, with multiple PrivateAttr snapshots, three serialization context flags (preserve_full_skills, loading_from_snapshot, round_trip), config-drift detection, deep copies, and a model_copy-merge dance in the ConversationState resume path. Each guard fixed a real edge case the bot review caught — but the underlying cause of all of them is the same: the model is trying to keep three different truths in sync (in-memory full, wire trimmed, persisted full).

  • perf(agent_server)!: trim conversation skills by default (include_skills=false) #3316 moves the trim to the FastAPI route boundary. The model stays a single source of truth (full skills in-memory, full skills on disk, full skills in the default API response). The trim becomes a 1-line opt-in on the routes that emit ConversationInfo, via a new ?include_skills=false query parameter — backward-compatible by default, so RemoteConversation.agent.agent_context.skills and every other external SDK consumer keeps round-tripping correctly. ~50 lines total across two files; none of the model-side guards are needed because the model never lies.

The work in this PR isn't lost — the test suite alone covers a lot of useful AgentContext edge cases that future model-side changes could lean on. The branch stays on the remote (perf/agent-context-skill-payload) if anyone wants to pick it up.

Tracking the simpler approach

  • agent-server: #3316 — adds the opt-in query param.
  • typescript-client: #172 (draft) — typed wrapper for the option.
  • agent-canvas: #651 (draft) — opts in.

Same end-to-end measured impact, much smaller surface area, fully backward-compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-required PR requires explicit release-note coverage for behavioral or default changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: trim agent.agent_context.skills from /api/conversations responses by default (-98% wire bytes)

2 participants