Skip to content

fix(sdk): snapshot LLM metrics via Metrics.get_snapshot() (Q2 of #3341)#3397

Merged
malhotra5 merged 1 commit into
mainfrom
fix/reuse-metrics-get-snapshot
May 27, 2026
Merged

fix(sdk): snapshot LLM metrics via Metrics.get_snapshot() (Q2 of #3341)#3397
malhotra5 merged 1 commit into
mainfrom
fix/reuse-metrics-get-snapshot

Conversation

@VascoSch92
Copy link
Copy Markdown
Member

@VascoSch92 VascoSch92 commented May 27, 2026

What

Q2 of #3341. Every LLMResponse carries a MetricsSnapshot. It was hand-built and passed the live accumulated_token_usage by reference, so the "snapshot" aliased mutable LLM state instead of capturing it. This snapshots via the canonical Metrics.get_snapshot() directly in the two LLMResponse builders (_build_completion_result, _build_responses_result):

return LLMResponse(
    message=message,
    metrics=self.metrics.get_snapshot(),
    raw_response=resp,
)

Why this is good

  1. It makes the snapshot an actual snapshot. get_snapshot() deepcopys accumulated_token_usage; the hand-built version aliased the live object, so a returned snapshot could change after the fact. This is observable, not theoretical: ACPAgent mutates that object in place (acp_agent.py:736...accumulated_token_usage.model = self.acp_model), which would retroactively rewrite the model of any LLMResponse snapshot already handed out.

  2. Single source of truth. get_snapshot() is the canonical snapshot method and is already what ConversationStats uses (conversation_stats.py:51). The hand-built copy duplicated its field list and would silently drift — and re-introduce the aliasing — if MetricsSnapshot ever gains a field.

  3. No needless indirection. The interim _current_metrics_snapshot() wrapper added nothing once its body was a one-line delegation (no logic, no extra typing, only 2 call sites, unreferenced by tests), so it's dropped in favour of calling get_snapshot() directly. MetricsSnapshot is no longer imported in llm.py.

Why it's a small diff

The issue originally listed four hand-built MetricsSnapshot(...) sites (llm.py:907/1032/1256/1450). #3355 / #3356 already consolidated all four into one helper while fixing the async error path, so this PR finishes Q2 by removing that helper and snapshotting directly.

Testing

  • ruff format --check and ruff check clean on the changed file
  • 141 targeted tests pass: test_llm_metrics, test_llm_completion, test_llm_fallback, test_llm, test_stats_update_event_snapshot

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9fe1960-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9fe1960-python \
  ghcr.io/openhands/agent-server:9fe1960-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9fe1960-golang-amd64
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-golang-amd64
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-golang-amd64
ghcr.io/openhands/agent-server:9fe1960-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:9fe1960-golang-arm64
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-golang-arm64
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-golang-arm64
ghcr.io/openhands/agent-server:9fe1960-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:9fe1960-java-amd64
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-java-amd64
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-java-amd64
ghcr.io/openhands/agent-server:9fe1960-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:9fe1960-java-arm64
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-java-arm64
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-java-arm64
ghcr.io/openhands/agent-server:9fe1960-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:9fe1960-python-amd64
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-python-amd64
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-python-amd64
ghcr.io/openhands/agent-server:9fe1960-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:9fe1960-python-arm64
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-python-arm64
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-python-arm64
ghcr.io/openhands/agent-server:9fe1960-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:9fe1960-golang
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-golang
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-golang
ghcr.io/openhands/agent-server:9fe1960-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:9fe1960-java
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-java
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-java
ghcr.io/openhands/agent-server:9fe1960-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:9fe1960-python
ghcr.io/openhands/agent-server:9fe1960a7c107190e4ed075cb0d440dc77ee5c29-python
ghcr.io/openhands/agent-server:fix-reuse-metrics-get-snapshot-python
ghcr.io/openhands/agent-server:9fe1960-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 9fe1960-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 9fe1960-python-amd64) are also available if needed

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟢 Good taste

LGTM — reusing Metrics.get_snapshot() gives LLMResponse the canonical deep-copy snapshot semantics and removes duplicate field construction.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
    One-line helper refactor; no dependency, release, or eval-risk behavior changes.

VERDICT: ✅ Worth merging

KEY INSIGHT: Single source of truth for metrics snapshots fixes the ownership bug cleanly.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26502376345

The per-response MetricsSnapshot was hand-built and passed the live accumulated_token_usage by reference, so it aliased mutable LLM state instead of capturing it. Call the canonical Metrics.get_snapshot() (which deep-copies accumulated_token_usage) directly in the two LLMResponse builders and drop the redundant _current_metrics_snapshot wrapper.
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified that LLMResponse metrics now behave as real snapshots instead of aliasing live LLM metrics.

Does this PR achieve its stated goal?

Yes. The PR set out to make the MetricsSnapshot attached to each LLMResponse independent from later mutations to LLM.metrics.accumulated_token_usage. On origin/main, a real LLM.completion(...) response shared the same token-usage object as the live metrics and changed after I mutated the live model name; on the PR branch, the response snapshot remained unchanged while the live metrics changed.

Phase Result
Environment Setup make build completed successfully.
CI Status 🟡 21 successful, 8 pending, 3 skipped, 0 failing at the time checked.
Functional Verification ✅ Base branch reproduced aliasing; PR branch preserved the response snapshot after live metric mutation.
Functional Verification

Test 1: Real SDK LLM.completion(...) response snapshot stays independent

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main, then ran a short SDK script that creates an LLM, calls llm.completion(...) against the configured LLM proxy with a real prompt, and mutates llm.metrics.accumulated_token_usage.model after the response is returned.

Command shape:

git checkout --quiet origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'SCRIPT'
import os
from openhands.sdk import LLM, Message, TextContent

llm = LLM(model=os.environ['LLM_MODEL'], api_key=os.environ['LLM_API_KEY'], base_url=os.environ['LLM_BASE_URL'], max_output_tokens=8)
response = llm.completion([Message(role='user', content=[TextContent(text='Reply with exactly: OK')])])
print('response usage exists:', response.metrics.accumulated_token_usage is not None)
print('snapshot and live share object:', response.metrics.accumulated_token_usage is llm.metrics.accumulated_token_usage)
print('response metric model before live mutation:', response.metrics.accumulated_token_usage.model)
llm.metrics.accumulated_token_usage.model = 'qa-mutated-live-model'
print('live metric model after mutation:', llm.metrics.accumulated_token_usage.model)
print('response metric model after live mutation:', response.metrics.accumulated_token_usage.model)
SCRIPT

Observed output excerpt:

response usage exists: True
snapshot and live share object: True
response metric model before live mutation: litellm_proxy/openai/gpt-5.5
live metric model after mutation: qa-mutated-live-model
response metric model after live mutation: qa-mutated-live-model

This confirms the bug existed: the already-returned LLMResponse.metrics changed retroactively when the live LLM metrics object was mutated.

Step 2 — Apply the PR's changes:
Checked out PR commit d8ac40d4ec285e74fd4cda983b5269af05daec4e / branch fix/reuse-metrics-get-snapshot.

Step 3 — Re-run with the fix in place:
Ran the same SDK LLM completion flow on the PR branch.

Observed output excerpt:

response usage exists: True
snapshot and live share object: False
response metric model before live mutation: litellm_proxy/openai/gpt-5.5
live metric model after mutation: qa-mutated-live-model
response metric model after live mutation: litellm_proxy/openai/gpt-5.5

This confirms the fix works for the user-facing SDK path: the returned LLMResponse.metrics retains its original model after the live metrics object is mutated.

Test 2: Direct snapshot helper parity with canonical snapshot behavior

I also exercised the snapshot behavior without an external LLM call by constructing an SDK LLM, assigning a TokenUsage, calling _current_metrics_snapshot(), then mutating the live token usage. On origin/main, the snapshot and live object were identical and the snapshot changed; on the PR branch, they were distinct and metrics.get_snapshot() was also distinct.

Baseline excerpt:

snapshot and live share object before live mutation: True
snapshot model after live mutation: mutated-model
canonical get_snapshot shares object: False

PR excerpt:

snapshot and live share object before live mutation: False
snapshot model after live mutation: original-model
canonical get_snapshot shares object: False

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@VascoSch92 VascoSch92 force-pushed the fix/reuse-metrics-get-snapshot branch from d8ac40d to 9fe1960 Compare May 27, 2026 09:24
@VascoSch92 VascoSch92 changed the title fix(sdk): reuse Metrics.get_snapshot() for LLMResponse snapshot (Q2 of #3341) fix(sdk): snapshot LLM metrics via Metrics.get_snapshot() (Q2 of #3341) May 27, 2026
@VascoSch92 VascoSch92 requested a review from malhotra5 May 27, 2026 09:25
Copy link
Copy Markdown
Member

@malhotra5 malhotra5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you!

@malhotra5 malhotra5 merged commit 5df78e8 into main May 27, 2026
36 checks passed
@malhotra5 malhotra5 deleted the fix/reuse-metrics-get-snapshot branch May 27, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants