fix: log runtime LLM metadata #24
Merged
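Background
The Runtime already emits LLM request/response metadata as structured events. These are used to troubleshoot the chain between skill, tool schema, tool choice, and the provider's response in tool-based diagnostics. So that Runtime logs can be searched directly by runId when investigating production issues, this PR keeps the event output unchanged and additionally emits a safe structured log entry.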
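As a rough illustration of the idea (not the code in this PR), mirroring a metadata event to stderr as one JSON line could look like the sketch below. The function name `log_llm_metadata`, the `event` and `runId` keys, and the `LOGGED_EVENTS` allowlist are assumptions made for the example; only the two event names come from this description.

```python
import json
import sys
from typing import Any, Mapping, TextIO

# Hypothetical allowlist: only these event types are mirrored to the log.
LOGGED_EVENTS = {"llm.request.metadata", "llm.response.metadata"}


def log_llm_metadata(
    event_type: str,
    run_id: str,
    metadata: Mapping[str, Any],
    stream: TextIO = sys.stderr,
) -> bool:
    """Write one JSON line for an LLM metadata event; return True if written."""
    if event_type not in LOGGED_EVENTS:
        return False
    # Assumed record layout: metadata fields plus the event name and runId.
    record = {**metadata, "event": event_type, "runId": run_id}
    # One compact JSON object per line keeps the log easy to grep and parse.
    stream.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")
    stream.flush()
    return True
```

The event itself would still be emitted through the existing path; a helper like this only adds the log line alongside it.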
Scope of changes
When the llm.request.metadata / llm.response.metadata events are emitted, a JSON structured log entry is written to stderr at the same time.
How to view
After the release, once you have the Runtime runId for the abnormal turn, you can search the Hermes Runtime pod logs directly for that runId.
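For offline analysis of saved pod logs, the same lookup can be sketched in Python. This is a hedged example under stated assumptions: it assumes each log line carries a Runtime Manager prefix followed by a single JSON object starting at the first `{`, and that the record stores the run ID under a `runId` key. Neither detail is confirmed by this description, and `iter_run_records` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_run_records(lines: Iterable[str], run_id: str) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSON records whose runId matches, skipping other lines."""
    for line in lines:
        if run_id not in line:
            continue  # cheap substring check, same idea as grepping for the runId
        start = line.find("{")  # assumed: the JSON payload starts at the first brace
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a structured log line
        if isinstance(record, dict) and record.get("runId") == run_id:
            yield record
```

Calling `list(iter_run_records(open("runtime.log"), "<runId>"))` on a saved log file (file name hypothetical) would return the matching request and response metadata records as dictionaries.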
If you need to inspect the JSON fields, strip the Runtime Manager log prefix first and then pipe the result to jq.
Key fields:
- skills_count, skill_names, skill_prompt_presence, tools_count, tool_names, tool_choice, request_char_count, approx_input_tokens.
- finish_reason, assistant_tool_calls_count, assistant_tool_names, output_tokens.
Common diagnoses:
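- tools_count=0: the tool schema was never actually sent; check the Runtime toolset/schema chain.
- tools_count>0 with finish_reason=stop and assistant_tool_calls_count=0: the tools were sent, but the model did not choose a tool call.
- assistant_tool_calls_count>0 but Cloud shows no tool action: trace back through the Runtime event pump / Cloud projection.
Safety boundary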
The following content is not logged:
Verification
- uv run pytest tests/runtime_manager/test_registry.py -k "llm_request_metadata or llm_response_metadata or llm_metadata_log"
- uv run pytest tests/runtime_manager
- uv run ruff check runtime_manager/worker_main.py tests/runtime_manager/test_registry.py
- python3 -m py_compile runtime_manager/worker_main.py
- git diff --check
- git diff --check origin/apecloud-base...HEAD
Non-goals