feat(sdk): add reasoning effort to the Core Agent #93
Merged
Add structured tracing for custom Python agents so their execution surfaces
on the Narada observability dashboard alongside GUI-built custom agents.
narada-core:
- New PythonAgentRunTrace step type + PythonTraceEvent discriminated union
covering stdout, stderr, sub-agent calls, extension actions, and side
effects. Added to the ApaStepTrace union; parse_action_trace handles it
transparently.
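A discriminated union lets Pydantic choose the event model from a tag field, so it does not have to try each variant in turn. The sketch below assumes Pydantic v2. It uses a hypothetical `kind` tag and covers only two of the five event kinds, so the real `PythonTraceEvent` field names may differ.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class StdoutEvent(BaseModel):
    """Hypothetical stdout event."""

    kind: Literal["stdout"]
    text: str


class ExtensionActionEvent(BaseModel):
    """Hypothetical extension-action event."""

    kind: Literal["extensionAction"]
    action_name: str


# Pydantic reads `kind` first and validates against exactly one variant.
TraceEvent = Annotated[
    Union[StdoutEvent, ExtensionActionEvent],
    Field(discriminator="kind"),
]
event_adapter = TypeAdapter(TraceEvent)

event = event_adapter.validate_python({"kind": "stdout", "text": "hello"})
```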
narada-pyodide:
- New private _trace.py module with bounded-size summarisation of
extension action requests/responses and per-event emitters
(emit_sub_agent_call, emit_extension_action, emit_side_effect).
- Instrument dispatch_request() to emit one subAgentCall event per
invocation, covering success/error/timeout paths.
- Instrument _run_extension_action() to emit one extensionAction event
per call, with action_name keyed off the request discriminator.
- Instrument download_file / render_html in utils.py to emit sideEffect
events.
- 38 unit tests exercise summarisation, truncation, emitter shapes, and
Pydantic round-trip via parse_action_trace.
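One way to keep summaries bounded is to serialise the value, clip it to a character budget, and record how much was dropped. This is a minimal sketch: `summarise` and its 200-character default are hypothetical and are not the actual `_trace.py` helpers.

```python
import json
from typing import Any


def summarise(value: Any, max_chars: int = 200) -> str:
    """Serialise `value` to JSON and keep at most `max_chars` characters of it,
    appending a marker that records how many characters were dropped."""
    # default=str keeps unknown types (datetimes, models) from raising here.
    text = json.dumps(value, default=str)
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return f"{text[:max_chars]}...[+{dropped} chars]"
```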
Version bumps (coupled to avoid parse_action_trace ValidationError for
external narada users whose traces may contain pythonAgentRun nodes):
- narada-core: 0.0.17 -> 0.0.18
- narada-pyodide: 0.0.43 -> 0.0.44
- narada: 0.1.42 -> 0.1.43 (repin narada-core==0.0.18 only)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the four ship-blocker findings from the cross-dimensional review:
Robustness — trace emission must not break user code (_trace.py):
- `emit_trace_event` now wraps the serialise + forward in try/except and
logs the failure instead of propagating it. Previously a stray non-
serialisable value in a summary (a datetime, a Pydantic model leak)
would raise TypeError out of `_run_extension_action` and abort the
user's agent mid-run.
- `json.dumps(event, default=str)` stringifies unknown types defensively.
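Here is a minimal sketch of the never-raise emitter described above. The `forward` callback and the logger name are hypothetical stand-ins for whatever channel `_trace.py` actually forwards to. Note that `default=str` only covers values. Non-string dict keys still raise `TypeError`, and the `try`/`except` absorbs that failure.

```python
import json
import logging
from datetime import datetime
from typing import Any, Callable

logger = logging.getLogger("narada.trace")  # hypothetical logger name


def emit_trace_event(event: dict[Any, Any], forward: Callable[[str], None]) -> bool:
    """Serialise and forward one trace event without ever raising.

    Returns True if the event was forwarded, False if it was dropped.
    """
    try:
        # default=str stringifies values json can't encode (datetimes, models).
        payload = json.dumps(event, default=str)
        forward(payload)
        return True
    except Exception:
        # Tracing is best-effort: log and keep the user's agent running.
        logger.exception("Dropping trace event that could not be emitted")
        return False


sent: list[str] = []
emit_trace_event({"kind": "sideEffect", "at": datetime(2026, 4, 30)}, sent.append)
# default=str does not apply to dict keys, so this event is logged and dropped.
emit_trace_event({("tuple", "key"): 1}, sent.append)
```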
Scalability — bound recursive trace size (_trace.py):
- `emit_sub_agent_call` now strips the `events` list from any nested
`pythonAgentRun` node in the forwarded action trace, replacing it with
a `truncated_event_count` marker. Previously a custom Python agent
that delegated to another custom Python agent embedded the sub-run's
full event timeline in the parent's persisted JSON, producing
O(breadth^depth) growth.
Robustness — code-quality cleanup (window.py):
- Collapsed the duplicated `except asyncio.TimeoutError` / `except
NaradaAgentTimeoutError_INTERNAL_DO_NOT_USE` blocks in
`dispatch_request` into a single `except (A, B):` branch. Removes
~12 lines and the divergence risk.
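Catching both timeout types in a single tuple clause keeps the timeout handling in one place. In this sketch, `AgentTimeoutError` is a hypothetical stand-in for `NaradaAgentTimeoutError_INTERNAL_DO_NOT_USE`. `call_with_timeout` is illustrative and is not the real `dispatch_request`.

```python
import asyncio
from typing import Any, Awaitable, Callable


class AgentTimeoutError(Exception):
    """Hypothetical stand-in for the SDK's internal agent-timeout error."""


async def call_with_timeout(
    op: Callable[[], Awaitable[Any]],
    events: list[dict[str, Any]],
    timeout_s: float,
) -> Any:
    try:
        result = await asyncio.wait_for(op(), timeout=timeout_s)
    except (asyncio.TimeoutError, AgentTimeoutError):
        # One branch for both timeout flavours, so the two paths can't diverge.
        events.append({"kind": "subAgentCall", "status": "timeout"})
        raise
    events.append({"kind": "subAgentCall", "status": "success"})
    return result
```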
Robustness — side-effect tracing on failure (utils.py):
- `download_file` and `render_html` now emit a "failed" side-effect
trace when the underlying JS call raises, then re-raise. Previously
a failed download produced no trace at all — users saw silence
rather than the actual error.
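The emit-then-re-raise shape is sketched below with hypothetical `js_download` and `emit` callables. They stand in for the underlying JS bridge and the trace emitter. The real `download_file` signature in `utils.py` is not reproduced here.

```python
from typing import Any, Callable


def download_file(
    filename: str,
    content: bytes,
    js_download: Callable[[str, bytes], None],
    emit: Callable[[dict[str, Any]], None],
) -> None:
    try:
        js_download(filename, content)
    except Exception as exc:
        # Record the failure so the dashboard shows an error instead of silence...
        emit({"kind": "sideEffect", "name": "download_file",
              "status": "failed", "error": repr(exc)})
        # ...then re-raise so the caller still sees the original exception.
        raise
    emit({"kind": "sideEffect", "name": "download_file", "status": "ok"})
```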
Type safety — schema invariants (narada-core/actions/models.py):
- `PythonAgentRunTrace.duration_ms` and `truncated_event_count` now
use `NonNegativeInt` — Pydantic rejects negative values at parse
time rather than letting `-42ms` reach the dashboard formatter.
- New `@model_validator` on `PythonSubAgentCallEvent` and
`PythonExtensionActionEvent` rejects `ts_end < ts_start`; clock
skew on the Pyodide clock can no longer produce negative-duration
events that the renderer would display as `-5ms`.
- `parse_action_trace` now dispatches deterministically based on the
first item's discriminator (`step_type` vs `action`+`url`) rather
than try/except-falling-through two adapters. Eliminates the risk
of silently misrouting a homogeneity-violated trace.
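Both schema invariants map onto standard Pydantic v2 features. For brevity, this sketch folds them into one hypothetical, trimmed-down model. In the PR, `duration_ms` lives on `PythonAgentRunTrace`, and the timestamp check lives on the event models.

```python
from pydantic import BaseModel, NonNegativeInt, ValidationError, model_validator


class SubAgentCallEvent(BaseModel):
    """Hypothetical stand-in combining both invariants on one model."""

    ts_start: float
    ts_end: float
    duration_ms: NonNegativeInt  # -42 is rejected at parse time

    @model_validator(mode="after")
    def check_ordering(self) -> "SubAgentCallEvent":
        # Clock skew must not produce negative-duration events.
        if self.ts_end < self.ts_start:
            raise ValueError("ts_end must not be earlier than ts_start")
        return self


try:
    SubAgentCallEvent(ts_start=2.0, ts_end=1.0, duration_ms=5)
    skew_rejected = False
except ValidationError:
    skew_rejected = True
```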
Tests:
- 13 new unit tests across `TestEmitDefensive`,
`TestStripNestedPythonEvents`, `TestPythonEventInvariants`, and
`TestParseActionTraceDispatch`. Full suite is now 51 tests, all
passing under `uv run --package narada-pyodide pytest`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
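The deterministic routing covered by `TestParseActionTraceDispatch` can be sketched as follows. The adapter callables are hypothetical placeholders for the two Pydantic adapters. Returning `[]` for an empty trace is an assumption of this sketch.

```python
from typing import Any, Callable

Adapter = Callable[[list[dict[str, Any]]], Any]


def parse_action_trace(
    items: list[dict[str, Any]],
    step_adapter: Adapter,
    action_adapter: Adapter,
) -> Any:
    """Route the whole trace by the first item's discriminator.

    A malformed trace surfaces as an error from the adapter it belongs to,
    instead of being silently reinterpreted by the other one.
    """
    if not items:
        return []  # assumption: empty trace parses to an empty list
    first = items[0]
    if "step_type" in first:
        return step_adapter(items)
    if "action" in first and "url" in first:
        return action_adapter(items)
    raise ValueError(f"unrecognised trace item keys: {sorted(first)}")
```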
Resolve merge conflicts:
- pyproject.toml: keep both pytest and pytest-asyncio dev deps
- window.py: combine _get_auth_headers() refactor with trace instrumentation
- uv.lock: regenerated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflict in window.py: combine _get_auth_headers() signature change from #91 with trace instrumentation from this PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-trace
# Conflicts:
# packages/narada-pyodide/src/narada/window.py
Lets SDK callers opt into GPT-5.2's reasoning levels on the Core Agent, matching the picker the web UI added in v2 of the agent step. The wire field stays the existing `reasoningMode: "none"|"low"|"medium"|"high"` on `NaradaGenerationRequest`; only `CoreAgent` reads it server-side.
What changes:
- `narada-core`: new `ReasoningEffort` `StrEnum` (NONE/LOW/MEDIUM/HIGH). Re-exported from both the `narada` and `narada-pyodide` package roots so callers can `from narada import ReasoningEffort`.
- `narada` and `narada-pyodide`: `dispatch_request()` and `agent()` gain a `reasoning: ReasoningEffort | None = None` parameter that serializes to `body["reasoningMode"]` only when set (absent on the wire when `None`, preserving backward compatibility with older backends).
- Type-level enforcement that `reasoning` is only valid with `agent=Agent.CORE_AGENT`: paired `@overload` signatures use `Literal[Agent.CORE_AGENT]` to give Pyright/mypy a hard error on misuse. A runtime `ValueError` covers the string-form path (`agent="..."`) where overload narrowing doesn't help.
- 8 new unit tests in `narada-pyodide/tests/test_reasoning.py` cover body wiring (each effort level, omission when `None`), runtime validation (enum and string agent forms, both `agent()` and `dispatch_request()`), and enum-value alignment with the backend Literal.
Version bumps (coupled; see PR description for rationale):
- narada-core 0.0.18 → 0.0.19
- narada 0.1.43 → 0.1.44
- narada-pyodide 0.0.45a2 → 0.0.46a1
`narada_core.models` is not affected by the `_clear_modules()` reset (only `narada` and `pyodide.*` get popped), so the per-test `from narada_core.models import Agent, ReasoningEffort` repeats were unnecessary. Move them up.
Removes _strip_nested_python_events. The function dropped events from any nested pythonAgentRun node and stamped truncated_event_count on it, citing "deep recursion blowing up persisted JSON size" as the reason. In practice the policy was always-on and uniform — a 1-event nested trace got stripped just as readily as a 10K-event one — and the frontend already owns size enforcement via MAX_NESTED_ACTION_TRACE_BYTES in python.worker.ts plus the workflow-run-detail consumer caps. Two layers of stripping is strictly worse: small nested traces lose their events for no benefit, and the dashboard's CollapsibleNestedTrace can't recover them (it does not lazy-fetch by request_id). Now: emit_sub_agent_call forwards action_trace_raw as-is. The frontend caps when actually over budget. Tests updated to assert events flow through unmodified.
…o lerdogan/sdk-core-agent-reasoning
…o lerdogan/sdk-core-agent-reasoning
…-trace
# Conflicts:
# packages/narada-core/src/narada_core/actions/models.py
…o lerdogan/sdk-core-agent-reasoning
# Conflicts:
# packages/narada-core/pyproject.toml
# packages/narada-pyodide/pyproject.toml
# packages/narada/pyproject.toml
# packages/narada/src/narada/window.py
# uv.lock
…o lerdogan/sdk-core-agent-reasoning
…nt-reasoning
# Conflicts:
# packages/narada-pyodide/src/narada/window.py
abrahmasandra
approved these changes
Apr 30, 2026
Summary
Lets SDK callers opt into GPT-5.2's reasoning levels on the Core Agent (none / low / medium / high), mirroring the picker the web UI added in v2 of the agent step. The wire field is the same `reasoningMode` on `NaradaGenerationRequest` that the backend (`caddie/cloud/src/ecs/ai/agents/core_agent.py:46-66`) already honors, and only for `CoreAgent`; this PR just exposes it on the SDK surface.
What changes
New enum in `narada-core`
Re-exported from the `narada` and `narada-pyodide` package roots so callers can write `from narada import ReasoningEffort`.
`agent()` and `dispatch_request()` gain `reasoning=`
The new keyword serializes to `body["reasoningMode"]` only when set. It is absent on the wire when `None`, so older backends without the field keep working.
Type-level enforcement
The user asked for "a typing scheme that is enforced by typing that the user cannot pass in `reasoning` if the agent type is not `CoreAgent`". Two paired `@overload` signatures encode it:
- One takes `agent: Literal[Agent.CORE_AGENT]` and accepts `reasoning: ReasoningEffort | None = None`.
- The other takes `agent: Agent | str = Agent.OPERATOR` and has no `reasoning` parameter.
So a call that passes `reasoning` together with any other agent fails Pyright's overload match.
The runtime `ValueError` in `dispatch_request()` covers the path where users pass `agent="..."` as a string (overload narrowing doesn't reach the string form), as well as any caller without a type checker.
Internal forwarding `pyright: ignore`s
The `agent()` impl forwards to `dispatch_request()`, but the impl signature is wider than any single overload (`agent: Agent | str`, `reasoning: ReasoningEffort | None`, `output_schema: type[BaseModel] | None`). Two narrow `pyright: ignore[reportCallIssue]` / `[reportArgumentType]` comments sit on the inner forward line, each with an inline comment explaining why. The public `agent()` overloads above already give callers correct return-type narrowing, so the bypass is confined to a single internal hop, not the user-facing API.
Tests
`packages/narada-pyodide/tests/test_reasoning.py`: 8 new tests.
- Body wiring: asserts that `reasoningMode` is present in the posted JSON when `reasoning` is set and absent when `None`, and that each of the four levels round-trips as the expected lowercase string.
- Runtime validation: `dispatch_request()` and `agent()` both raise `ValueError` with a clear message when `reasoning` is paired with a non-`CoreAgent` enum, and the string-agent bypass path is covered too.
- Enum alignment: `ReasoningEffort` values match the backend's `Literal["none","low","medium","high"]`; `StrEnum` JSON-encodes inline.
The pyodide package owns the runnable test harness in this repo today; the impl in the sibling `narada` package shares the same body wiring and runtime check, so coverage here exercises both paths.
Version bumps (coupled)
- `narada-core`: 0.0.18 → 0.0.19
- `narada-pyodide`: 0.0.45a2 → 0.0.46a1
- `narada`: 0.1.43 → 0.1.44
The bumps cascade because `narada` and `narada-pyodide` both import `narada-core`, and any caller pulling the new `ReasoningEffort` symbol needs the matching pin. Aligned with the bump pattern from #87.
Test plan
Local verification (done):
- `uv sync`: workspace builds cleanly with the new versions
- `uv pip uninstall narada && uv run --package narada-pyodide pytest packages/narada-pyodide/tests/`: 67 passed (8 new + 59 existing); follows the pattern in `packages/narada-pyodide/tests/README.md`
- `uvx pyright --pythonpath .venv/bin/python --pythonversion 3.12 packages/narada/src/narada/window.py packages/narada-pyodide/src/narada/window.py`: same error count as the `origin/lerdogan/python-agent-trace` baseline (7, all pre-existing)
- `uv run ruff format --check packages/`: clean
- `uv run ruff check packages/`: clean
- `from narada import Agent, ReasoningEffort` and `Agent.CORE_AGENT` both resolve
Staging verification (pending):
- `narada-core@0.0.19`, `narada-pyodide@0.0.46a1`, `narada@0.1.44` after merge
- `reasoning=ReasoningEffort.HIGH` against staging Core Agent; verify that the observability dashboard shows the reasoning mode and that OpenAI token counts include reasoning tokens
Example
A Core Agent request with `reasoning=ReasoningEffort.HIGH` hits the backend's `CoreAgent.resolve_llm_options()` path (`caddie/cloud/src/ecs/ai/agents/core_agent.py:46-66`), which builds an OpenAI Responses API call with `reasoning={"effort": "high", "summary": "auto"}`. The token-based credit pricing already includes reasoning tokens, so no usage-tracking changes were needed.