
feat(sdk): add reasoning effort to the Core Agent #93

Merged
xTRam1 merged 22 commits into main from lerdogan/sdk-core-agent-reasoning on Apr 30, 2026

xTRam1 commented Apr 24, 2026

Summary

Lets SDK callers opt into GPT-5.2's reasoning levels on the Core Agent (none / low / medium / high), mirroring the picker the web UI added in v2 of the agent step. The wire field is the existing reasoningMode on NaradaGenerationRequest, which the backend (caddie/cloud/src/ecs/ai/agents/core_agent.py:46-66) already honors, but only for CoreAgent; this PR just exposes it on the SDK surface.

⚠️ Stacked on top of #87. The base branch is lerdogan/python-agent-trace, not main. Once #87 merges, GitHub automatically retargets this PR onto main.

What changes

New enum in narada-core

# packages/narada-core/src/narada_core/models.py
from enum import StrEnum  # Python 3.11+

class ReasoningEffort(StrEnum):
    """Amount of reasoning the Core Agent applies before responding."""
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

Re-exported from the narada and narada-pyodide package roots so callers write:

from narada import Agent, ReasoningEffort

agent() and dispatch_request() gain reasoning=

The new keyword serializes to body["reasoningMode"] only when set. When reasoning is None the field is omitted from the wire entirely, so older backends that don't know about it keep working.

response = await window.agent(
    prompt="Prove sqrt(2) is irrational.",
    agent=Agent.CORE_AGENT,
    reasoning=ReasoningEffort.HIGH,
)

Type-level enforcement

The user asked for "a typing scheme that is enforced by typing that the user cannot pass in reasoning if the agent type is not CoreAgent". Two paired @overload signatures encode it:

  • One overload requires agent: Literal[Agent.CORE_AGENT] and accepts reasoning: ReasoningEffort | None = None.
  • The other accepts the wider agent: Agent | str = Agent.OPERATOR and has no reasoning parameter.

So this fails Pyright's overload match:

await window.agent(
    prompt="x",
    agent=Agent.OPERATOR,
    reasoning=ReasoningEffort.HIGH,  # Pyright: No overload matches this call
)

The runtime ValueError in dispatch_request() covers what the overloads cannot: callers who pass agent="..." as a string (the Literal[Agent.CORE_AGENT] overload does not narrow string values) and any caller not running a type checker:

raise ValueError(
    "`reasoning` is only supported with `agent=Agent.CORE_AGENT` "
    f"(got agent={agent!r})"
)

Internal forwarding pyright: ignores

The agent() impl forwards to dispatch_request(), but the impl signature is wider than any single overload (agent: Agent | str, reasoning: ReasoningEffort | None, output_schema: type[BaseModel] | None), so Pyright cannot match that inner call to one of dispatch_request()'s overloads. Two narrow pyright: ignore[reportCallIssue] / [reportArgumentType] comments on the forwarding line suppress this. The public agent() overloads above already give callers correct return-type narrowing, so the bypass covers a single internal hop, not the user-facing API. Inline comments explain why.
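The forwarding problem can be reproduced in a few lines. In this hypothetical sketch, plain strings replace the enums to keep it short, and `run_agent` stands in for agent(); in the PR, agent() also has its own public overloads, which are omitted here to show only the wide implementation signature behind them.

```python
from __future__ import annotations

import asyncio
from typing import Any, Literal, overload


@overload
async def dispatch_request(
    *, agent: Literal["core-agent"], reasoning: str | None = None
) -> dict[str, Any]: ...
@overload
async def dispatch_request(*, agent: str = "operator") -> dict[str, Any]: ...
async def dispatch_request(
    *, agent: str = "operator", reasoning: str | None = None
) -> dict[str, Any]:
    # Single runtime implementation behind both overloads.
    body: dict[str, Any] = {"agent": agent}
    if reasoning is not None:
        body["reasoningMode"] = reasoning
    return body


async def run_agent(
    *, agent: str = "operator", reasoning: str | None = None
) -> dict[str, Any]:
    # (agent: str, reasoning: str | None) matches neither dispatch_request
    # overload statically, so the suppression is scoped to this one line.
    return await dispatch_request(agent=agent, reasoning=reasoning)  # pyright: ignore[reportCallIssue]


print(asyncio.run(run_agent(agent="core-agent", reasoning="high")))
# {'agent': 'core-agent', 'reasoningMode': 'high'}
```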

Tests

packages/narada-pyodide/tests/test_reasoning.py — 8 new tests:

  • Body wiring (3): assert reasoningMode is present in the posted JSON when reasoning is set, absent when None, and that each of the four levels round-trips as the expected lowercase string.
  • Runtime validation (3): dispatch_request() and agent() both raise ValueError with a clear message when reasoning is paired with a non-CoreAgent enum, and the string-agent bypass path is covered too.
  • Enum (2): values match the backend Literal["none","low","medium","high"]; StrEnum JSON-encodes inline.

The pyodide package owns the runnable test harness in this repo today. The impl in the sibling narada package uses the same body wiring and runtime check, so these tests cover the shared logic, although they execute only the pyodide implementation.

Version bumps (coupled)

  • narada-core: 0.0.18 → 0.0.19
  • narada-pyodide: 0.0.45a2 → 0.0.46a1
  • narada: 0.1.43 → 0.1.44

The bumps cascade because narada and narada-pyodide both depend on narada-core, and any caller importing the new ReasoningEffort symbol needs the matching pin. This follows the bump pattern from #87.

Test plan

Local verification (done):

  • uv sync — workspace builds cleanly with the new versions
  • uv pip uninstall narada && uv run --package narada-pyodide pytest packages/narada-pyodide/tests/ — 67 passed (8 new + 59 existing); follows the pattern in packages/narada-pyodide/tests/README.md
  • uvx pyright --pythonpath .venv/bin/python --pythonversion 3.12 packages/narada/src/narada/window.py packages/narada-pyodide/src/narada/window.py — same error count as origin/lerdogan/python-agent-trace baseline (7, all pre-existing)
  • uv run ruff format --check packages/ — clean
  • uv run ruff check packages/ — clean
  • Smoke import check: from narada import Agent, ReasoningEffort and Agent.CORE_AGENT both resolve

Staging verification (pending):

  • Publish all three packages to PyPI (narada-core@0.0.19, narada-pyodide@0.0.46a1, narada@0.1.44) after merge
  • Run an end-to-end script from a Pyodide environment with reasoning=ReasoningEffort.HIGH against staging Core Agent; verify the observability dashboard shows the reasoning mode and OpenAI token counts include reasoning tokens

Example

from narada import Agent, Narada, ReasoningEffort

narada = Narada()
window = narada.open_local_window()

response = await window.agent(
    prompt="Solve this calculus problem step by step.",
    agent=Agent.CORE_AGENT,
    reasoning=ReasoningEffort.HIGH,
    timeout=120,
)
print(response.text)

This will hit the backend's CoreAgent.resolve_llm_options() path (caddie/cloud/src/ecs/ai/agents/core_agent.py:46-66), which builds an OpenAI Responses API call with reasoning={"effort": "high", "summary": "auto"}. The token-based credit pricing already includes reasoning tokens, so no usage-tracking changes were needed.
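For orientation, here is a hypothetical sketch of how a request's reasoningMode could map onto the Responses API `reasoning` option described above. `resolve_reasoning_options` is an invented name; the real logic lives in CoreAgent.resolve_llm_options() and may differ, in particular in how it treats a missing field.

```python
from __future__ import annotations

from typing import Any, Literal

ReasoningMode = Literal["none", "low", "medium", "high"]


def resolve_reasoning_options(reasoning_mode: ReasoningMode | None) -> dict[str, Any]:
    # Hypothetical mirror of CoreAgent.resolve_llm_options(); not the backend's code.
    if reasoning_mode is None:
        # Assumption: an absent reasoningMode means no reasoning override.
        return {}
    # Effort passes through unchanged; "summary": "auto" matches the call shape above.
    return {"reasoning": {"effort": reasoning_mode, "summary": "auto"}}
```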

xTRam1 and others added 19 commits April 16, 2026 16:52
Add structured tracing for custom Python agents so their execution surfaces
on the Narada observability dashboard alongside GUI-built custom agents.

narada-core:
  - New PythonAgentRunTrace step type + PythonTraceEvent discriminated union
    covering stdout, stderr, sub-agent calls, extension actions, and side
    effects. Added to the ApaStepTrace union; parse_action_trace handles it
    transparently.

narada-pyodide:
  - New private _trace.py module with bounded-size summarisation of
    extension action requests/responses and per-event emitters
    (emit_sub_agent_call, emit_extension_action, emit_side_effect).
  - Instrument dispatch_request() to emit one subAgentCall event per
    invocation, covering success/error/timeout paths.
  - Instrument _run_extension_action() to emit one extensionAction event
    per call, with action_name keyed off the request discriminator.
  - Instrument download_file / render_html in utils.py to emit sideEffect
    events.
  - 38 unit tests exercise summarisation, truncation, emitter shapes, and
    Pydantic round-trip via parse_action_trace.

Version bumps (coupled to avoid parse_action_trace ValidationError for
external narada users whose traces may contain pythonAgentRun nodes):
  - narada-core:    0.0.17 -> 0.0.18
  - narada-pyodide: 0.0.43 -> 0.0.44
  - narada:         0.1.42 -> 0.1.43 (repin narada-core==0.0.18 only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses the four ship-blocker findings from the cross-dimensional review:

Robustness — trace emission must not break user code (_trace.py):
  - `emit_trace_event` now wraps the serialise + forward in try/except and
    logs the failure instead of propagating it. Previously a stray non-
    serialisable value in a summary (a datetime, a Pydantic model leak)
    would raise TypeError out of `_run_extension_action` and abort the
    user's agent mid-run.
  - `json.dumps(event, default=str)` stringifies unknown types defensively.
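A minimal sketch of the best-effort emission pattern described above, assuming a hypothetical `forward` callable in place of the real bridge to the host page; the actual `emit_trace_event` signature is not shown in this PR.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def emit_trace_event(event: dict[str, Any], forward: Callable[[str], None]) -> None:
    # Tracing is best-effort: a failure is logged, never raised into user code.
    try:
        # default=str stringifies values json cannot encode natively (datetime, set, ...).
        payload = json.dumps(event, default=str)
        forward(payload)
    except Exception:
        logger.exception("failed to emit trace event; continuing")
```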

Scalability — bound recursive trace size (_trace.py):
  - `emit_sub_agent_call` now strips the `events` list from any nested
    `pythonAgentRun` node in the forwarded action trace, replacing it with
    a `truncated_event_count` marker. Previously a custom Python agent
    that delegated to another custom Python agent embedded the sub-run's
    full event timeline in the parent's persisted JSON, producing
    O(breadth^depth) growth.

Robustness — code-quality cleanup (window.py):
  - Collapsed the duplicated `except asyncio.TimeoutError` / `except
    NaradaAgentTimeoutError_INTERNAL_DO_NOT_USE` blocks in
    `dispatch_request` into a single `except (A, B):` branch. Removes
    ~12 lines and the divergence risk.

Robustness — side-effect tracing on failure (utils.py):
  - `download_file` and `render_html` now emit a "failed" side-effect
    trace when the underlying JS call raises, then re-raise. Previously
    a failed download produced no trace at all — users saw silence
    rather than the actual error.
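The emit-then-re-raise pattern can be sketched generically. `traced_side_effect` and the event fields (`kind`, `status`, `error`) are hypothetical; only the "failed" status on error and the re-raise come from the commit message.

```python
from typing import Any, Callable


def traced_side_effect(
    kind: str, run: Callable[[], Any], emit: Callable[[dict[str, Any]], None]
) -> Any:
    try:
        result = run()
    except Exception as exc:
        # Record the failure so the dashboard shows the error, then propagate it.
        emit({"kind": kind, "status": "failed", "error": repr(exc)})
        raise
    emit({"kind": kind, "status": "ok"})
    return result
```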

Type safety — schema invariants (narada-core/actions/models.py):
  - `PythonAgentRunTrace.duration_ms` and `truncated_event_count` now
    use `NonNegativeInt` — Pydantic rejects negative values at parse
    time rather than letting `-42ms` reach the dashboard formatter.
  - New `@model_validator` on `PythonSubAgentCallEvent` and
    `PythonExtensionActionEvent` rejects `ts_end < ts_start`; clock
    skew on the Pyodide clock can no longer produce negative-duration
    events that the renderer would display as `-5ms`.
  - `parse_action_trace` now dispatches deterministically based on the
    first item's discriminator (`step_type` vs `action`+`url`) rather
    than try/except-falling-through two adapters. Eliminates the risk
    of silently misrouting a homogeneity-violated trace.
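The deterministic dispatch rule can be illustrated without Pydantic. In this hypothetical sketch, `pick_trace_kind` only returns a label, whereas the real parse_action_trace validates with the matching adapter; the empty-list behavior is an assumption.

```python
from typing import Any


def pick_trace_kind(raw: list[dict[str, Any]]) -> str:
    # Decide once, from the first item, which schema the whole trace uses.
    if not raw:
        return "empty"  # assumption: the commit does not say how empty traces are handled
    first = raw[0]
    if "step_type" in first:
        return "apa_steps"
    if "action" in first and "url" in first:
        return "operator_actions"
    # Fail loudly instead of falling through to a second adapter.
    raise ValueError(f"unrecognised action trace item keys: {sorted(first)}")
```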

Tests:
  - 13 new unit tests across `TestEmitDefensive`,
    `TestStripNestedPythonEvents`, `TestPythonEventInvariants`, and
    `TestParseActionTraceDispatch`. Full suite is now 51 tests, all
    passing under `uv run --package narada-pyodide pytest`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolve merge conflicts:
- pyproject.toml: keep both pytest and pytest-asyncio dev deps
- window.py: combine _get_auth_headers() refactor with trace instrumentation
- uv.lock: regenerated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolve conflict in window.py: combine _get_auth_headers() signature change
from #91 with trace instrumentation from this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-trace

# Conflicts:
#	packages/narada-pyodide/src/narada/window.py

Lets SDK callers opt into GPT-5.2's reasoning levels on the Core Agent,
matching the picker the web UI added in v2 of the agent step. The wire
field stays the existing `reasoningMode: "none"|"low"|"medium"|"high"`
on `NaradaGenerationRequest`; only `CoreAgent` reads it server-side.

What changes:

- `narada-core` — new `ReasoningEffort` `StrEnum` (NONE/LOW/MEDIUM/HIGH).
  Re-exported from both `narada` and `narada-pyodide` package roots so
  callers can `from narada import ReasoningEffort`.

- `narada` and `narada-pyodide` — `dispatch_request()` and `agent()`
  gain a `reasoning: ReasoningEffort | None = None` parameter that
  serializes to `body["reasoningMode"]` only when set (absent on the
  wire when `None`, preserving backward-compat with older backends).

- Type-level enforcement that `reasoning` is only valid with
  `agent=Agent.CORE_AGENT`: paired `@overload` signatures use
  `Literal[Agent.CORE_AGENT]` to give Pyright/mypy a hard error on
  misuse. A runtime `ValueError` covers the string-form path
  (`agent="..."`) where overload narrowing doesn't help.

- 8 new unit tests in `narada-pyodide/tests/test_reasoning.py` cover
  body wiring (each effort level, omission when None), runtime
  validation (enum and string agent forms, both `agent()` and
  `dispatch_request()`), and enum-value alignment with the backend
  Literal.

Version bumps (coupled — see PR description for rationale):
- narada-core    0.0.18 → 0.0.19
- narada         0.1.43 → 0.1.44
- narada-pyodide 0.0.45a2 → 0.0.46a1

`narada_core.models` is not affected by the `_clear_modules()` reset
(only `narada` and `pyodide.*` get popped), so the per-test
`from narada_core.models import Agent, ReasoningEffort` repeats were
unnecessary. Move them up.
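The reason the hoisted imports are safe comes down to the name filter. A hypothetical re-implementation of such a reset (the real `_clear_modules()` is not shown in this PR) makes it visible: `narada_core` shares the `narada` prefix but not `narada.`, so it stays cached.

```python
import sys


def clear_modules() -> None:
    # Hypothetical sketch: drop `narada`, `pyodide`, and their submodules so the
    # next import re-executes them. `narada_core.*` does not match and stays cached.
    for name in list(sys.modules):
        if name in ("narada", "pyodide") or name.startswith(("narada.", "pyodide.")):
            del sys.modules[name]
```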
Removes _strip_nested_python_events. The function dropped events from any
nested pythonAgentRun node and stamped truncated_event_count on it, citing
"deep recursion blowing up persisted JSON size" as the reason.

In practice the policy was always-on and uniform — a 1-event nested trace
got stripped just as readily as a 10K-event one — and the frontend already
owns size enforcement via MAX_NESTED_ACTION_TRACE_BYTES in python.worker.ts
plus the workflow-run-detail consumer caps. Two layers of stripping is
strictly worse: small nested traces lose their events for no benefit, and
the dashboard's CollapsibleNestedTrace can't recover them (it does not
lazy-fetch by request_id).

Now: emit_sub_agent_call forwards action_trace_raw as-is. The frontend
caps when actually over budget. Tests updated to assert events flow
through unmodified.

…-trace

# Conflicts:
#	packages/narada-core/src/narada_core/actions/models.py

…o lerdogan/sdk-core-agent-reasoning

# Conflicts:
#	packages/narada-core/pyproject.toml
#	packages/narada-pyodide/pyproject.toml
#	packages/narada/pyproject.toml
#	packages/narada/src/narada/window.py
#	uv.lock

Base automatically changed from lerdogan/python-agent-trace to main April 30, 2026 19:11
Comment thread packages/narada-core/src/narada_core/models.py Outdated
Comment thread packages/narada-pyodide/src/narada/window.py
abrahmasandra self-requested a review April 30, 2026 20:03
xTRam1 merged commit bf6dd7e into main Apr 30, 2026
4 checks passed
xTRam1 deleted the lerdogan/sdk-core-agent-reasoning branch April 30, 2026 20:11