yutori template: align with Yutori n1.5 recommendations and fix runtime/type issues (#166)
## Summary
PR #159 did the n1 → n1.5 model upgrade, but a few runtime bugs slipped
through and the template never fully aligned with the public Yutori
reference SDK. This PR fixes both, in both the TS and Python templates.
Implementation cross-checked against the public
[yutori-ai/yutori-sdk-python](https://github.com/yutori-ai/yutori-sdk-python)
reference (`yutori/navigator/*`).
### Runtime / correctness bugs
- **Key handling rewritten.** n1.5 emits **lowercase** key names
(`enter`, `up`, `pageup`) and supports **sequential presses** (`down
down down enter` — see [Key
Space](https://docs.yutori.com/reference/n1-5#key-space)). The previous
`KEY_MAP` was keyed on PascalCase Playwright-style names so it never
matched, and the handler sent the entire space-separated expression as a
single key. Replaced with a **lowercase → XKeysym** map (matching the
names Kernel's `press_key` accepts — `Return`, `Page_Up`, `Ctrl`, etc.,
per `cmd/browsers_test.go:1512`) and a `parseKeyExpression` /
`_parse_key_expression` that issues one `press_key` call per sequential
combo. Applies to `key_press`, `hold_key`, click `modifier`, and scroll
`modifier`.
- **Coordinates now clamped to [0, dim-1]** after denormalizing from the
n1.5 1000×1000 space. A boundary value of `1000` previously mapped to
pixel `1280` on a 1280×800 viewport — one pixel outside the valid range.
Matches the public SDK's `denormalize_coordinates` default clamp.
- **TS template now type-checks cleanly.** The old `tsconfig.json`
extended `../tsconfig.base.json` (which `kernel create` doesn't
copy), `index.ts:61` had a `ChatCompletionMessageParam[]` signature
mismatch, and `import sharp from 'sharp'` needed `esModuleInterop`.
Inlined the full tsconfig like every other TS template, fixed the
function signature, and added `esModuleInterop: true`. Verified with
`npx tsc --noEmit` on a fresh scaffold (previously produced 12 errors).
- **Python `pyproject.toml`** description was still `"n1 Computer Use"`
→ `"n1.5 Computer Use"`.
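
The key-handling fix above can be sketched roughly as follows. This is a minimal illustration, not the template's actual code: the map is a small subset (only `Return`, `Page_Up`, and `Ctrl` are named in this PR; the other entries are assumed), the `+` separator for modifier combos is an assumption, and passing unknown keys through unchanged is a hypothetical fallback.

```python
# Illustrative subset of a lowercase -> Kernel keysym map.
# Only Return, Page_Up, and Ctrl are named in the PR text; the rest are assumed.
KEY_MAP = {
    "enter": "Return",
    "pageup": "Page_Up",
    "pagedown": "Page_Down",
    "up": "Up",
    "down": "Down",
    "ctrl": "Ctrl",
}


def parse_key_expression(expr: str) -> list[str]:
    """Split a space-separated key expression into one mapped combo per press.

    Assumes combos within a single press are joined with '+', e.g. 'ctrl+c'.
    """
    combos = []
    for token in expr.split():
        parts = [KEY_MAP.get(p.lower(), p) for p in token.split("+") if p]
        # A bare '+' token has no parts; keep it as a literal key.
        combos.append("+".join(parts) if parts else token)
    return combos


# Each returned combo would be sent as its own press_key call.
for combo in parse_key_expression("down down down enter"):
    print(combo)
```

Returning a list, rather than sending the whole expression as one key, is what lets the caller issue one `press_key` per sequential press.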
### Alignment with the public reference SDK
- **WebP quality 80 → 30** for screenshots. Kernel `capture_screenshot`
returns PNGs, and the public SDK's default for PNG sources is
`DEFAULT_WEBP_QUALITY_FOR_PNG = 30`
([images.py](https://github.com/yutori-ai/yutori-sdk-python/blob/main/yutori/navigator/images.py)).
Lossless PNG sources tolerate aggressive WebP compression with no
visible degradation, and the payload savings matter on long multi-step
trajectories.
- **`formatTaskWithContext` / `_format_task_with_context`** appends
location, timezone, current date/time, and weekday to the initial task.
Mirrors
[`format_task_with_context`](https://github.com/yutori-ai/yutori-sdk-python/blob/main/yutori/navigator/context.py).
Threaded through new optional `user_timezone` / `user_location` payload
fields.
- **`formatStopAndSummarize` / `_format_stop_and_summarize`**: when the
loop hits `maxIterations` without a final answer, the template now makes
one extra screenshot + summary call so callers get a usable result
instead of empty content. Mirrors the SDK's reference loop behavior.
- **Scroll amount scaling**: Kernel's `delta_y` is a wheel-event repeat
count (not pixels), so the previous 1:1 forwarding produced very small
scrolls. The template now multiplies the scroll `amount` by
`SCROLL_NOTCHES_PER_AMOUNT = 4`, closer to Yutori's documented
"1 unit ≈ 10% of viewport height".
- **`maxIterations` 50 → 100** to match the public SDK example default.
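
As a worked illustration of the coordinate clamp and scroll scaling described above, here is a small sketch. The function names, the rounding choice, and the sign convention for scroll direction are assumptions made for the example; only the 1000×1000 space, the `[0, dim-1]` clamp, and `SCROLL_NOTCHES_PER_AMOUNT = 4` come from this PR.

```python
NORMALIZED_SPACE = 1000          # n1.5 emits coordinates in a 1000x1000 space
SCROLL_NOTCHES_PER_AMOUNT = 4    # value stated in this PR


def denormalize(value: int, dim: int) -> int:
    """Map a normalized coordinate to a pixel index clamped to [0, dim - 1]."""
    pixel = round(value / NORMALIZED_SPACE * dim)
    return max(0, min(dim - 1, pixel))


def scroll_delta(direction: str, amount: int) -> tuple[int, int]:
    """Convert a model scroll amount into (delta_x, delta_y) wheel notches.

    Assumed sign convention: positive delta_y scrolls down, positive delta_x right.
    """
    notches = amount * SCROLL_NOTCHES_PER_AMOUNT
    if direction == "up":
        return (0, -notches)
    if direction == "down":
        return (0, notches)
    if direction == "left":
        return (-notches, 0)
    if direction == "right":
        return (notches, 0)
    raise ValueError(f"unknown scroll direction: {direction!r}")


# Boundary case from the PR: 1000 on a 1280-wide viewport.
print(denormalize(1000, 1280))   # unclamped this would be 1280; clamped to 1279
print(scroll_delta("down", 3))
```

Without the clamp, the boundary value `1000` lands on pixel `1280`, which is one past the last valid index `1279` on a 1280-pixel-wide viewport.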
### Docs / scaffold
- README headline example now matches the canonical "list team member
names from yutori.com" task from the public docs. The magnitasks Kanban
demo is kept below as an advanced example. README also documents the new
optional payload fields.
- `pkg/create/templates.go` `InvokeCommand` for both TS and Python uses
the same canonical example, so the `kernel create` next-steps hint
matches the README.
## Test plan
- [x] `make build && make test` pass; `go vet ./...` clean.
- [x] Scaffolded a fresh TS app from the updated template; `npx tsc
--noEmit` passes (previously failed: missing `tsconfig.base`,
type-mismatch in `index.ts:61`, sharp default import).
- [x] `kernel deploy` from the fresh scaffold deploys with **no**
TypeScript warnings (previously deploy logs included `Cannot read file
'/boot-node/tsconfig.base.json'` and the `ChatCompletionMessageParam[]`
error).
- [x] `kernel invoke` runs end-to-end against `https://www.yutori.com` /
"list team member names" — agent navigates to the team page and
identifies team members. (Note: yutori.com's parallax team UI is
genuinely tricky for any CUA; the iteration count is a UI artifact, not
a template regression. n1.5 emits `mouse_move` actions to trigger the
hover-reveal cards, which the new code handles.)
- [x] *Reviewer item*: ideally also test `key_press` Enter, sequential
presses (`down down enter`), `pageup`/`pagedown`, and shift-click
`modifier` to exercise the new key map paths end-to-end. I couldn't
reach a trajectory that emitted those during the smoke test.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Updates core Yutori agent loop/tool translation logic (keys,
coordinates, scrolling, iteration behavior), which can change automation
behavior and failure modes, but is limited to templates and not
security-sensitive code.
>
> **Overview**
> Improves the Yutori n1.5 computer-use templates (TS + Python) to
better match Yutori’s reference recommendations and fix several
runtime/correctness issues.
>
> The sampling loops now **add timezone/location/date context** to the
initial task, **raise `maxIterations` to 100**, and if the loop times
out without a final answer they perform a **stop-and-summarize**
follow-up call so invocations return a usable result.
>
> The computer tools were adjusted for n1.5 behavior:
**lowercase/sequential key expressions are parsed and mapped to Kernel
keysyms**, normalized coordinates are **clamped** when denormalized to
viewport pixels, scroll `amount` is **scaled** to Kernel wheel ticks,
screenshot WebP quality is reduced for smaller payloads, and `goto_url`
now normalizes missing schemes.
>
> Docs and scaffolding were updated: README examples and `kernel create`
invoke hints now use the `yutori.com` “team member names” task, new
optional payload fields (`user_timezone`, `user_location`) are
documented/exposed, Python metadata is corrected to n1.5, and the TS
template inlines a standalone `tsconfig.json` and fixes type issues
(including `esModuleInterop`).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
24ae144. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Dhruv Batra <dbatra@Dhruvs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```bash
kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://www.yutori.com and list the team member names."}'
```

Optional payload fields:

- `record_replay` (bool): capture a video of the session (paid plans only).
- `kiosk` (bool): launch the browser without address bar / tabs ([see below](#kiosk-mode)).
- `user_timezone` (IANA, e.g. `"America/New_York"`) and `user_location` (free text, e.g. `"New York, NY, US"`): appended to the task message so the model has accurate temporal/locational grounding.

More involved example (Kanban drag-and-drop):

```bash
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items."}'
```