feat(tinker): add TinkerNativeBackend #532

corbt · 2026-01-23T07:58:15Z

Summary

- Add a separate TinkerNativeBackend that implements native Tinker loss/checkpoint flow with renderer-based data conversion and an in-process OpenAI-compatible server.
- Define a new tinker dependency group (fastapi/uvicorn/tinker/tinker-cookbook) and conditionally export Tinker backends to keep base installs light.
- Align LocalBackend model base_path with backend path and add an integration test for the Tinker native flow.
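
For reviewers unfamiliar with conditional exports, here is a minimal sketch of the general pattern, assuming the optional backend lives in its own module that fails to import when the dependency group is missing. The helper name `optional_exports` and the module path in the usage comment are hypothetical and are not taken from this PR's diff.

```python
import importlib
from typing import Any


def optional_exports(module_name: str, symbols: list[str]) -> dict[str, Any]:
    """Return the requested symbols from module_name, or {} if the module
    (or one of its own dependencies) cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency group is not installed: export nothing.
        return {}
    return {name: getattr(module, name) for name in symbols if hasattr(module, name)}


# Hypothetical usage inside a package __init__.py:
#   _tinker = optional_exports("mypkg.tinker_native", ["TinkerNativeBackend"])
#   globals().update(_tinker)
#   __all__ = [*__all__, *_tinker]
```

With this shape, a base install that lacks the optional dependencies still imports cleanly, and the backend names only appear when the extra group is installed.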

This is an alternative to #523. That PR tried to combine native Tinker behavior with ART-native behavior in a single backend, which made the code too complex to reason about. Here we split it into a separate native backend. Downside: args are less compatible with LocalBackend. Upside: we can take advantage of Tinker's functionality more explicitly.

Experimental: not yet tested with multi-turn rollouts or tool-calls, but the yes-no-maybe flow converges (avg reward ~0.955 by step 4).

Test plan

- uv run pytest tests/integration/test_tinker_native_backend.py -v -s
- Manual yes-no-maybe style loop (16 rollouts/prompt, converged by step 4).

Separate native Tinker training/inference from LocalBackend to keep the API clear while enabling explicit loss/checkpoint behavior and config.

Align tinker native types with OpenAI tooling and update tests to avoid invalid type expressions under pyright.

Use merge_state for backend persistence to avoid clobbering model state, and fail fast on trajectories without Choice objects to prevent no-op training. Expose policy version fields on trajectories for off-policy tracking.
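
To illustrate why merging is safer than overwriting, and what failing fast looks like, here is a small sketch. It is not ART's implementation: `merge_state_sketch`, `ensure_has_choices`, and `FakeChoice` are hypothetical names, and the recursive (deep) merge is an assumption, since this PR does not say whether the real merge is shallow or deep.

```python
from dataclasses import dataclass
from typing import Any


def merge_state_sketch(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Layer updates over existing without dropping keys the update omits.
    Nested dicts are merged recursively (an assumption of this sketch)."""
    merged = dict(existing)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_state_sketch(merged[key], value)
        else:
            merged[key] = value
    return merged


@dataclass
class FakeChoice:
    """Stand-in for a sampled completion that carries trainable tokens."""
    text: str


def ensure_has_choices(items: list[Any], choice_type: type) -> None:
    """Raise instead of silently training on a trajectory with nothing to learn from."""
    if not any(isinstance(item, choice_type) for item in items):
        raise ValueError("trajectory has no Choice objects; training would be a no-op")
```

An overwrite such as `state = updates` would drop every key the backend did not set, which is the clobbering this change avoids.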

Add a new PipelineTrainer module that implements an asynchronous 3-stage pipeline (rollout, training, eval) for efficient RL training:

- PipelineTrainer: Main trainer class with configurable workers, batch sizes, and off-policy limits
- StatusReporter: Live progress reporting with tqdm and periodic logging
- PipelineState: Shared state dataclass for stage coordination
- Type definitions for RolloutFn, SingleRolloutFn, EvalFn

Key features:

- Async rollout workers with policy version tracking
- Stale sample detection and automatic discard
- Zero-variance group handling with collapse detection
- Graceful signal handling (SIGINT/SIGTERM)
- State persistence for training resumption
- Eval scheduling with configurable intervals

Also includes:

- yes_no_maybe_pipeline.py: Simple example showing basic usage
- binary_prefix_tool_pipeline.py: Complex example with tool calls

Updates to tinker_native backend:

- Add debug logging via ART_TINKER_TRAIN_LOG/ART_TINKER_SAMPLE_LOG
- Add fallback for create_conversation_prefix_with_tools
- Fix tool_call id handling in OpenAI server responses
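
The stale-sample and zero-variance checks can be sketched roughly as follows. This is an illustration of the idea only, not the PipelineTrainer code: `Sample`, `drop_stale`, `is_zero_variance`, and the `max_steps_off_policy` parameter are hypothetical names, and the sketch assumes rewards are compared within a group of rollouts for the same prompt.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class Sample:
    reward: float
    policy_version: int  # version of the policy that generated this sample


def drop_stale(samples: list[Sample], current_version: int, max_steps_off_policy: int) -> list[Sample]:
    """Keep only samples generated at most max_steps_off_policy versions ago."""
    return [
        s for s in samples
        if current_version - s.policy_version <= max_steps_off_policy
    ]


def is_zero_variance(group: list[Sample], eps: float = 1e-8) -> bool:
    """True when every reward in the group is (nearly) identical."""
    rewards = [s.reward for s in group]
    return len(rewards) < 2 or pvariance(rewards) <= eps
```

A zero-variance group matters when advantages are computed relative to the group mean: every advantage is then zero, so the group contributes no learning signal and can be skipped or counted toward collapse detection.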

- Fix import path for get_free_port (moved from service to server)
- Add cast for merge_state return type
- Fix test to use async function for TrajectoryGroup creation
- Move tinker deps to separate dependency group
- Add tinker to allowed-unresolved-imports for ty

bradhilton

LGTM!

Cursor Bot added 5 commits January 27, 2026 13:32

feat: add TinkerNativeBackend for native training

366d7ef

fix: address pre-commit type and format issues

386f351

feat: add safer state merge and policy tracking

c6e1675

corbt force-pushed the tinker-native-backend branch from 793b1e0 to 701e54a on January 27, 2026 13:35

corbt requested a review from bradhilton January 27, 2026 13:36

bradhilton approved these changes Jan 27, 2026

corbt merged commit 7d8dc6d into main Jan 27, 2026
2 checks passed
