@corbt corbt (Contributor) commented Jan 23, 2026

Summary

  • Add a separate TinkerNativeBackend that implements native Tinker loss/checkpoint flow with renderer-based data conversion and an in-process OpenAI-compatible server.
  • Define a new tinker dependency group (fastapi/uvicorn/tinker/tinker-cookbook) and conditionally export Tinker backends to keep base installs light.
  • Align LocalBackend model base_path with backend path and add an integration test for the Tinker native flow.
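
The conditional export mentioned in the second bullet could look roughly like the sketch below. It is an illustration only: the PR text does not show the actual package layout, so `OPTIONAL_DEPS`, `optional_deps_available`, and `build_exports` are hypothetical names, and the real code may use a `try`/`except ImportError` block instead.

```python
import importlib.util

# Hypothetical list of optional packages; the PR names the dependency group
# (fastapi/uvicorn/tinker/tinker-cookbook) but not how availability is checked.
OPTIONAL_DEPS = ("tinker", "fastapi", "uvicorn")


def optional_deps_available(names=OPTIONAL_DEPS):
    """Return True only if every optional top-level package can be located."""
    return all(importlib.util.find_spec(name) is not None for name in names)


def build_exports(base_exports, optional_exports, names=OPTIONAL_DEPS):
    """Return the public names to expose, adding optional ones when possible."""
    exports = list(base_exports)
    if optional_deps_available(names):
        exports.extend(optional_exports)
    return exports
```

With this shape, a base install that lacks the optional packages still imports cleanly and simply does not list the Tinker backends among its exports.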

This is an alternative to #523. That PR tried to combine native Tinker behavior with ART-native behavior in a single backend, which made the code too complex to reason about. Here we split it into a separate native backend. Downside: args are less compatible with LocalBackend. Upside: we can take advantage of Tinker's functionality more explicitly.

Experimental: not yet tested with multi-turn rollouts or tool calls, but the yes-no-maybe flow converges (avg reward ~0.955 by step 4).

Test plan

  • uv run pytest tests/integration/test_tinker_native_backend.py -v -s
  • Manual yes-no-maybe style loop (16 rollouts/prompt, converged by step 4).

Cursor Bot added 5 commits January 27, 2026 13:32
Separate native Tinker training/inference from LocalBackend to keep the API
clear while enabling explicit loss/checkpoint behavior and config.

Align tinker native types with OpenAI tooling and update tests to avoid
invalid type expressions under pyright.

Use merge_state for backend persistence to avoid clobbering model state, and
fail fast on trajectories without Choice objects to prevent no-op training.

Expose policy version fields on trajectories for off-policy tracking.

Add a new PipelineTrainer module that implements an asynchronous
3-stage pipeline (rollout, training, eval) for efficient RL training:

- PipelineTrainer: Main trainer class with configurable workers,
  batch sizes, and off-policy limits
- StatusReporter: Live progress reporting with tqdm and periodic
  logging
- PipelineState: Shared state dataclass for stage coordination
- Type definitions for RolloutFn, SingleRolloutFn, EvalFn
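
As a rough mental model of the 3-stage layout described above, here is a minimal `asyncio` sketch. It is not the PipelineTrainer implementation: the function names, the dict-based shared state, and the queue sizes are all assumptions made for illustration, and the "training" step only bumps a version counter.

```python
import asyncio


async def rollout_worker(prompts, sample_queue, state):
    """Stage 1: produce samples tagged with the policy version that made them."""
    for prompt in prompts:
        await asyncio.sleep(0)  # stand-in for a real (awaited) rollout call
        await sample_queue.put(
            {"prompt": prompt, "policy_version": state["policy_version"]}
        )
    await sample_queue.put(None)  # sentinel: no more samples


async def trainer(sample_queue, eval_queue, state, batch_size):
    """Stage 2: consume samples in batches; each full batch is one train step."""
    batch = []
    while True:
        item = await sample_queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            state["policy_version"] += 1  # stand-in for a real optimizer step
            await eval_queue.put(state["policy_version"])
            batch = []
    await eval_queue.put(None)  # sentinel: training finished


async def evaluator(eval_queue, evaluated_versions):
    """Stage 3: evaluate each policy version announced by the trainer."""
    while True:
        version = await eval_queue.get()
        if version is None:
            break
        evaluated_versions.append(version)


async def run_pipeline(prompts, batch_size=2):
    state = {"policy_version": 0}  # shared state (hypothetical shape)
    sample_queue = asyncio.Queue(maxsize=4)  # bounded, so rollouts cannot run far ahead
    eval_queue = asyncio.Queue()
    evaluated_versions = []
    await asyncio.gather(
        rollout_worker(prompts, sample_queue, state),
        trainer(sample_queue, eval_queue, state, batch_size),
        evaluator(eval_queue, evaluated_versions),
    )
    return state["policy_version"], evaluated_versions
```

The bounded sample queue is one simple way to apply backpressure; whether the actual trainer limits off-policy drift this way or through explicit version checks is not stated in the commit message.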

Key features:
- Async rollout workers with policy version tracking
- Stale sample detection and automatic discard
- Zero-variance group handling with collapse detection
- Graceful signal handling (SIGINT/SIGTERM)
- State persistence for training resumption
- Eval scheduling with configurable intervals
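
Two of the features above, stale sample discard and zero-variance group handling, can be sketched as simple filters. This is a hedged illustration rather than the PR's code: `Sample`, `drop_stale`, `has_learning_signal`, and the `max_steps_off_policy` parameter are hypothetical names. The zero-variance check rests on the usual group-relative reasoning that a group whose rewards are all identical yields zero advantages and therefore no gradient signal.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class Sample:
    reward: float
    policy_version: int  # version of the policy that generated this sample


def drop_stale(samples, current_version, max_steps_off_policy):
    """Keep only samples generated within the allowed number of policy steps."""
    return [
        s for s in samples
        if current_version - s.policy_version <= max_steps_off_policy
    ]


def has_learning_signal(group, eps=1e-8):
    """Return False for groups whose rewards are (numerically) all identical."""
    if len(group) < 2:
        return False
    return pvariance([s.reward for s in group]) > eps
```

A trainer loop might apply `drop_stale` first and then skip any group for which `has_learning_signal` returns False; counting how often that happens in a row would be one possible way to detect collapse, though the commit message does not say how detection is done.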

Also includes:
- yes_no_maybe_pipeline.py: Simple example showing basic usage
- binary_prefix_tool_pipeline.py: Complex example with tool calls

Updates to tinker_native backend:
- Add debug logging via ART_TINKER_TRAIN_LOG/ART_TINKER_SAMPLE_LOG
- Add fallback for create_conversation_prefix_with_tools
- Fix tool_call id handling in OpenAI server responses
- Fix import path for get_free_port (moved from service to server)
- Add cast for merge_state return type
- Fix test to use async function for TrajectoryGroup creation
- Move tinker deps to separate dependency group
- Add tinker to allowed-unresolved-imports for ty
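
For the debug-logging item, the commit message names the environment variables but not how their values are interpreted. The sketch below assumes they act as boolean flags; that interpretation, the `env_flag` helper, and the set of values treated as "off" are assumptions, not something the PR confirms.

```python
import os

# Values treated as "disabled" (an assumption for this sketch).
_FALSEY = {"", "0", "false", "no", "off"}


def env_flag(name, environ=None):
    """Return True if the named environment variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() not in _FALSEY


# Hypothetical usage: gate verbose logging on the variables named in the PR.
TRAIN_LOG_ENABLED = env_flag("ART_TINKER_TRAIN_LOG")
SAMPLE_LOG_ENABLED = env_flag("ART_TINKER_SAMPLE_LOG")
```
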
@corbt corbt force-pushed the tinker-native-backend branch from 793b1e0 to 701e54a January 27, 2026 13:35
@corbt corbt requested a review from bradhilton January 27, 2026 13:36
@bradhilton bradhilton (Collaborator) left a comment

LGTM!

@corbt corbt merged commit 7d8dc6d into main Jan 27, 2026
2 checks passed