feat: add Claude Code project-mode support #113

Open
GuyNachshon wants to merge 1 commit into huggingface:main from GuyNachshon:claude-code-project-mode

@GuyNachshon
feat: add Claude Code project-mode support

Summary

Adds Claude Code as a second frontend alongside the standalone ml-intern CLI. Both share the same tools under agent/tools/ — no duplication, no changes to the agent runtime.

After this PR, running claude from the repo root works:

claude                                          # interactive
claude -p "fine-tune llama on my dataset"       # headless

What's new

| File | Purpose |
| --- | --- |
| `packages/mcp_server/server.py` | MCP server that re-exposes `agent/tools/*` to Claude Code via stdio. Uses `mcp.server.lowlevel.Server` to preserve existing JSON schemas verbatim; FastMCP 3.x re-derives schemas from Python type hints, which would lose `oneOf`/operation-discriminated structures. |
| `CLAUDE.md` | System prompt ported from `agent/prompts/system_prompt_v3.yaml`. References to `plan_tool` become TodoWrite; the `research` tool becomes a Task subagent. |
| `.claude/agents/research.md` | Research subagent, Claude Code's equivalent of `research_tool`: it runs in its own context and is restricted to read-only HF tools. |
| `.claude/commands/{ml-intern,research,inspect-dataset,finetune,run-job}.md` | Five slash commands, opinionated to match the `CLAUDE.md` methodology. |
| `.claude/hooks/pre_tool_use_approval.py` | Port of `_needs_approval` from `agent/core/agent_loop.py`. Fail-safe: malformed payloads, non-dict `tool_input`, and empty `tool_name` all force a prompt instead of a silent allow. |
| `.claude/hooks/session_start_context.py` | Injects the HF username (via `huggingface_hub.HfApi.whoami()`) and a local-mode banner so the model knows the user's HF namespace and whether to expect a sandbox. Mirrors `agent/context_manager/manager.py`. |
| `.claude/hooks/session_end_upload.py` | Uploads transcripts to `smolagents/ml-intern-sessions` after running them through `agent/core/redact.py::scrub`. Refuses paths outside `~/.claude/` or `$CLAUDE_PROJECT_DIR`. |
| `CLAUDE_CODE_GUIDE.md` | User docs (slash commands, approvals, env knobs, troubleshooting). |
| `.gitignore` | Switch from a blanket `.claude/` ignore to specific exclusions (`.claude/settings.local.json`, `__pycache__`) so shared project config is tracked but per-user overrides are not. |
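
To make the fail-safe behaviour of the approval hook concrete, here is a minimal sketch. It is not the code in `pre_tool_use_approval.py`: the function names `decide` and `_decision` are hypothetical, the approval policy is passed in as a placeholder callable rather than ported from `_needs_approval`, and the output shape (`hookSpecificOutput` with `permissionDecision`) reflects my understanding of Claude Code's PreToolUse hook contract and should be checked against the hooks reference.

```python
import json
from typing import Callable


def _decision(kind: str, reason: str) -> dict:
    """Build a PreToolUse response (shape assumed, see lead-in)."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": kind,  # "allow", "deny", or "ask"
            "permissionDecisionReason": reason,
        }
    }


def decide(raw_stdin: str, needs_approval: Callable[[str, dict], bool]) -> dict:
    """Return a decision; anything the hook cannot parse falls back to 'ask'."""
    try:
        payload = json.loads(raw_stdin)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return _decision("ask", "malformed hook payload")

    if not isinstance(payload, dict):
        return _decision("ask", "hook payload is not a JSON object")

    tool_name = payload.get("tool_name")
    if not isinstance(tool_name, str) or not tool_name.strip():
        return _decision("ask", "empty or missing tool_name")

    tool_input = payload.get("tool_input")
    if not isinstance(tool_input, dict):
        return _decision("ask", "tool_input is not a JSON object")

    # Placeholder policy: the real rules live in the ported _needs_approval.
    if needs_approval(tool_name, tool_input):
        return _decision("ask", f"{tool_name} requires approval")
    return _decision("allow", "policy allows this call")
```

The point of the structure is that every early return is an `ask`, so a parsing bug or an unexpected payload can only make the hook more conservative, never less.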

What's not changed

  • agent/ is untouched. The standalone ml-intern CLI works exactly as before.
  • No new Python deps. The MCP server uses mcp and fastmcp already in pyproject.toml.
  • No CI changes (the vendoring-drift guard is in the follow-up plugin PR).

Behavior parity vs. standalone CLI

| Config field | Hook env var | Default |
| --- | --- | --- |
| `yolo_mode` | `ML_INTERN_YOLO` | `0` |
| `confirm_cpu_jobs` | `ML_INTERN_CONFIRM_CPU_JOBS` | `1` |
| `save_sessions` | `ML_INTERN_SAVE_SESSIONS` | `1` |
| `session_dataset_repo` | `ML_INTERN_SESSION_REPO` | `smolagents/ml-intern-sessions` |
| `--local` | `ML_INTERN_LOCAL_MODE` | `0` |
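
The mapping above could be read by the hooks roughly as follows. This is a sketch under assumptions, not the hooks' actual code: `HookConfig`, `load_config`, and `_flag` are hypothetical names, and treating only the literal string `1` as "on" is my assumption about how the flags are parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HookConfig:
    """Hypothetical container mirroring the config fields in the table."""
    yolo_mode: bool
    confirm_cpu_jobs: bool
    save_sessions: bool
    session_dataset_repo: str
    local_mode: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a 0/1 flag; unset or blank values fall back to the default."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    return raw == "1"  # assumption: only "1" enables a flag


def load_config(env: Optional[Mapping[str, str]] = None) -> HookConfig:
    """Build the config from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    repo = env.get("ML_INTERN_SESSION_REPO", "").strip()
    return HookConfig(
        yolo_mode=_flag(env, "ML_INTERN_YOLO", False),
        confirm_cpu_jobs=_flag(env, "ML_INTERN_CONFIRM_CPU_JOBS", True),
        save_sessions=_flag(env, "ML_INTERN_SAVE_SESSIONS", True),
        session_dataset_repo=repo or "smolagents/ml-intern-sessions",
        local_mode=_flag(env, "ML_INTERN_LOCAL_MODE", False),
    )
```

Passing the environment as a mapping keeps the function easy to exercise in tests, for example `load_config({"ML_INTERN_YOLO": "1"})`.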

What's intentionally not ported

  • Slash commands /help, /compact, /model, /yolo, /effort, /status, /undo: Claude Code either has native equivalents, or these are model-loop concepts (/effort, /model probe) that don't apply.
  • record_llm_call, record_hf_job_*, record_sandbox_*, HeartbeatSaver — Claude Code's transcript persistence covers the externally-visible behavior (don't lose data on long turns).
  • auto_save_interval, heartbeat_interval_s, max_iterations, reasoning_effort, prompt_caching, effort_probe, model_switcher, hf_router_catalog — Claude Code provides equivalents.
  • private_hf_repo_tools — already disabled upstream (agent/core/tools.py:55-58).

Test plan

  • uv run python -m packages.mcp_server.server < /dev/null boots cleanly (15 tools register: 10 HF tools + 4 sandbox tools + sandbox_create).
  • Approval hook fail-safes on malformed stdin, empty tool_name, non-dict tool_input (all force ask).
  • Approval hook auto-approves CPU jobs with push_to_hub when ML_INTERN_CONFIRM_CPU_JOBS=0.
  • Approval hook always prompts for GPU jobs.
  • Approval hook surfaces the "from_pretrained without push_to_hub" warning in the prompt reason.
  • SessionStart hook resolves the HF username when HF_TOKEN is set; when resolution fails, it reports the specific failure mode (no token vs. whoami HTTP error vs. other).
  • SessionEnd hook redacts hf_…, sk-ant-…, sk-…, gh*_…, Bearer …, and KEY=value exports before upload (verified against planted secrets).
  • SessionEnd hook refuses paths outside ~/.claude/ or $CLAUDE_PROJECT_DIR.
  • Standalone ml-intern CLI continues to work unchanged.
  • Existing tests/unit/ suite passes 46/46, excluding test_user_quotas.py, which was already broken before this PR because it lacks pytest-asyncio.
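
The two SessionEnd checks in the list above (path containment and secret redaction) can be sketched as below. This is an illustration only: `is_allowed_transcript` and `scrub_sketch` are hypothetical names, the regular expressions are my approximations of the listed token families rather than the patterns in `agent/core/redact.py::scrub`, and reading "KEY=value exports" as shell `export KEY=value` lines is an assumption.

```python
import re
from pathlib import Path
from typing import Iterable

# Approximate token patterns (assumptions, not the real redact.py rules).
_TOKEN_PATTERNS = [
    re.compile(r"hf_[A-Za-z0-9]{8,}"),
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),
    re.compile(r"sk-[A-Za-z0-9_\-]{8,}"),
    re.compile(r"gh[a-z]_[A-Za-z0-9]{8,}"),
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]{8,}", re.IGNORECASE),
]
# Keep "export NAME=" and drop everything after the equals sign.
_EXPORT_PATTERN = re.compile(r"^(\s*export\s+[A-Za-z_][A-Za-z0-9_]*=).*$", re.MULTILINE)


def is_allowed_transcript(path: str, allowed_roots: Iterable[str]) -> bool:
    """True only if the resolved path sits under one of the allowed roots."""
    resolved = Path(path).expanduser().resolve()
    for root in allowed_roots:
        root_resolved = Path(root).expanduser().resolve()
        try:
            resolved.relative_to(root_resolved)  # raises ValueError if outside
            return True
        except ValueError:
            continue
    return False


def scrub_sketch(text: str) -> str:
    """Replace anything that looks like a secret with a fixed marker."""
    for pattern in _TOKEN_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return _EXPORT_PATTERN.sub(r"\1[REDACTED]", text)
```

Resolving both the candidate path and the roots before comparing them means `..` segments and symlinks cannot be used to step outside `~/.claude/` or `$CLAUDE_PROJECT_DIR`.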

Anticipated review feedback

  • "Why low-level mcp.server.lowlevel.Server instead of FastMCP?" Documented in the docstring at top of server.py. FastMCP 3.x derives input schemas from Python type hints; the existing *_TOOL_SPEC["parameters"] JSON schemas in agent/tools/ use oneOf/operation-discriminated structures that don't round-trip through that.
  • "Why three hooks instead of one?" They fire on different lifecycle events (SessionStart / PreToolUse / SessionEnd). Merging would couple unrelated concerns.
  • "What about telemetry?" Claude Code writes the transcript continuously, so the externally-visible "don't lose trace data on long turns" behavior is covered. The internal Event channel was a Session-scoped abstraction without a Claude Code analog.
  • ".gitignore change might affect existing users." Only affects users who had local-only files in .claude/ (rare — that directory was previously fully ignored, so anything there was either accidental or intentional opt-out). The new rules track everything except settings.local.json (Claude Code's standard local-only override file) and runtime caches.
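
To illustrate the first point, here is a sketch of what passing a schema through verbatim means. The tool spec is invented for the example (the real ones are the `*_TOOL_SPEC` dicts under `agent/tools/`), and `to_mcp_tool_descriptor` is a hypothetical helper, not a function in `server.py`. The only thing assumed about MCP is that a tool descriptor carries `name`, `description`, and `inputSchema`.

```python
import copy

# Hypothetical spec using an operation-discriminated oneOf, in the style
# the PR describes. Field names here are invented for illustration.
EXAMPLE_TOOL_SPEC = {
    "name": "example_jobs",
    "description": "Run a job or fetch its logs.",
    "parameters": {
        "type": "object",
        "oneOf": [
            {
                "properties": {
                    "operation": {"const": "run"},
                    "script": {"type": "string"},
                },
                "required": ["operation", "script"],
            },
            {
                "properties": {
                    "operation": {"const": "logs"},
                    "job_id": {"type": "string"},
                },
                "required": ["operation", "job_id"],
            },
        ],
    },
}


def to_mcp_tool_descriptor(spec: dict) -> dict:
    """Copy the JSON schema as-is instead of re-deriving it from type hints."""
    return {
        "name": spec["name"],
        "description": spec["description"],
        # Deep copy so later mutation of the descriptor cannot alter the spec.
        "inputSchema": copy.deepcopy(spec["parameters"]),
    }
```

Because the schema is copied rather than regenerated, the `oneOf` branches reach the client exactly as written in the spec.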

Follow-up

The plugin packaging (separate plugin/ directory, vendored library, marketplace manifest) is the next PR. This one stops at "the repo works as a Claude Code project."

Adds Claude Code as a second frontend alongside the standalone `ml-intern` CLI.
Both share the same tools under `agent/tools/` — no duplication, no changes to
agent runtime behavior.

What's new:

- `packages/mcp_server/server.py` — MCP server that re-exposes `agent/tools/*`
  to Claude Code via stdio. Uses `mcp.server.lowlevel.Server` to preserve the
  existing JSON schemas verbatim (FastMCP 3.x re-derives schemas from Python
  type hints, which would lose `oneOf`/operation-discriminated structures).
- `CLAUDE.md` — system prompt ported from `agent/prompts/system_prompt_v3.yaml`,
  with `plan_tool` → TodoWrite and the `research` tool → Task subagent
  substitutions noted.
- `.claude/agents/research.md` — research subagent (read-only HF tool subset).
- `.claude/commands/{ml-intern,research,inspect-dataset,finetune,run-job}.md`
- `.claude/hooks/`:
  - `pre_tool_use_approval.py` — port of `_needs_approval` from
    `agent/core/agent_loop.py`. Fail-safe on malformed input (forces a prompt
    rather than silently allowing).
  - `session_start_context.py` — injects HF username + local-mode banner.
  - `session_end_upload.py` — uploads transcripts to
    `smolagents/ml-intern-sessions` after running through `agent/core/redact.py`.
- `CLAUDE_CODE_GUIDE.md` — user docs.
- `.gitignore` — switch from blanket `.claude/` to specific exclusions
  (`.claude/settings.local.json`, `__pycache__`) so shared config is tracked
  but per-user overrides are not.

What's *not* changed:

- `agent/` is untouched. The standalone CLI still works exactly as before.
- No new Python deps. The MCP server uses `mcp` and `fastmcp` already in
  `pyproject.toml`.

Behavior parity vs. standalone CLI (env var → `Config` field):

  ML_INTERN_YOLO              → yolo_mode             (default 0)
  ML_INTERN_CONFIRM_CPU_JOBS  → confirm_cpu_jobs      (default 1)
  ML_INTERN_SAVE_SESSIONS     → save_sessions         (default 1)
  ML_INTERN_SESSION_REPO      → session_dataset_repo  (default smolagents/ml-intern-sessions)
  ML_INTERN_LOCAL_MODE        → --local               (default 0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>