
Intruder eval rework + autointerp provider rate limits #465

Open
ocg-goodfire wants to merge 6 commits into dev from intruder-rework

@ocg-goodfire
Collaborator

Summary

  • Intruder eval: streaming trial generation, lightweight density index, XML prompts, JSON responses with reasoning, prompt saving to DB, spd-intruder SLURM CLI
  • Autointerp/graph_interp: rate-limit config moved from global to per-provider

Split out from #463 (clustering-core); these changes are independent of it.

Test plan

  • Run intruder eval on a harvest DB and verify results
  • basedpyright clean
  • ruff clean

🤖 Generated with Claude Code

ocg-goodfire and others added 2 commits March 30, 2026 08:47
- Format scripts/export_blog_data.py and scripts/export_component_data.py
- Suppress sklearn import errors in geometric_interaction/statistical_analysis.py
  (sklearn is not in the project dependencies)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Intruder eval improvements:
- Streaming trial generation (lazy iterator, not pre-built list)
- Lightweight DensityIndex (stores key+density, not full ComponentData)
- XML prompt format with raw + annotated views
- JSON response with reasoning field
- Save prompts to intruder_prompts DB table
- New spd-intruder SLURM CLI
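
A minimal sketch of the first two items, under stated assumptions: the index keeps only a key and a density per component, and trials come from a generator rather than a pre-built list. `DensityEntry`, `iter_trials`, and the rule of picking the intruder from a density neighbour are hypothetical illustrations, not the code in this PR.

```python
import random
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class DensityEntry:
    key: str        # component identifier (hypothetical format)
    density: float  # firing density of that component


class DensityIndex:
    """Keeps only (key, density) pairs, sorted by density."""

    def __init__(self, entries: list[DensityEntry]) -> None:
        self.entries = sorted(entries, key=lambda e: e.density)


def iter_trials(
    index: DensityIndex, n_trials: int, seed: int = 0
) -> Iterator[tuple[str, str]]:
    """Lazily yield (target_key, intruder_key) pairs, one trial at a time."""
    if len(index.entries) < 2:
        raise ValueError("need at least two components to form a trial")
    rng = random.Random(seed)
    for _ in range(n_trials):
        i = rng.randrange(len(index.entries))
        # Assumption: the intruder is the target's neighbour in density order.
        j = i + 1 if i + 1 < len(index.entries) else i - 1
        yield index.entries[i].key, index.entries[j].key
```

Because `iter_trials` is a generator, a consumer can start scoring the first trial before later ones exist, and memory use does not grow with `n_trials`.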

Autointerp/graph_interp:
- Rate-limit config (max_concurrent, max_requests_per_minute) moved
  from global config to per-provider settings
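
For illustration, per-provider settings could look roughly like this. Only `max_concurrent` and `max_requests_per_minute` come from this PR; the `RateLimit` and `ProviderConfig` classes, the provider names, and the numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimit:
    max_concurrent: int
    max_requests_per_minute: int


@dataclass(frozen=True)
class ProviderConfig:
    name: str  # hypothetical provider label
    rate_limit: RateLimit


def min_interval_seconds(provider: ProviderConfig) -> float:
    """Smallest request spacing that respects the per-minute cap."""
    return 60.0 / provider.rate_limit.max_requests_per_minute


# Each provider carries its own limits instead of sharing one global pair.
providers = {
    "provider_a": ProviderConfig("provider_a", RateLimit(8, 120)),
    "provider_b": ProviderConfig("provider_b", RateLimit(2, 30)),
}
```

With limits attached to the provider, a slow or tightly capped provider no longer forces the same ceiling on every other one.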

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ocg-goodfire and others added 4 commits March 31, 2026 18:30
Resolve conflicts in export scripts: take dev's canonical_to_concrete
key iteration (HEAD had .values() bug) and reasoning field additions.
Also fix unnecessary pyright ignore comments in statistical_analysis.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Existing DBs migrated in-place. intruder_prompts is already in _SCHEMA
for new DBs.
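
One common way to do such an in-place migration, sketched here on the assumption that the harvest DB is SQLite (the PR text does not say). The column list is hypothetical; only the table name `intruder_prompts` comes from this PR.

```python
import sqlite3

# Hypothetical columns; the real intruder_prompts schema is not shown here.
_CREATE_INTRUDER_PROMPTS = """
CREATE TABLE IF NOT EXISTS intruder_prompts (
    trial_id INTEGER PRIMARY KEY,
    prompt   TEXT NOT NULL
)
"""


def migrate_in_place(conn: sqlite3.Connection) -> None:
    """Add the table to an existing DB; a no-op when it already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(_CREATE_INTRUDER_PROMPTS)


def has_table(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None
```

`CREATE TABLE IF NOT EXISTS` makes the migration idempotent, so it is safe to run against both old DBs and new ones that already have the table.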

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ported from notebooks/2026-03-27-10-40_coherence_vs_density.py into
a proper CLI script with JSON config for specifying models/groups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strip coherence/violin plots, use ember for VPD and sandstone for others.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>