This document describes the public APIs that external applications rely on. These endpoints provide access to puzzle data, AI model analysis, user feedback, and performance metrics.
🔄 Recent Changes (Sept 2025): All artificial API result limits have been removed or significantly increased to support external applications.
Simple Python client for contributing analyses to the ARC Explainer encyclopedia.

```bash
# Copy to your project
cp tools/api-client/arc_client.py your_project/
```

```python
from arc_client import contribute_to_arc_explainer

# One-line contribution to encyclopedia (no API key required)
result = contribute_to_arc_explainer(
    "3a25b0d8", analysis_result, "grok-4-2025-10-13",
    "https://arc-explainer-staging.up.railway.app"
)
```

Features:

- One-line integration for any Python researcher
- Current October 2025 model names (no deprecated models)
- Uses the existing `POST /api/puzzle/save-explained/:puzzleId` endpoint
- Model-specific functions: `contribute_grok4_analysis()`, `contribute_gpt5_analysis()`
- Batch processing for multiple puzzles
- No dependencies beyond `requests`

Complete Documentation: `tools/api-client/README.md`
Most API endpoints are publicly accessible and require NO authentication. A small set of ARC3 community submission moderation endpoints is intentionally token-gated for safety (see "ARC3 Community" below).
- `GET /api/puzzle/list` - Get paginated list of all puzzles
  - Query params: `page`, `limit`, `source` (ARC1, ARC1-Eval, ARC2, ARC2-Eval)
  - Response: Paginated puzzle list with metadata
  - Limits: No artificial limits - returns all puzzles by default
- `GET /api/puzzle/overview` - Get puzzle statistics and overview
  - Response: Puzzle counts by source, difficulty distribution
  - Limits: No limits
- `GET /api/puzzle/task/:taskId` - Get specific puzzle data by ID
  - Params: `taskId` (string) - Puzzle identifier
  - Response: Complete puzzle data with input/output grids
  - Limits: Single puzzle fetch - no limits
- `POST /api/puzzle/analyze/:taskId/:model` - Analyze puzzle with specific AI model
  - Params: `taskId` (string), `model` (string) - Model name
  - Body: Analysis configuration options (see Debate Mode below for debate-specific options). For conversation chaining via the Responses API, include `previousResponseId` to continue a prior analysis.
  - Response: Analysis result with explanation and predictions
  - Limits: No limits
  - Debate Mode: Include `originalExplanation` and `customChallenge` in the body to generate debate rebuttals
- `POST /api/stream/analyze` - Prepare Server-Sent Events analysis stream
  - Body: Same analysis options accepted by the non-streaming POST endpoint (temperature, promptId, omitAnswer, reasoning options, etc.) plus `taskId` and `modelKey`
  - Response: `{ sessionId, expiresInSeconds, expiresAt }` referencing the cached payload stored on the server for the follow-up SSE request. `expiresAt` is an ISO timestamp representing the handshake expiration window.
  - Notes: Payloads are discarded automatically when the stream completes, errors, or is cancelled, and they auto-expire after 60 seconds if the SSE connection is never opened
- `GET /api/stream/analyze/:taskId/:modelKey/:sessionId` - Start Server-Sent Events stream for token-by-token analysis
  - Params: `taskId` (string), `modelKey` (string), `sessionId` (string) returned from the POST handshake
  - Query: No longer accepts large option blobs; the server retrieves the cached payload prepared during the POST handshake
  - Safety: If the `taskId`/`modelKey` tuple does not match the cached payload, the server rejects the connection and clears the pending session to avoid leaks.
  - Response: SSE channel emitting `stream.init`, `stream.chunk`, `stream.status`, `stream.complete`, `stream.error`. The initial `stream.init` payload now includes `expiresAt` so clients can display remaining handshake time.
  - Notes: Enabled when `STREAMING_ENABLED=true`; defaults to `true` in development builds so SSE works out of the box. Currently implemented for GPT-5 mini/nano and Grok-4(-Fast) models.
  - Client: The `createAnalysisStream` utility in `client/src/lib/streaming/analysisStream.ts` provides a typed wrapper and now performs the POST handshake automatically before opening the SSE connection
📘 Streaming configuration - Set `STREAMING_ENABLED=false` to disable SSE globally (frontend and backend). Leaving it unset keeps streaming enabled in development and requires explicit opt-out in production.
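The two-step flow above (POST handshake, then SSE connect) leaves the client responsible for decoding the event stream. Below is a minimal, dependency-free sketch of an SSE frame parser; the sample frames are hypothetical but shaped like the `stream.init`/`stream.chunk` events described above, and `parse_sse_events` is an illustrative name, not part of the API.

```python
import json

def parse_sse_events(raw: str):
    """Minimal SSE frame parser (sketch): splits on blank lines and
    decodes `event:` / `data:` fields into (event, payload) tuples."""
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data_lines = "message", []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event, json.loads("\n".join(data_lines))))
    return events

# Hypothetical frames as the /api/stream/analyze channel might emit them
raw = (
    "event: stream.init\n"
    'data: {"sessionId": "abc", "expiresAt": "2025-10-06T12:00:00Z"}\n'
    "\n"
    "event: stream.chunk\n"
    'data: {"delta": "The pattern"}\n'
)
events = parse_sse_events(raw)
```

A production client would instead use an SSE library (or the typed `createAnalysisStream` wrapper mentioned above), but the framing rules are the same.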
- `GET /api/puzzle/:puzzleId/has-explanation` - Check if puzzle has existing explanation
  - Params: `puzzleId` (string)
  - Response: Boolean indicating explanation existence
  - Limits: No limits
- `POST /api/puzzle/reinitialize` - Reinitialize puzzle database
  - Admin endpoint: Reloads all ARC puzzle data
  - Limits: No limits
- `GET /api/models` - List all available AI models and providers
  - Limits: No limits
- `GET /api/model-dataset/datasets` - Get all available ARC datasets dynamically
  - Response: Array of `DatasetInfo` objects with `name`, `puzzleCount`, and `path`
  - Discovery: Automatically scans the `data/` directory for JSON puzzle files
  - Examples: evaluation (400 puzzles), training (400 puzzles), evaluation2 (117 puzzles), etc.
  - Limits: No limits - returns all discovered datasets
- `GET /api/model-dataset/models` - Get all models that have attempted puzzles
  - Response: Array of model names from the database `explanations` table
  - Data Source: Distinct `model_name` values with existing attempts
  - Limits: No limits - returns all models with database entries
- `GET /api/model-dataset/performance/:modelName/:datasetName` - Get model performance on specific dataset
  - Params: `modelName` (string), `datasetName` (string) - Any model and any dataset
  - Response: `ModelDatasetPerformance` with categorized puzzle results:
    - `solved[]`: Puzzle IDs where `is_prediction_correct = true OR multi_test_all_correct = true`
    - `failed[]`: Puzzle IDs attempted but incorrect
    - `notAttempted[]`: Puzzle IDs with no database entries for this model
    - `summary`: Counts and success rate percentage
  - Query Logic: Uses the exact same logic as the `puzzle-analysis.ts` script
  - Dynamic: Works with ANY model name and ANY dataset discovered from the filesystem
  - Limits: No limits
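The solved/failed/notAttempted split above is easy to reproduce client-side, e.g. to sanity-check server results against raw explanation rows. A sketch under assumed shapes (`categorize` is a hypothetical helper; `attempts` maps puzzle ID to a dict carrying the two correctness booleans):

```python
def categorize(dataset_ids, attempts):
    """Client-side sketch of the categorization described for
    /api/model-dataset/performance - not the server implementation."""
    solved, failed, not_attempted = [], [], []
    for pid in dataset_ids:
        row = attempts.get(pid)
        if row is None:
            not_attempted.append(pid)
        elif row.get("is_prediction_correct") or row.get("multi_test_all_correct"):
            solved.append(pid)
        else:
            failed.append(pid)
    attempted = len(solved) + len(failed)
    rate = 100.0 * len(solved) / attempted if attempted else 0.0
    return {"solved": solved, "failed": failed,
            "notAttempted": not_attempted, "successRate": rate}

perf = categorize(
    ["p1", "p2", "p3"],
    {"p1": {"is_prediction_correct": True},
     "p2": {"is_prediction_correct": False, "multi_test_all_correct": False}},
)
```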
DEPRECATED BATCH ENDPOINTS (never worked correctly):

- `POST /api/model/batch-analyze` - Start batch analysis across multiple puzzles
- `GET /api/model/batch-status/:sessionId` - Get batch analysis progress
- `POST /api/model/batch-control/:sessionId` - Control batch analysis (pause/resume/stop)
- `GET /api/model/batch-results/:sessionId` - Get batch analysis results
- `GET /api/model/batch-sessions` - Get all batch analysis sessions
- `GET /api/puzzle/:puzzleId/explanations` - Get all explanations for a puzzle
  - Query params: `correctness` (optional) - Filter by 'correct', 'incorrect', or 'all'
  - Limits: No limits - returns all explanations
  - Use case: ModelDebate page uses `?correctness=incorrect` to show only wrong answers for debate
- `GET /api/puzzle/:puzzleId/explanation` - Get single explanation for a puzzle
  - Limits: Single result - no limits
- `POST /api/puzzle/save-explained/:puzzleId` - Save AI-generated explanation
  - Limits: No limits
- `POST /api/puzzle/analyze/:taskId/:model` - Generate AI challenge to existing explanation
  - Debate Mode Body:
    ```json
    {
      "originalExplanation": {
        "id": 123,
        "modelName": "gpt-4o",
        "patternDescription": "...",
        "solvingStrategy": "...",
        "hints": ["..."],
        "confidence": 85,
        "isPredictionCorrect": false
      },
      "customChallenge": "Focus on edge cases in corners",
      "temperature": 0.2,
      "promptId": "debate"
    }
    ```
  - Response: New explanation with `rebuttingExplanationId` set to the original explanation's ID
  - Use case: AI-vs-AI debate where one model critiques another's reasoning
  - Database: Stores relationship in `rebutting_explanation_id` column
- `GET /api/explanations/:id/chain` - Get full rebuttal chain for an explanation
  - Params: `id` (number) - Explanation ID to get debate chain for
  - Response: Array of `ExplanationData` objects in chronological order (original → rebuttals)
  - Use case: Display complete debate thread showing which AIs challenged which
  - Database: Uses recursive CTE query to walk rebuttal relationships
  - Limits: No limits - returns entire chain regardless of depth
  - Example Response:
    ```json
    {
      "success": true,
      "data": [
        { "id": 100, "modelName": "gpt-4o", "rebuttingExplanationId": null },
        { "id": 101, "modelName": "claude-3.5-sonnet", "rebuttingExplanationId": 100 },
        { "id": 102, "modelName": "gemini-2.5-pro", "rebuttingExplanationId": 101 }
      ]
    }
    ```
- `GET /api/explanations/:id/original` - Get parent explanation that a rebuttal is challenging
  - Params: `id` (number) - Rebuttal explanation ID
  - Response: Single `ExplanationData` object or 404 if not a rebuttal
  - Use case: Navigate from challenge back to original explanation
  - Database: Joins on `rebutting_explanation_id` foreign key
  - Returns 404: If explanation is not a rebuttal or parent doesn't exist
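Since each rebuttal carries a `rebuttingExplanationId` pointing at its parent, a client can reconstruct the chronological chain itself from an unordered list of explanations. A sketch (`order_chain` is a hypothetical helper, assuming unique parent links as in the example response above):

```python
def order_chain(explanations):
    """Order ExplanationData-shaped dicts into original -> rebuttals by
    following rebuttingExplanationId links (client-side sketch)."""
    # Index each node by the ID of the explanation it rebuts
    # (the original has rebuttingExplanationId == None).
    by_parent = {e.get("rebuttingExplanationId"): e for e in explanations}
    chain, parent_id = [], None
    while parent_id in by_parent:
        node = by_parent.pop(parent_id)
        chain.append(node)
        parent_id = node["id"]
    return chain

chain = order_chain([
    {"id": 102, "rebuttingExplanationId": 101},
    {"id": 100, "rebuttingExplanationId": None},
    {"id": 101, "rebuttingExplanationId": 100},
])
```

In practice the `/chain` endpoint already returns this order via its recursive CTE; the helper is only useful when working from raw explanation lists.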
Multi-turn conversations with full context retention using provider-native conversation chaining.

- Each AI analysis returns a `providerResponseId` in the response
- Pass `previousResponseId` in the next analysis request to maintain context
- Provider automatically retrieves ALL previous reasoning and responses (server-side)
- No token cost for accessing previous reasoning (30-day retention)
- Supported providers:
  - OpenAI: o-series models (o3, o4, o4-mini) and GPT-5
  - xAI: Grok-4 models
- Provider Compatibility: Response IDs only work within the same provider
  - OpenAI ID → OpenAI models ✅
  - xAI ID → xAI models ✅
  - Cross-provider chaining ❌ (will start a new conversation)
```text
// Request 1: Initial analysis
POST /api/puzzle/analyze/00d62c1b/openai%2Fo4-mini
Body: { "promptId": "solver" }
Response: { "providerResponseId": "resp_abc123", ... }

// Request 2: Follow-up with full context
POST /api/puzzle/analyze/00d62c1b/openai%2Fo4-mini
Body: {
  "promptId": "solver",
  "previousResponseId": "resp_abc123"  // Maintains context
}
Response: { "providerResponseId": "resp_def456", ... }
```

- Column: `provider_response_id` (text) in `explanations` table
- Frontend Field: `providerResponseId` in `ExplanationData` type
- Mapping: Automatically handled by `useExplanation` hook
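The request/response trace above reduces to one rule: copy `providerResponseId` out of each response into the next request's `previousResponseId`. A small sketch of that rule (`next_request_body` is a hypothetical client helper, not part of the API):

```python
def next_request_body(prompt_id, prev_response=None):
    """Build the JSON body for a /api/puzzle/analyze call, carrying the
    providerResponseId from the previous response forward (sketch)."""
    body = {"promptId": prompt_id}
    if prev_response and prev_response.get("providerResponseId"):
        body["previousResponseId"] = prev_response["providerResponseId"]
    return body

# First call has no prior context; the follow-up chains onto resp_abc123.
first = next_request_body("solver")
follow = next_request_body("solver", {"providerResponseId": "resp_abc123"})
```

Remember the same-provider constraint above: reusing an OpenAI response ID against an xAI model silently starts a fresh conversation.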
- `GET /api/discussion/eligible` - Get recent explanations eligible for conversation chaining
  - Query params: `limit` (default 20), `offset` (default 0)
  - Eligibility Criteria:
    - Has `provider_response_id` in database
    - Created within last 30 days (provider retention window)
    - NO model type restrictions - any model with a response ID is eligible
  - Response: Array of eligible explanations with metadata
    ```json
    {
      "explanations": [
        {
          "id": 29432,
          "puzzleId": "e8dc4411",
          "modelName": "openai/o4-mini",
          "provider": "openai",
          "createdAt": "2025-10-06T12:00:00Z",
          "daysOld": 3,
          "hasProviderResponseId": true,
          "confidence": 85,
          "isCorrect": true
        }
      ],
      "total": 1,
      "limit": 20,
      "offset": 0
    }
    ```
  - Use case: PuzzleDiscussion landing page shows recent eligible analyses
  - Limits: Server-side pagination with configurable limit
- `docs/API_Conversation_Chaining.md` - Complete usage guide
- `docs/Responses_API_Chain_Storage_Analysis.md` - Technical implementation details
- `POST /api/feedback` - Submit user feedback on explanations
  - Limits: No limits
- `GET /api/explanation/:explanationId/feedback` - Get feedback for specific explanation
  - Limits: No limits
- `GET /api/puzzle/:puzzleId/feedback` - Get all feedback for a puzzle
  - Limits: No limits
- `GET /api/feedback` - Get all feedback with optional filtering
  - Query params: `limit` (max 10000, increased from 1000), `offset`, filters
  - Limits: Maximum 10000 results per request (previously 1000)
- `GET /api/feedback/stats` - Get feedback summary statistics
  - Limits: No limits
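Because `GET /api/feedback` caps `limit` at 10000, pulling a larger corpus means paging with `offset`. The offset arithmetic can be sketched as (a hypothetical helper, assuming the total count is known from a prior call):

```python
def paged_offsets(total, limit=10000):
    """Offsets needed to page through GET /api/feedback for `total` rows,
    respecting the server-side cap of 10000 per request (sketch)."""
    step = max(1, min(limit, 10000))  # never exceed the documented cap
    return list(range(0, total, step))

offsets = paged_offsets(25000)  # three requests cover 25000 rows
```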
🚨 CRITICAL CHANGE (Sept 30, 2025): Solver accuracy and debate accuracy are now tracked separately to prevent data pollution.
- `GET /api/feedback/accuracy-stats` - Primary solver accuracy endpoint - Pure 1-shot puzzle-solving accuracy
  - Response: `PureAccuracyStats` with `modelAccuracyRankings[]` (used by AccuracyLeaderboard)
  - Sort Order: Ascending by accuracy (worst performers first - "Models Needing Improvement")
  - Data Source: `is_prediction_correct` and `multi_test_all_correct` boolean fields only
  - Filtering: `WHERE rebutting_explanation_id IS NULL` - EXCLUDES debate rebuttals
  - Use Case: Fair apples-to-apples model comparison for pure puzzle solving (no contextual advantage)
  - 🔄 CHANGED: No longer limited to 10 results - returns ALL models with stats
- `GET /api/feedback/debate-accuracy-stats` - Debate challenger accuracy - Success rate for AI challenges/rebuttals
  - Response: `PureAccuracyStats` with `modelAccuracyRankings[]` (same structure as solver accuracy)
  - Sort Order: Descending by accuracy (best performers first - "Top Debate Challengers")
  - Data Source: `is_prediction_correct` and `multi_test_all_correct` boolean fields only
  - Filtering: `WHERE rebutting_explanation_id IS NOT NULL` - ONLY debate rebuttals
  - Use Case: Identify which models excel at challenging/critiquing incorrect explanations
  - Research Value: Compare solver vs. critique capabilities across models
- `GET /api/puzzle/performance-stats` - Trustworthiness and confidence reliability metrics
  - Response: `PerformanceLeaderboards` with `trustworthinessLeaders[]`, `speedLeaders[]`, `efficiencyLeaders[]`
  - Data Source: `trustworthiness_score` field (AI confidence reliability)
  - 🔄 CHANGED: No longer limited to 10 results - returns ALL models with stats
- `GET /api/puzzle/accuracy-stats` - DEPRECATED - Mixed accuracy/trustworthiness data
  - Warning: Despite the name, contains trustworthiness-filtered results
- `GET /api/puzzle/general-stats` - General model statistics (mixed data from MetricsRepository)
- `GET /api/puzzle/raw-stats` - Infrastructure and database performance metrics
- `GET /api/metrics/comprehensive-dashboard` - Combined analytics dashboard from all repositories
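The solver/debate separation above hinges on one column: `rebutting_explanation_id`. For clients re-deriving stats from raw explanation rows, the split can be mirrored as a pure function (a sketch of the SQL filters, not server code; field names follow the camelCase `ExplanationData` shape):

```python
def split_solver_vs_debate(explanations):
    """Mirror of the two accuracy populations: solver rows have
    rebuttingExplanationId == None, debate rebuttals have it set (sketch)."""
    solver = [e for e in explanations if e.get("rebuttingExplanationId") is None]
    debate = [e for e in explanations if e.get("rebuttingExplanationId") is not None]
    return solver, debate

solver, debate = split_solver_vs_debate([
    {"id": 1, "rebuttingExplanationId": None},
    {"id": 2, "rebuttingExplanationId": 1},
])
```

Mixing the two populations is exactly the "data pollution" the Sept 30 change guards against, so any client-side aggregation should apply this split first.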
🚨 CRITICAL: Cost calculations completely refactored for proper domain separation. All cost endpoints now use dedicated CostRepository following SRP principles.
- `GET /api/metrics/costs/models` - Get cost summaries for all models
  - Response: Array of `ModelCostSummary` objects with normalized model names
  - Data: Total cost, average cost, attempts, min/max costs per model
  - Business Rules: Uses consistent model name normalization (removes :free, :beta, :alpha suffixes)
  - Limits: No limits - returns all models with cost data
- `GET /api/metrics/costs/models/:modelName` - Get detailed cost summary for specific model
  - Params: `modelName` (normalized automatically - "claude-3.5-sonnet" matches "claude-3.5-sonnet:beta")
  - Response: Single `ModelCostSummary` object
  - Limits: Single model result
- `GET /api/metrics/costs/models/:modelName/trends?days=30` - Get cost trends over time for a model
  - Query params: `days` (1-365, default: 30) - Time range for trend analysis
  - Response: Array of `CostTrend` objects with daily cost data
  - Use case: Cost optimization and pattern analysis
  - Limits: Maximum 365 days of historical data
- `GET /api/metrics/costs/system/stats` - Get system-wide cost statistics
  - Response: Total system cost, total requests, average cost per request, unique models, cost-bearing requests
  - Use case: Financial reporting and system cost analysis
  - Limits: System-wide aggregated data only
- `GET /api/metrics/costs/models/map` - Get cost map for cross-repository integration
  - Response: Object with modelName → `{totalCost, avgCost, attempts}` mapping
  - Use case: Internal cross-repository data integration (used by MetricsRepository)
  - Limits: No limits
🔄 Data Consistency: All cost endpoints now return identical values for the same model (eliminated previous inconsistencies between UI components).
⚙️ Performance: Cost queries optimized with database indexes on (model_name, estimated_cost) and (created_at, estimated_cost, model_name).
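The normalization rule the cost endpoints rely on (stripping `:free`, `:beta`, `:alpha` suffixes) is simple enough to mirror client-side when joining cost data against your own model lists. A sketch (`normalize_model_name` is a hypothetical helper; the actual server implementation may differ in details):

```python
def normalize_model_name(name: str) -> str:
    """Strip the :free/:beta/:alpha suffixes so model variants aggregate
    under one name, per the business rule described above (sketch)."""
    for suffix in (":free", ":beta", ":alpha"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name
```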
- `GET /api/metrics/compare` - Compare specific models on a dataset
  - Query params: `model1` (required), `model2` (required), `model3` (optional), `model4` (optional), `dataset` (required)
  - Response: `ModelComparisonResult` with detailed puzzle-by-puzzle comparison
  - Data Structure:
    ```typescript
    {
      summary: {
        totalPuzzles: number;
        model1Name: string;
        model2Name: string;
        model3Name?: string;
        model4Name?: string;
        dataset: string;
        allCorrect: number;        // All models got it right
        allIncorrect: number;      // All models got it wrong
        allNotAttempted: number;   // No model tried
        threeCorrect?: number;     // Exactly 3 correct (4-model comparison)
        twoCorrect?: number;       // Exactly 2 correct
        oneCorrect?: number;       // Exactly 1 correct
        model1OnlyCorrect: number; // Only model 1 correct
        model2OnlyCorrect: number; // Only model 2 correct
        model3OnlyCorrect?: number;
        model4OnlyCorrect?: number;
        // Attempt-union stats (used by /scoring)
        // If you compare two attempt-suffixed models of the same base model
        // (e.g. "some-model-attempt1" vs "some-model-attempt2"), the server
        // returns union metrics for the base model name.
        attemptUnionStats: Array<{
          baseModelName: string;
          attemptModelNames: string[];
          unionAccuracyPercentage: number;
          unionCorrectCount: number;
          totalPuzzles: number;
          totalTestPairs?: number;
          puzzlesCounted?: number;
          puzzlesFullySolved?: number;
          datasetTotalPuzzles?: number;
          datasetTotalTestPairs?: number;
        }>;
      };
      details: PuzzleComparisonDetail[]; // Per-puzzle results
    }
    ```
  - Use Case: Head-to-head model performance comparison on specific datasets
  - Example: `/api/metrics/compare?model1=gpt-5-pro&model2=grok-4&dataset=evaluation2`
  - Limits: Up to 4 models simultaneously, any dataset from the data/ directory
  - Union puzzle IDs: For union scoring, clients can compute the solved puzzle/test-pair IDs by scanning `details[]` and treating a puzzle/test pair as solved if any compared attempt is `correct`.
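The union-scoring scan described above ("solved if any compared attempt is correct") can be sketched as follows. The exact `PuzzleComparisonDetail` shape is not spelled out here, so the per-entry fields (`puzzleId`, a `results` list with a `status` per compared model) are assumptions for illustration:

```python
def union_solved_ids(details):
    """Compute union-solved puzzle IDs by scanning details[]: a puzzle
    counts as solved if ANY compared attempt has status 'correct'
    (client-side sketch over an assumed detail shape)."""
    return [
        d["puzzleId"]
        for d in details
        if any(r.get("status") == "correct" for r in d.get("results", []))
    ]

details = [
    {"puzzleId": "a1", "results": [{"status": "correct"}, {"status": "incorrect"}]},
    {"puzzleId": "b2", "results": [{"status": "incorrect"}, {"status": "incorrect"}]},
]
solved = union_solved_ids(details)
```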
- `POST /api/puzzle/analyze-list` - Analyze specific puzzles across ALL models
  - Body: `{ puzzleIds: string[] }` - Array of puzzle IDs (max 500)
  - Response: `PuzzleListAnalysisResponse` with model-puzzle matrix
  - Data Structure:
    ```typescript
    {
      modelPuzzleMatrix: Array<{
        modelName: string;
        puzzleStatuses: Array<{
          puzzleId: string;
          status: 'correct' | 'incorrect' | 'not_attempted';
        }>;
      }>;
      puzzleResults: Array<{
        puzzle_id: string;
        correct_models: string[];
        total_attempts: number;
      }>;
      summary: {
        totalPuzzles: number;
        totalModels: number;
        perfectModels: number;      // Got ALL puzzles correct
        partialModels: number;      // Got some correct, some wrong
        notAttemptedModels: number; // Never tried any
      };
    }
    ```
  - Use Case: Check which models solved specific user-selected puzzles (inverse of model comparison)
  - Limits: Max 500 puzzle IDs per request
- `GET /api/puzzle/confidence-stats` - Model confidence analysis
  - Limits: No limits
- `GET /api/puzzle/worst-performing` - Identify problematic puzzles
  - Query params: `limit` (max 500, increased from 50), `sortBy`, accuracy filters
  - 🔄 CHANGED: Maximum limit increased from 50 to 500 results
Worm Arena (LLM Snake) and the embedded SnakeBench backend expose a small, public API surface for running matches, listing replays, querying stats, and streaming tournaments:

- `POST /api/snakebench/run-match` - Run a single Worm Arena match between two models
- `POST /api/snakebench/run-batch` - Run a bounded batch of matches (small `count`)
- `GET /api/snakebench/games` / `/api/snakebench/games/:gameId` - List games and fetch full replay JSON
- `GET /api/snakebench/health` - Embedded SnakeBench health check (Python/backend/runner)
- `GET /api/snakebench/stats` - Global Worm Arena stats (total games, active models, apples, total cost)
- `GET /api/snakebench/model-rating` / `/api/snakebench/model-history` - Per-model TrueSkill snapshot + match history
- `GET /api/snakebench/leaderboard` / `/api/snakebench/trueskill-leaderboard` - Leaderboards for Worm Arena models
- `GET /api/snakebench/greatest-hits` - Curated list of "greatest hits" games (longest, most expensive, highest-scoring)
- `POST /api/wormarena/prepare` - Prepare live Worm Arena batch session (multi-opponent or legacy count-based)
- `GET /api/wormarena/stream/:sessionId` - SSE stream for live Worm Arena batches and single matches

All of these endpoints are public with no authentication, consistent with the rest of ARC Explainer. For detailed request/response schemas and SSE event types, see `docs/reference/api/SnakeBench_WormArena_API.md`.
- `GET /api/puzzles/:puzzleId/solutions` - Get community solutions for puzzle
- `POST /api/puzzles/:puzzleId/solutions` - Submit community solution
- `POST /api/solutions/:solutionId/vote` - Vote on community solutions
- `GET /api/solutions/:solutionId/votes` - Get solution vote counts
- `GET /api/prompts` - Get available prompt templates
- `POST /api/prompt-preview` - Preview AI prompt before analysis
- `POST /api/prompt/preview/:provider/:taskId` - Preview prompt for specific provider
Multi-turn conversations with provider-managed context retention using response IDs.

- Each analysis returns `providerResponseId` in the payload
- Subsequent requests may include `previousResponseId` to continue the chain
- Supported: OpenAI o-series/GPT-5 and xAI Grok-4 (same-provider chains only)
- Retention typically 30 days (when `store: true`); new requests still consume tokens

Example request body:

```json
{
  "promptId": "solver",
  "previousResponseId": "resp_abc123"
}
```

Storage and indexing:

```sql
-- Stored on each explanation row
provider_response_id TEXT DEFAULT NULL;

-- Recommended indexes
CREATE INDEX IF NOT EXISTS idx_explanations_provider_response_id
  ON explanations(provider_response_id) WHERE provider_response_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_explanations_created_recent
  ON explanations(created_at DESC);
```

- `GET /api/health/database` - Database connection status
- `GET /api/admin/recovery-stats` - Data recovery statistics
- `POST /api/admin/recover-multiple-predictions` - Recover missing prediction data
- `POST /api/puzzle/validate` - Validate puzzle data structure
- Analytics endpoints: No longer return only top 10 results
- Performance stats: All trustworthiness, speed, and efficiency data returned
- Accuracy rankings: Complete model accuracy data available
- Feedback endpoint: Maximum limit increased from 1000 to 10000 results
- Worst-performing puzzles: Maximum limit increased from 50 to 500 results
- Batch results: Configurable limits up to 10000 results
- Puzzle list: Returns all puzzles without pagination by default
- Individual puzzle data: No limits on single puzzle fetches
- Model listings: No limits on available models
All API endpoints return JSON responses in this format:

```json
{
  "success": true,
  "data": { /* response data */ },
  "message": "Operation completed successfully",
  "timestamp": "2025-01-01T00:00:00.000Z"
}
```

Error responses:

```json
{
  "success": false,
  "error": "Error description",
  "details": "Additional error information",
  "timestamp": "2025-01-01T00:00:00.000Z"
}
```

```typescript
interface PureAccuracyStats {
  totalSolverAttempts: number;
  totalCorrectPredictions: number;
  overallAccuracyPercentage: number;
  modelAccuracyRankings: ModelAccuracyRanking[]; // Now returns ALL models, not just 10
}

interface ModelAccuracyRanking {
  modelName: string;
  totalAttempts: number;
  correctPredictions: number;
  accuracyPercentage: number;
  singleTestAttempts: number;
  singleCorrectPredictions: number;
  singleTestAccuracy: number;
  multiTestAttempts: number;
  multiCorrectPredictions: number;
  multiTestAccuracy: number;
}

interface PerformanceLeaderboards {
  trustworthinessLeaders: Array<{ // Now returns ALL models, not just 10
    modelName: string;
    avgTrustworthiness: number;
    avgConfidence: number;
    avgProcessingTime: number;
    avgCost: number;
    totalCost: number;
  }>;
  speedLeaders: Array<{ // Now returns ALL models, not just 10
    modelName: string;
    avgProcessingTime: number;
    totalAttempts: number;
    avgTrustworthiness: number;
  }>;
  efficiencyLeaders: Array<{ // Now returns ALL models, not just 10
    modelName: string;
    costEfficiency: number;
    tokenEfficiency: number;
    avgTrustworthiness: number;
    totalAttempts: number;
  }>;
  overallTrustworthiness: number;
}

interface FeedbackStats {
  totalFeedback: number;
  helpfulPercentage: number;
  topModels: Array<{
    modelName: string;
    feedbackCount: number;
    helpfulCount: number;
    notHelpfulCount: number;
    helpfulPercentage: number;
  }>;
  feedbackByModel: Record<string, {
    helpful: number;
    notHelpful: number;
  }>;
}
```
- `GET /api/admin/quick-stats` - Dashboard statistics
  - Response: `{ totalModels, totalExplanations, databaseConnected, lastIngestion, timestamp }`
  - Use case: Admin Hub homepage quick stats
  - Limits: No limits
- `GET /api/admin/recent-activity` - Recent ingestion activity
  - Response: Array of last 10 ingestion runs with stats
  - Limits: Fixed at 10 most recent runs
- `POST /api/admin/validate-ingestion` - Pre-flight validation before ingestion
  - Body: `{ datasetName, baseUrl }`
  - Response: Validation result with checks (URL accessible, token present, DB connected, etc.)
  - Use case: Validate configuration before starting ingestion
  - Limits: No limits
- `POST /api/admin/start-ingestion` - Start HuggingFace dataset ingestion
  - Body: `{ datasetName, baseUrl, source, limit, delay, dryRun, forceOverwrite, verbose }`
  - Response: `{ success, message, config }` (202 Accepted - async operation)
  - Use case: Import external model predictions from HuggingFace datasets
  - Limits: No limits
  - Note: Returns immediately; ingestion runs in the background
- `GET /api/admin/ingestion-history` - Complete ingestion run history
  - Response: Array of all ingestion runs with full details
  - Limits: No limits - returns all historical runs

```typescript
interface IngestionRun {
  id: number;
  datasetName: string;
  baseUrl: string;
  source: string; // ARC1-Eval, ARC2-Eval, etc.
  totalPuzzles: number;
  successful: number;
  failed: number;
  skipped: number;
  durationMs: number;
  dryRun: boolean;
  accuracyPercent: number | null;
  startedAt: string;
  completedAt: string;
  errorLog: string | null;
}
```

Self-service platform for generating unique ARC evaluation datasets and scoring solver submissions. Contributed by David Lu (@conundrumer).
- `POST /api/rearc/generate` - Generate unique 120-task evaluation dataset
  - Response: Streaming gzip JSON download
  - Headers: `Content-Disposition: attachment; filename="re-arc_test_challenges-{timestamp}.json"`
  - Rate Limit: 2 requests per 5 minutes
  - Notes: Each request generates a cryptographically unique dataset. Task IDs encode the generation seed via XOR, enabling stateless verification without server-side storage.
- `POST /api/rearc/evaluate` - Evaluate solver submission against generated dataset
  - Content-Type: `multipart/form-data` with JSON submission file
  - Response: Server-Sent Events stream
  - Rate Limit: 20 requests per 5 minutes
  - SSE Events:
    - `progress` - `{ current: number, total: number }` - Evaluation progress
    - `complete` - `{ type: "score", score: number }` - Final score (0.0-1.0)
    - `complete` - `{ type: "mismatches", mismatches: [...] }` - Test pair count mismatches
    - `error` - `{ message: string }` - Validation or processing error
  - Scoring: Uses official ARC Prize competition rules - a pair is solved if ANY of 2 attempts is correct
  - Caching: LRU cache provides ~100x speedup on repeated evaluations of the same dataset
```jsonc
[
  { // Test Pair 0
    "attempt_1": [[0, 1], [2, 3]],
    "attempt_2": [[0, 1], [2, 3]]
  },
  { // Test Pair 1
    "attempt_1": [[4, 5]],
    "attempt_2": [[4, 5]]
  }
]
```

- Task solutions are inaccessible without the server-side `RE_ARC_SEED_PEPPER` environment variable
- HMAC-SHA256 seed derivation prevents dataset regeneration without server access
- Each production deployment should use a unique pepper value
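The scoring rule the evaluator applies ("pair solved if ANY of 2 attempts correct") can be reproduced locally before uploading, given the submission shape above. A sketch (`score_submission` is a hypothetical helper; the real server compares against secret solutions derived from the seed pepper):

```python
def score_submission(submission, solutions):
    """Local sketch of the ARC Prize scoring rule: a test pair is solved if
    EITHER attempt matches the expected grid; score is the solved fraction."""
    solved = sum(
        1 for pair, expected in zip(submission, solutions)
        if pair.get("attempt_1") == expected or pair.get("attempt_2") == expected
    )
    return solved / len(solutions) if solutions else 0.0

submission = [
    {"attempt_1": [[0, 1]], "attempt_2": [[9]]},  # attempt_1 matches
    {"attempt_1": [[9]], "attempt_2": [[8]]},     # neither matches
]
solutions = [[[0, 1]], [[4, 5]]]
score = score_submission(submission, solutions)
```

This is only useful for self-checking against known training solutions; generated evaluation datasets cannot be scored locally because their solutions require the server's pepper.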
All endpoints are publicly accessible with NO authentication required, except the token-gated ARC3 community moderation endpoints described below.
No explicit rate limiting is implemented for most endpoints (the RE-ARC endpoints above are an exception). Consider adding rate limits for production use with external integrations.
The Saturn Visual Solver provides real-time updates via WebSockets:
- Connection endpoint: `ws://localhost:5000`
- Event types: `progress`, `image-update`, `completion`, `error`
- Session-based communication using `sessionId`
- Complete Data Access: Analytics endpoints now return complete datasets instead of arbitrary top-10 limits
- Higher Limits: Feedback and batch endpoints support much larger result sets
- Backward Compatibility: All existing query parameters continue to work
- Performance: Database queries have been optimized to handle larger result sets efficiently
- Database Dependency: Most endpoints require PostgreSQL connection. Fall back to in-memory mode if unavailable
- Token Tracking: API calls with AI models consume tokens and incur costs tracked in the database
- `POST /api/saturn/analyze/:taskId` - Analyze puzzle with Saturn visual solver
- `POST /api/saturn/analyze-with-reasoning/:taskId` - Saturn analysis with reasoning steps
- `GET /api/saturn/status/:sessionId` - Get Saturn analysis progress
- WebSocket: Real-time Saturn solver progress updates
- `GET /api/arc3-community/games` - List approved and playable games (includes official ARCEngine games)
- `GET /api/arc3-community/games/featured` - Featured games for the ARC3 landing page
- `GET /api/arc3-community/games/:gameId` - Fetch a single approved/playable game by ID
- `POST /api/arc3-community/session/start` - Start a play session for a game (official or approved community)
- `POST /api/arc3-community/session/:sessionGuid/action` - Send an action to an active session
- `POST /api/arc3-community/submissions` - Submit a single `.py` file for review (stored as `status='pending'`, non-playable until approved)
These endpoints require a server-configured token:

- Server env var: `ARC3_COMMUNITY_ADMIN_TOKEN`
- Request header: `X-ARC3-Admin-Token: <token>` (or `Authorization: Bearer <token>`)
Endpoints:
- `GET /api/arc3-community/submissions?status=pending|approved|rejected|archived` - List DB submissions (pending by default)
- `GET /api/arc3-community/submissions/:submissionId/source` - Fetch stored source for a submission (including pending)
- `POST /api/arc3-community/submissions/:submissionId/publish` - Approve a submission (sets `status='approved'`, `is_playable=true`)
- `POST /api/arc3-community/submissions/:submissionId/reject` - Reject a submission (sets `status='rejected'`, `is_playable=false`)