
FloydARC: FloydNet for ARC-AGI

FloydARC adapts FloydNet to the ARC-AGI benchmark and achieves state-of-the-art performance among models trained primarily on ARC-style data (rather than large-scale web corpora).


1. Overview

The Abstraction and Reasoning Corpus (ARC) benchmark has attracted substantial interest in recent years. It evaluates a model’s ability to infer underlying rules from only a few examples, emphasizing reasoning and generalization. While large language models trained on massive internet data can achieve strong results, models trained mainly on ARC-style data face a significantly harder challenge. Prior work such as VARC and Loop-ViT shows that treating ARC as a vision-centric task can be highly effective.

FloydNet demonstrates strong performance on neural algorithmic reasoning. In this repository, we present FloydARC, a FloydNet-based system for ARC-AGI, achieving SOTA results among ARC-trained models.

| Model | #Params | ARC-AGI-1 | ARC-AGI-2 |
| --- | --- | --- | --- |
| *Large language models (LLMs)* | | | |
| DeepSeek R1 | 671B | 15.8 | 1.3 |
| Claude 3.7 8k | N/A | 21.2 | 0.9 |
| o3-mini-high | N/A | 34.5 | 3.0 |
| GPT-5 | N/A | 44.0 | 1.9 |
| Grok-4-thinking | 1.7T | 66.7 | 16.0 |
| Bespoke (Grok-4) | 1.7T | 79.6 | 29.4 |
| *Recurrent models* | | | |
| HRM | 27M | 40.3 | 5.0 |
| TRM | 7M | 44.6 | 7.8 |
| *Vision models* | | | |
| VARC | 73M | 60.4 | 11.1 |
| Loop-ViT | 11.2M | 61.2 | 10.3 |
| *FloydNet model* | | | |
| **FloydARC (ours)** | 153.7M | **70.5** | **15.3** |

For baselines (non-FloydARC), the reported ARC-AGI-1/2 numbers are taken from the public results summarized in VARC and Loop-ViT (see links above).


2. Method

FloydARC architecture. Inputs are the query canvas and a noised answer canvas; patch tokens are generated from linear patch embedding. Following FloydNet, supernodes augment tokens into a pairwise relative representation, which is refined by K looped pivotal-attention blocks and a prediction head to produce the predicted answer canvas.

Training data

We train on ARC-GEN and ARC-CDG.

We will release our preprocessed training data to the Hugging Face Hub.

Data augmentation

To improve generalization, we apply a set of on-the-fly augmentations, including rotations, flips, and color transforms. See:

  • far/augmenter.py
  • far/augment_op.py

Augmentations are applied during both training and inference.
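The transforms above can be sketched as follows. This is a hypothetical illustration, not the code in far/augmenter.py or far/augment_op.py, which may differ in structure and in which operations are sampled:

```python
# Hypothetical sketch of on-the-fly ARC grid augmentation: a random dihedral
# transform (rotation/flip) plus a random permutation of the 10 ARC colors.
import numpy as np

def augment(canvas: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly augmented copy of an ARC color grid (values 0-9)."""
    # Random rotation by 0, 90, 180, or 270 degrees.
    canvas = np.rot90(canvas, k=int(rng.integers(4)))
    # Random horizontal flip.
    if rng.random() < 0.5:
        canvas = np.fliplr(canvas)
    # Random permutation of the 10 ARC colors: color c becomes perm[c].
    perm = rng.permutation(10)
    return perm[canvas]

rng = np.random.default_rng(0)
grid = np.array([[0, 1], [2, 3]])
augmented = augment(grid, rng)
```

Because every operation here is a bijection on grids, the same transform can be inverted at inference time to map predictions back to the original color space.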

Linear patchify

ARC requires pixel-level precision. Instead of the convolution-based patchification commonly used in vision models, we use linear patchify: flatten the canvas and map it to patch embeddings with a single linear layer.
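A minimal sketch of linear patchify, under the assumption (ours, not stated above) that the 10 ARC colors are one-hot encoded before the linear projection:

```python
# Linear patchify sketch: split the canvas into non-overlapping PxP patches,
# flatten each patch's one-hot pixels, and apply one shared linear map.
import numpy as np

def linear_patchify(canvas: np.ndarray, patch: int, W: np.ndarray) -> np.ndarray:
    """Map an HxW color grid to (num_patches, dim) patch embeddings."""
    H, Wd = canvas.shape
    onehot = np.eye(10)[canvas]                                  # (H, W, 10)
    # Regroup pixels into (rows, patch, cols, patch, colors) patches.
    x = onehot.reshape(H // patch, patch, Wd // patch, patch, 10)
    # Flatten each patch into a single vector of patch*patch*10 values.
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 10)
    return x @ W                                                 # (num_patches, dim)

canvas = np.zeros((4, 4), dtype=int)
W = np.random.default_rng(0).normal(size=(2 * 2 * 10, 64))
tokens = linear_patchify(canvas, patch=2, W=W)                   # 4 patch tokens
```

Unlike a strided convolution, the linear map sees each pixel's exact one-hot identity, so no spatial smoothing is introduced.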

Supernodes

We inject metadata (e.g., task id, augmentation type) as supernodes into the representation. Supernodes attend to all patch tokens and vice versa.
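One simple way to realize this, sketched here as an assumption (the FloydNet supernode mechanism operates on a pairwise representation and is richer than this), is to embed each metadata field and prepend it to the token sequence so that full attention connects supernodes and patch tokens:

```python
# Hypothetical supernode injection: one learned embedding per metadata field,
# prepended to the patch tokens before attention.
import numpy as np

def add_supernodes(tokens, task_id, aug_id, task_table, aug_table):
    """Prepend a task supernode and an augmentation supernode to the tokens."""
    supernodes = np.stack([task_table[task_id], aug_table[aug_id]])
    return np.concatenate([supernodes, tokens], axis=0)

rng = np.random.default_rng(0)
task_table = rng.normal(size=(400, 64))   # one embedding per training task (illustrative size)
aug_table = rng.normal(size=(16, 64))     # one embedding per augmentation type
tokens = rng.normal(size=(25, 64))        # 25 patch tokens
seq = add_supernodes(tokens, task_id=7, aug_id=3,
                     task_table=task_table, aug_table=aug_table)
```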

Backbone: FloydNet

Following FloydNet, we initialize a pairwise relationship representation from pixel-level canvas features.

Looped Update

We adopt a looped computation scheme: the same pivotal-attention blocks are applied repeatedly to refine the hidden representation.

h = input_embedding(canvas)        # initial hidden representation
for step in range(num_loops):      # the same blocks (shared weights) every loop
    for b in blocks:
        h = b(h)                   # refine with one pivotal-attention block
output = output_head(h)
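The key property is weight reuse: the blocks are instantiated once and applied on every loop. A runnable sketch (the `Block` here is a toy residual stand-in, not the real pivotal-attention block):

```python
# Looped update sketch: a fixed set of blocks is applied num_loops times,
# so depth grows at inference without growing the parameter count.
import numpy as np

class Block:
    """Toy stand-in for a pivotal-attention block: one weight matrix, reused each loop."""
    def __init__(self, dim, rng):
        self.W = rng.normal(size=(dim, dim)) / np.sqrt(dim)

    def __call__(self, h):
        return h + np.tanh(h @ self.W)   # residual refinement step

def looped_forward(h, blocks, num_loops):
    # Same blocks, same weights, applied repeatedly.
    for _ in range(num_loops):
        for b in blocks:
            h = b(h)
    return h

rng = np.random.default_rng(0)
blocks = [Block(64, rng) for _ in range(2)]
h = rng.normal(size=(25, 64))
out = looped_forward(h, blocks, num_loops=4)
```

Note that running 2 loops twice composes to the same result as 4 loops in one call, which is what makes the loop count a free knob at inference.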

Diffusion-style denoising

We incorporate a diffusion-style denoising process (DDPM) to improve generalization and to handle tasks with multiple valid solutions:

  • Training: input includes the query canvas and a noised answer canvas.
  • Inference: initialize the answer canvas with Gaussian noise, then iteratively denoise to produce the final output.
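The forward (noising) half of this process can be sketched with the standard DDPM closed form, assuming (our assumption) that the discrete answer colors are lifted to continuous one-hot vectors before noise is added:

```python
# DDPM forward noising of the answer canvas:
#   q(x_t | x_0) samples x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
import numpy as np

def noise_answer(x0: np.ndarray, t: int, betas: np.ndarray, rng) -> np.ndarray:
    """Return a noised version of the one-hot answer canvas at timestep t."""
    abar = np.cumprod(1.0 - betas)[t]      # cumulative product of (1 - beta_s)
    eps = rng.normal(size=x0.shape)        # standard Gaussian noise
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # a common linear beta schedule
x0 = np.eye(10)[np.zeros((4, 4), dtype=int)]   # one-hot 4x4 answer canvas
xt = noise_answer(x0, t=999, betas=betas, rng=rng)
```

At small `t` the canvas is barely perturbed; at the final timestep it is close to pure Gaussian noise, which is exactly the state inference starts from before iterative denoising.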

Test-Time Training (TTT)

At inference time, we perform lightweight training on each test task’s demo puzzles. During TTT, we periodically predict the test puzzle and finally apply max-voting over intermediate predictions.

We provide two TTT modes:

  • Full-model TTT: finetune all parameters.
  • LoRA TTT: finetune only low-rank adapters (LoRA), typically improving generalization while preserving pretrained knowledge. Empirically, LoRA TTT performs better than full-model TTT on ARC-AGI-1.
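The max-voting step can be sketched as follows; this is an illustrative helper, not the repository's implementation (which lives in scripts.TTT / scripts.analyze):

```python
# Max-voting over intermediate TTT predictions: the most frequently
# predicted grid wins (ties break by first occurrence).
from collections import Counter

def max_vote(predictions):
    """Pick the most frequent predicted grid; each grid is a list of int rows."""
    counts = Counter(tuple(map(tuple, grid)) for grid in predictions)
    best, _ = counts.most_common(1)[0]
    return [list(row) for row in best]

preds = [[[1, 2]], [[1, 2]], [[3, 4]]]
winner = max_vote(preds)
```

Grids are hashed as tuples of tuples so that identical predictions from different TTT checkpoints are counted together.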

Detailed results

| TTT mode | ARC-AGI-1 Pass@1 | ARC-AGI-1 Pass@2 | ARC-AGI-1 Oracle | ARC-AGI-2 Pass@1 | ARC-AGI-2 Pass@2 | ARC-AGI-2 Oracle |
| --- | --- | --- | --- | --- | --- | --- |
| Full-model | 60.4 | 65.5 | 83.5 | 6.9 | 8.6 | 22.2 |
| LoRA | 65.5 | 69.4 | 84.8 | 13.6 | 14.7 | 26.7 |
| Ensemble | 65.9 | 70.5 | 87.5 | 12.4 | 15.3 | 30.8 |

3. Installation (uv recommended)

This project targets Python >= 3.12.

3.1 Create an environment with uv

# Install uv (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# From repo root:
uv venv --python 3.12
source .venv/bin/activate

# Install dependencies
uv sync

4. Inference with pretrained checkpoints

4.0 Expected directory structure

From the repo root, the expected layout looks like:

FloydARC/                          # repo root
├── rawdata/                       # place original ARC(-AGI) json files here (not tracked)
│   ├── ARC-AGI-1_evaluation/
│   │   └── **/*.json
│   ├── ARC-AGI-2_evaluation/
│   │   └── **/*.json
│   └── (optional) train_data/
│       └── **/*.json
├── preprocessed/                  # generated by scripts.process_data
│   ├── arc1/
│   │   └── test/                  # ARC-AGI-1 eval split outputs
│   ├── arc2/
│   │   └── test/                  # ARC-AGI-2 eval split outputs
│   └── arc-train/
│       └── train/                 # training split outputs
├── output/                        # generated by scripts.TTT / scripts.analyze
│   ├── TTT_results_ARC1/          # full-model TTT outputs (example)
│   ├── TTT_results_LoRA_ARC1/     # LoRA TTT outputs (example)
│   └── *.html                     # visualization reports (example)
└── (anywhere) checkpoints/
    └── floydarc_ckpt/             # downloaded checkpoint folder (pass via --ckpt_path)

Notes:

  • rawdata/ should contain the original dataset JSON files (the script scans **/*.json recursively).
  • preprocessed/ is fully generated by python -m scripts.process_data ... and can be safely deleted/regenerated.
  • output/ contains per-run predictions and the HTML visualization created by scripts.analyze.

4.1 Download checkpoints

Hugging Face hub: https://huggingface.co/ocxlabs/FloydARC

4.2 Prepare evaluation data

# Build ARC-AGI-1 evaluation data
python -m scripts.process_data \
  --input_dir ./rawdata/ARC-AGI-1_evaluation/ \
  --output_dir ./preprocessed/arc1 \
  --split test

# Build ARC-AGI-2 evaluation data
python -m scripts.process_data \
  --input_dir ./rawdata/ARC-AGI-2_evaluation/ \
  --output_dir ./preprocessed/arc2 \
  --split test

4.3 Run TTT inference (LoRA TTT + full-model TTT)

python -m scripts.TTT \
  --ckpt_path /path/to/downloaded/ckpt \
  --subset arc1 \
  --output_dir ./output/TTT_results

By default, the TTT script uses 8 GPUs on the current node. To use multiple nodes, write worker IPs to scripts/ip_list.txt before launching.

To evaluate ARC-AGI-2, set --subset arc2.

4.4 Ensembling & visualization

We provide a script to ensemble outputs (max-voting) and generate an HTML visualization.

# Analyze LoRA-TTT results
python -m scripts.analyze \
  --result-folder ./output/TTT_results_LoRA_ARC1 \
  --subset arc1 \
  --out-html output/arc1_lora.html

# Analyze full-model TTT results
python -m scripts.analyze \
  --result-folder ./output/TTT_results_ARC1 \
  --subset arc1 \
  --out-html output/arc1_full.html

# Ensemble both (max voting across folders)
python -m scripts.analyze \
  --result-folder ./output/TTT_results_ARC1 ./output/TTT_results_LoRA_ARC1 \
  --subset arc1 \
  --out-html output/arc1_ensemble.html

To evaluate ARC-AGI-2, set --subset arc2.


5. Training from scratch (distributed)

5.1 Prepare training data

process_data recursively scans JSON files under --input_dir and writes preprocessed outputs to --output_dir.

python -m scripts.process_data \
  --input_dir /path/to/train_data \
  --output_dir ./preprocessed/arc-train \
  --split train

5.2 Launch distributed training

To reproduce our training recipe, we recommend large-scale distributed training (e.g., 8 nodes / 64 GPUs).

./.venv/bin/torchrun \
  --master_addr $master_addr \
  --master_port $master_port \
  --nproc_per_node 8 \
  --nnodes $world_size \
  --node_rank $node_rank \
  -m scripts.run \
  --dataset arc-train \
  --wandb_log true \
  --run_name FloydARC1 \
  --compile true

6. Citation

If you find this repository useful, please cite:

@misc{floydarc2026,
  title   = {FloydARC: FloydNet for ARC-AGI},
  author  = {Jingcheng Yu and Xi Chen and Mingliang Zeng and Qiwei Ye},
  year    = {2026},
  url     = {https://github.com/ocx-lab/Floyd-ARC}
}
