FloydARC adapts FloydNet to the ARC-AGI benchmark and achieves state-of-the-art performance among models trained primarily on ARC-style data (rather than large-scale web corpora).
The Abstraction and Reasoning Corpus (ARC) benchmark has attracted substantial interest in recent years. It evaluates a model’s ability to infer underlying rules from only a few examples, emphasizing reasoning and generalization. While large language models trained on massive internet data can achieve strong results, models trained mainly on ARC-style data face a significantly harder challenge. Prior work such as VARC and Loop-ViT shows that treating ARC as a vision-centric task can be highly effective.
FloydNet demonstrates strong performance on neural algorithmic reasoning. In this repository, we present FloydARC, a FloydNet-based system for ARC-AGI, achieving SOTA results among ARC-trained models.
| Model | #params | ARC-AGI-1 | ARC-AGI-2 |
|---|---|---|---|
| large language models (LLMs) | | | |
| Deepseek R1 | 671B | 15.8 | 1.3 |
| Claude 3.7 8k | N/A | 21.2 | 0.9 |
| o3-mini-high | N/A | 34.5 | 3.0 |
| GPT-5 | N/A | 44.0 | 1.9 |
| Grok-4-thinking | 1.7T | 66.7 | 16.0 |
| Bespoke (Grok-4) | 1.7T | 79.6 | 29.4 |
| recurrent models | | | |
| HRM | 27M | 40.3 | 5.0 |
| TRM | 7M | 44.6 | 7.8 |
| vision models | | | |
| VARC | 73M | 60.4 | 11.1 |
| Loop-ViT | 11.2M | 61.2 | 10.3 |
| floydnet model | | | |
| FloydARC (ours) | 153.7M | 70.5 | 15.3 |
For baselines (non-FloydARC), the reported ARC-AGI-1/2 numbers are taken from the public results summarized in VARC and Loop-ViT (see links above).
FloydARC architecture. Inputs are the query canvas and a noised answer canvas; patch tokens are generated from linear patch embedding. Following FloydNet, supernodes augment tokens into a pairwise relative representation, which is refined by K looped pivotal-attention blocks and a prediction head to produce the predicted answer canvas.
We train on ARC-GEN and ARC-CDG:
- ARC-GEN: tasks aligned with the original ARC-AGI-1 training set (https://github.com/google/ARC-GEN)
- ARC-CDG: a collection of more primitive, compositional operation tasks (https://github.com/Poolminer/ARC-CDG)
We will release our preprocessed training data on the Hugging Face Hub.
To improve generalization, we apply a set of on-the-fly augmentations, including rotations, flips, and color transforms. See:
- `far/augmenter.py`
- `far/augment_op.py`
Augmentations are applied during both training and inference.
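As a rough illustration (function and parameter names here are ours for exposition, not the actual API in `far/augmenter.py`), an ARC-style augmentation can combine a rotation, an optional flip, and a color permutation:

```python
import numpy as np

def augment(canvas: np.ndarray, k_rot: int, flip: bool, perm: np.ndarray) -> np.ndarray:
    """Apply a 90-degree rotation, an optional horizontal flip, and a color remap.

    canvas: (H, W) grid of ARC color indices in 0..9
    perm:   a permutation of the 10 color indices
    """
    out = np.rot90(canvas, k=k_rot)   # rotate by k_rot * 90 degrees
    if flip:
        out = np.fliplr(out)          # mirror left-right
    return perm[out]                  # remap each color index

rng = np.random.default_rng(0)
canvas = np.array([[0, 1], [2, 3]])
perm = rng.permutation(10)            # random bijection over the 10 ARC colors
aug = augment(canvas, k_rot=1, flip=False, perm=perm)
```

Note that the same transform must be applied consistently to the query and answer canvases; at inference, predictions would be mapped back through the inverse transform before voting.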
ARC requires pixel-level precision. Instead of convolution-based patchification commonly used in vision models, we use linear patchify: flatten the canvas and map it to patch embeddings with a linear layer.
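A minimal sketch of linear patchification, assuming a one-hot color encoding and illustrative shapes (in the model the projection `W` is a learned layer):

```python
import numpy as np

def linear_patchify(canvas: np.ndarray, patch: int, W: np.ndarray) -> np.ndarray:
    """canvas: (H, W) integer color grid; returns (num_patches, d_model) embeddings."""
    H, Wd = canvas.shape
    onehot = np.eye(10)[canvas]                       # (H, W, 10) one-hot colors
    # split into non-overlapping patch x patch tiles, then flatten each tile
    tiles = onehot.reshape(H // patch, patch, Wd // patch, patch, 10)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 10)
    return tiles @ W                                  # single linear projection

rng = np.random.default_rng(0)
canvas = rng.integers(0, 10, size=(8, 8))
W = rng.standard_normal((2 * 2 * 10, 64))             # patch=2, d_model=64 (illustrative)
tokens = linear_patchify(canvas, patch=2, W=W)        # (16, 64) patch tokens
```

Because each pixel enters the projection exactly once and unmixed, pixel identity is preserved at the token boundary, unlike overlapping or strided convolutions.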
Following FloydNet, we initialize a pairwise relationship representation from pixel-level canvas features. We also inject metadata (e.g., task id, augmentation type) as supernodes into the representation; supernodes attend to all patch tokens and vice versa.
We adopt a looped computation scheme: the same pivotal-attention blocks are applied repeatedly to refine the hidden representation.
h = input_embedding(canvas)
for step in range(num_loops):
    for b in blocks:
        h = b(h)
output = output_head(h)

We incorporate a diffusion-style process (DDPM) to improve generalization and to handle multi-solution cases:
- Training: input includes the query canvas and a noised answer canvas.
- Inference: initialize the answer canvas with Gaussian noise, then iteratively denoise to produce the final output.
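Schematically, the inference loop might look like the following, where `model` stands in for FloydARC predicting the clean answer canvas; the step count and linear schedule are illustrative assumptions, not the repo's actual configuration:

```python
import numpy as np

def ddpm_infer(model, query, shape, steps=10, rng=None):
    """Toy DDPM-style sampler: start from noise, iteratively move toward the
    model's predicted clean canvas."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.standard_normal(shape)            # initialize with Gaussian noise
    x0_hat = None
    for t in reversed(range(steps)):
        x0_hat = model(query, x, t)           # predict the clean answer canvas
        alpha = t / steps                     # toy linear schedule
        x = alpha * x + (1 - alpha) * x0_hat  # blend current state toward prediction
    return x0_hat

# Dummy model that always predicts a fixed target canvas
target = np.ones((4, 4))
out = ddpm_infer(lambda q, x, t: target, query=None, shape=(4, 4))
```

Because inference starts from fresh noise each time, repeated sampling can surface different valid answers for multi-solution tasks.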
At inference time, we perform lightweight test-time training (TTT) on each test task's demo puzzles. During TTT, we periodically predict the test puzzle and finally apply max-voting over the intermediate predictions.
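The max-voting step can be sketched as follows (helper names are illustrative, not the repo's actual API): serialize each predicted canvas to a hashable key and keep the most frequent one.

```python
from collections import Counter

def max_vote(predictions):
    """predictions: list of canvases as tuples of row-tuples; returns the mode."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Three intermediate checkpoints agree, two disagree; the majority wins.
preds = [((1, 2), (3, 4))] * 3 + [((0, 0), (0, 0))] * 2
best = max_vote(preds)
```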
We provide two TTT modes:
- Full-model TTT: finetune all parameters.
- LoRA TTT: finetune only low-rank adapters (LoRA), typically improving generalization while preserving pretrained knowledge. Empirically, LoRA TTT performs better than full-model TTT on ARC-AGI-1.
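A LoRA adapter over a frozen linear layer can be sketched as below; the rank, scaling, and initialization are illustrative assumptions, not the repo's configuration. Only `A` and `B` would receive gradients during TTT.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W0 augmented with a trainable low-rank update B @ A."""

    def __init__(self, W0: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W0.shape
        self.W0 = W0                                     # frozen pretrained weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))                 # zero-init: no-op at start
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ (self.W0 + self.scale * (self.B @ self.A)).T

W0 = np.eye(4)
layer = LoRALinear(W0, rank=2)
x = np.ones((1, 4))
y = layer(x)   # equals x @ W0.T at init, since B is zero
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained model exactly, which is one reason LoRA TTT tends to preserve pretrained knowledge.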
| TTT mode | ARC-AGI-1 Pass@1 | ARC-AGI-1 Pass@2 | ARC-AGI-1 Oracle | ARC-AGI-2 Pass@1 | ARC-AGI-2 Pass@2 | ARC-AGI-2 Oracle |
|---|---|---|---|---|---|---|
| Full-model | 60.4 | 65.5 | 83.5 | 6.9 | 8.6 | 22.2 |
| LoRA | 65.5 | 69.4 | 84.8 | 13.6 | 14.7 | 26.7 |
| Ensemble | 65.9 | 70.5 | 87.5 | 12.4 | 15.3 | 30.8 |
This project targets Python >= 3.12.
# Install uv (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# From repo root:
uv venv --python 3.12
source .venv/bin/activate
# Install dependencies
uv sync

From the repo root, the expected layout looks like:
FloydARC/ # repo root
├── rawdata/ # place original ARC(-AGI) json files here (not tracked)
│ ├── ARC-AGI-1_evaluation/
│ │ └── **/*.json
│ ├── ARC-AGI-2_evaluation/
│ │ └── **/*.json
│ └── (optional) train_data/
│ └── **/*.json
├── preprocessed/ # generated by scripts.process_data
│ ├── arc1/
│ │ └── test/ # ARC-AGI-1 eval split outputs
│ ├── arc2/
│ │ └── test/ # ARC-AGI-2 eval split outputs
│ └── arc-train/
│ └── train/ # training split outputs
├── output/ # generated by scripts.TTT / scripts.analyze
│ ├── TTT_results_ARC1/ # full-model TTT outputs (example)
│ ├── TTT_results_LoRA_ARC1/ # LoRA TTT outputs (example)
│ └── *.html # visualization reports (example)
└── (anywhere) checkpoints/
└── floydarc_ckpt/ # downloaded checkpoint folder (pass via --ckpt_path)
Notes:
- `rawdata/` should contain the original dataset JSON files (the script scans `**/*.json` recursively).
- `preprocessed/` is fully generated by `python -m scripts.process_data ...` and can be safely deleted/regenerated.
- `output/` contains per-run predictions and the HTML visualization created by `scripts.analyze`.
Hugging Face hub: https://huggingface.co/ocxlabs/FloydARC
# Build ARC-AGI-1 evaluation data
python -m scripts.process_data \
--input_dir ./rawdata/ARC-AGI-1_evaluation/ \
--output_dir ./preprocessed/arc1 \
--split test
# Build ARC-AGI-2 evaluation data
python -m scripts.process_data \
--input_dir ./rawdata/ARC-AGI-2_evaluation/ \
--output_dir ./preprocessed/arc2 \
--split test

python -m scripts.TTT \
--ckpt_path /path/to/downloaded/ckpt \
--subset arc1 \
--output_dir ./output/TTT_results

By default, the TTT script uses 8 GPUs on the current node. To use multiple nodes, write worker IPs to scripts/ip_list.txt before launching.
To evaluate ARC-AGI-2, set --subset arc2.
We provide a script to ensemble outputs (max-voting) and generate an HTML visualization.
# Analyze LoRA-TTT results
python -m scripts.analyze \
--result-folder ./output/TTT_results_LoRA_ARC1 \
--subset arc1 \
--out-html output/arc1_lora.html
# Analyze full-model TTT results
python -m scripts.analyze \
--result-folder ./output/TTT_results_ARC1 \
--subset arc1 \
--out-html output/arc1_full.html
# Ensemble both (max voting across folders)
python -m scripts.analyze \
--result-folder ./output/TTT_results_ARC1 ./output/TTT_results_LoRA_ARC1 \
--subset arc1 \
--out-html output/arc1_ensemble.html

To evaluate ARC-AGI-2, set --subset arc2.
process_data recursively scans JSON files under --input_dir and writes preprocessed outputs to --output_dir.
python -m scripts.process_data \
--input_dir /path/to/train_data \
--output_dir ./preprocessed/arc-train \
--split train

To reproduce our training recipe, we recommend large-scale distributed training (e.g., 8 nodes / 64 GPUs).
./.venv/bin/torchrun \
--master_addr $master_addr \
--master_port $master_port \
--nproc_per_node 8 \
--nnodes $world_size \
--node_rank $node_rank \
-m scripts.run \
--dataset arc-train \
--wandb_log true \
--run_name FloydARC1 \
--compile true

If you find this repository useful, please cite:
@misc{floydarc2026,
title = {FloydARC: FloydNet for ARC-AGI},
  author = {Jingcheng Yu and Xi Chen and Mingliang Zeng and Qiwei Ye},
year = {2026},
url = {https://github.com/ocx-lab/Floyd-ARC}
}