ARC-AGI: Completely separate train/test examples at puzzle level by dywsy21 · Pull Request #22 · sapientinc/HRM

dywsy21 · 2025-08-04T08:41:29Z

What

An attempt to remove test-time training (TTT) and fully resolve issue #18, "Data Leakage Bug: Model Trained on Full Dataset (Including Val/Test Splits)".

Description

I saw #18 and was interested in how the model would behave in ARC-AGI if it only used puzzle inputs/outputs from train instead of also incorporating the inputs from test.

While I know that TTT is allowed in ARC-AGI, training on test examples beforehand gives the model an unfair advantage in understanding the implied rules those examples use. It would be interesting to see whether the H&L architecture can figure out implied rules it has not seen before, just as humans do.
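As a rough illustration of what "separate at puzzle level" means here, the sketch below builds a training pool that excludes every example belonging to an evaluation puzzle. This is not the code from this PR: the `Puzzle` layout, the function name `build_training_pool`, and the choice to keep both demonstration and query pairs of training puzzles are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]          # (input grid, output grid)
Puzzle = Dict[str, List[Example]]    # hypothetical layout: keys "train" and "test"


def build_training_pool(
    puzzles: Dict[str, Puzzle], eval_ids: Set[str]
) -> List[Tuple[str, Example]]:
    """Collect examples only from puzzles that are not in the evaluation set.

    Assumption: a puzzle is either entirely available for training or
    entirely held out; no example of a held-out puzzle enters the pool.
    """
    pool: List[Tuple[str, Example]] = []
    for puzzle_id, puzzle in puzzles.items():
        if puzzle_id in eval_ids:
            continue  # skip the whole puzzle, demonstrations included
        for split in ("train", "test"):
            for example in puzzle.get(split, []):
                pool.append((puzzle_id, example))
    return pool


# Tiny usage example with two hypothetical puzzles.
puzzles = {
    "a": {"train": [([[1]], [[2]])], "test": [([[3]], [[4]])]},
    "b": {"train": [([[5]], [[6]])], "test": [([[7]], [[8]])]},
}
pool = build_training_pool(puzzles, eval_ids={"b"})
```

With this split, anything the model knows about puzzle `"b"` has to come from generalization rather than from having seen its grids during training.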

By removing TTT, your model's evaluation result on ARC-AGI can be more convincing and more indicative of its actual generalization ability. Let me know if this approach helps; happy to chat~


helma436 · 2025-08-04T11:01:01Z

.

shawntan · 2025-08-13T03:31:59Z

Does the TTT setting for ARC-AGI allow for parameter updates across evaluation examples?

If it doesn't, then doing training and TTT together is a very different setting from training followed by TTT per evaluation instance, right? In that case each evaluation instance would be i.i.d., and the model could not use generalised information from the evaluation set.
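To make the two settings in this question concrete, here is a toy sketch. It does not answer what ARC-AGI permits and it is not HRM code; the one-parameter "model", the functions `adapt` and `predict`, and the puzzle layout are all hypothetical. In `per_instance_ttt` each evaluation puzzle starts from a fresh copy of the base parameters, so nothing learned on one puzzle carries over to another. In `joint_adaptation` the demonstrations of all evaluation puzzles are pooled into a single update, so information is shared across them.

```python
import copy
from typing import Dict, List, Tuple

Demo = Tuple[float, float]  # (input, output) pair; toy stand-in for a grid pair


def adapt(params: Dict[str, float], demos: List[Demo]) -> Dict[str, float]:
    """Toy 'training': fit a single offset as the mean of (output - input)."""
    new_params = dict(params)
    new_params["offset"] = sum(y - x for x, y in demos) / len(demos)
    return new_params


def predict(params: Dict[str, float], x: float) -> float:
    return x + params["offset"]


def per_instance_ttt(base_params, eval_puzzles):
    """Training -> TTT per puzzle: parameters are reset for every puzzle."""
    results = {}
    for puzzle_id, puzzle in eval_puzzles.items():
        params = adapt(copy.deepcopy(base_params), puzzle["train"])
        results[puzzle_id] = [predict(params, x) for x in puzzle["test_inputs"]]
    return results


def joint_adaptation(base_params, eval_puzzles):
    """Training + TTT together: one update over all puzzles' demonstrations."""
    pooled = [d for p in eval_puzzles.values() for d in p["train"]]
    params = adapt(copy.deepcopy(base_params), pooled)
    return {
        puzzle_id: [predict(params, x) for x in puzzle["test_inputs"]]
        for puzzle_id, puzzle in eval_puzzles.items()
    }


base = {"offset": 0.0}
eval_puzzles = {
    "p1": {"train": [(1.0, 2.0), (5.0, 6.0)], "test_inputs": [10.0]},
    "p2": {"train": [(0.0, 3.0)], "test_inputs": [4.0]},
}
isolated = per_instance_ttt(base, eval_puzzles)
shared = joint_adaptation(base, eval_puzzles)
```

In the toy, the pooled update blends the two puzzles' rules; with a real network the shared update could instead help by exposing regularities across evaluation puzzles, which is the cross-example effect the question is about.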

feat: attempt to completely separate train/test examples at puzzle le…

9fd1aba

…vel for ARC-AGI

D0CT4 approved these changes Sep 2, 2025

dywsy21 wants to merge 1 commit into sapientinc:main from dywsy21:main


4 participants
