ARC-AGI: Completely separate train/test examples at puzzle level #22

Open
dywsy21 wants to merge 1 commit into sapientinc:main from dywsy21:main

dywsy21 commented Aug 4, 2025

What

Description

I saw #18 and was curious how the model would behave on ARC-AGI if it trained only on puzzle inputs/outputs from train, instead of also incorporating the inputs from test.

I know that TTT (test-time training) is allowed in ARC-AGI, but training on test examples beforehand gives the model an unfair understanding of the implied rules used in them. It would be interesting to see whether the H&L arch can figure out implied rules it has not seen before, just like humans do.

Removing TTT would make your model's evaluation results on ARC-AGI more convincing and more indicative of its actual generalization ability. Let me know if this approach helps, happy to chat~
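
For readers skimming the thread, here is a minimal sketch of what a puzzle-level split could look like. It is not the code in this PR: the function name `split_at_puzzle_level`, the `Example` type, and the puzzle ids are all hypothetical, and the only assumption is that each puzzle is a list of input/output grid pairs keyed by an id.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one (input_grid, output_grid) pair.
Example = Tuple[List[List[int]], List[List[int]]]


def split_at_puzzle_level(
    puzzles: Dict[str, List[Example]],
    eval_ids: Set[str],
) -> Tuple[Dict[str, List[Example]], Dict[str, List[Example]]]:
    """Route every example of a puzzle to exactly one side of the split."""
    train = {pid: exs for pid, exs in puzzles.items() if pid not in eval_ids}
    held_out = {pid: exs for pid, exs in puzzles.items() if pid in eval_ids}
    # Sanity check: no puzzle id may appear on both sides.
    assert not (train.keys() & held_out.keys())
    return train, held_out


# Toy data with made-up ids; real ARC-AGI grids are larger.
puzzles = {
    "train_a": [([[1]], [[2]])],
    "train_b": [([[0, 1]], [[1, 0]])],
    "eval_x": [([[3]], [[4]]), ([[5]], [[6]])],
}
train, held_out = split_at_puzzle_level(puzzles, eval_ids={"eval_x"})
print(sorted(train), sorted(held_out))
```

The point of splitting by puzzle id rather than by example is that no pair from an evaluation puzzle, demonstration or test, can reach the training side.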

helma436 commented Aug 4, 2025

.

@shawntan

Does the TTT setting for ARC-AGI allow for parameter updates across evaluation examples?

If it doesn't, then doing Training + TTT together represents a very different setting than Training -> TTT per evaluation instance, right? Each evaluation instance would be i.i.d. in that case, and the model could not use generalised information from the evaluation.
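
To make the two settings in this question concrete, here is a toy sketch. It says nothing about what the ARC-AGI rules actually permit. The one-weight model, the `adapt` update rule, and the task names are hypothetical stand-ins chosen only to show how per-instance adaptation differs from updates that carry over between evaluation instances.

```python
import copy
from typing import Dict, List, Tuple

Demo = Tuple[float, float]  # hypothetical (input, target) pair


def adapt(params: Dict[str, float], demos: List[Demo], lr: float = 0.1) -> None:
    """Toy stand-in for test-time training: SGD steps on a one-weight linear model."""
    for x, y in demos:
        params["w"] += lr * (y - params["w"] * x) * x


def ttt_per_instance(base: Dict[str, float], tasks: Dict[str, List[Demo]]) -> Dict[str, float]:
    """Each task adapts a private copy, so no update leaks across tasks."""
    adapted = {}
    for name, demos in tasks.items():
        local = copy.deepcopy(base)  # reset to the trained weights
        adapt(local, demos)
        adapted[name] = local["w"]
    return adapted


def ttt_shared(base: Dict[str, float], tasks: Dict[str, List[Demo]]) -> Dict[str, float]:
    """Updates accumulate, so later tasks start from what earlier tasks changed."""
    params = copy.deepcopy(base)
    adapted = {}
    for name, demos in tasks.items():
        adapt(params, demos)
        adapted[name] = params["w"]
    return adapted


base = {"w": 0.0}
tasks = {"t1": [(1.0, 2.0)], "t2": [(1.0, -2.0)]}
isolated = ttt_per_instance(base, tasks)
shared = ttt_shared(base, tasks)
print(isolated, shared)
```

In the per-instance variant the weight used for `t2` depends only on `t2`'s own demonstrations, while in the shared variant it also depends on `t1`, which is the cross-instance information flow the question is about.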

4 participants