Description
I am investigating the generalization capabilities of the TRM architecture presented in "Less is More." While the paper claims the model learns to "tease out underlying task rules" through recursive refinement, recent independent analyses (and my own replication attempts) suggest the model is heavily reliant on the learned Task_ID embeddings rather than inferring logic from the input grid itself.
The Technical Issue
When the specific Task_ID is removed or randomized, the model's reasoning capabilities appear to collapse completely, suggesting it is performing conditional retrieval (lookup) rather than fluid intelligence.
Observed Behavior (Ablation Results):
- Standard Input (Grid + Correct ID): ~45% accuracy (matches paper)
- Ablation Input (Grid + Blank/Random ID): 0.0% accuracy
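For reproducibility, here is a minimal sketch of the ablation procedure behind the numbers above. The batch layout, the `task_id` field name, and the use of index 0 as a blank/padding ID are my assumptions for illustration, not this repo's actual data API.

```python
import random

def ablate_task_ids(batch, mode="blank", num_task_ids=1000):
    """Return a copy of `batch` with every example's Task_ID replaced.

    mode="blank"  -> index 0 (assumed to be a reserved/padding embedding)
    mode="random" -> a uniformly sampled ID from the learned table
    """
    ablated = []
    for example in batch:
        ex = dict(example)  # do not mutate the original example
        if mode == "blank":
            ex["task_id"] = 0
        elif mode == "random":
            ex["task_id"] = random.randrange(1, num_task_ids)
        else:
            raise ValueError(f"unknown mode: {mode}")
        ablated.append(ex)
    return ablated
```

Running evaluation on `ablate_task_ids(test_batch)` instead of `test_batch` is the only change between the two rows above; the grid content is untouched.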
The Discrepancy
The paper asserts that the 7M parameter network is solving the ARC tasks via recursive refinement. However, if the model requires a unique, pre-learned embedding vector for every single task to achieve a score >0%, this indicates the "logic" is encoded in the embedding table (memory), not the recursive weights (reasoning).
Impact:
- Parameter Count: the headline "7M parameters" excludes the large embedding table required to store these task-specific priors.
- Generalization: a model that fails completely without a task-specific tag cannot be claimed to solve "unseen" tasks in a general sense, since it requires a learned index for that specific problem distribution.
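To make the parameter-count point concrete, the table's cost is simply `num_task_ids * embed_dim`. The numbers below are illustrative only, not taken from the paper or this repo's config:

```python
def embedding_table_params(num_task_ids: int, embed_dim: int) -> int:
    """Parameters stored purely in the Task_ID lookup table."""
    return num_task_ids * embed_dim

# Illustrative: 1,000 learned task IDs at dimension 512 already cost
# 512,000 parameters that sit outside the recursive weights.
print(embedding_table_params(1000, 512))  # 512000
```

If the embedding count scales with the number of (augmented) training puzzles, this table can rival or exceed the reasoning network itself, which is why I think it belongs in the reported parameter budget.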
Could you provide a checkpoint or evaluation script demonstrating the model solving any unseen puzzle without access to that puzzle's Task_ID embedding?
If not, how does this architecture meaningfully differ from a learned lookup table?
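To state the lookup-table concern precisely, here is the degenerate system the ablation results are consistent with. This is a deliberate caricature (all names are hypothetical), not a claim about the actual forward pass:

```python
class LookupSolver:
    """A solver whose 'reasoning' is retrieval keyed on Task_ID."""

    def __init__(self):
        self.memory = {}  # task_id -> learned solution prior

    def train(self, task_id, solution_prior):
        self.memory[task_id] = solution_prior

    def solve(self, task_id, grid):
        # `grid` is effectively ignored; the answer is retrieved, not
        # inferred. Unknown IDs return None, i.e. 0% accuracy.
        return self.memory.get(task_id)
```

A model that genuinely infers rules from the grid should degrade gracefully under ID ablation; a system equivalent to `LookupSolver` drops to zero, which matches the observed behavior.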