Hi, thanks for this very clean repo!
I have a question that came up while trying to reproduce the decoder-only Transformer results in Table 1.
In the current code, it looks like the provided decoder-only Transformer configs for BC, BESO, and RF all use plain TransformerEncoder blocks rather than AdaLN/X-Blocks:
bc_dec_transformer.yaml, beso_dec_transformer.yaml, and fm_dec_transformer.yaml all point to TransformerEncoder.
TransformerEncoder is built from plain Block layers, whereas the AdaLN/X-Block seems to be implemented in ConditionedBlock and used in TransformerFiLMEncoder. I don't see either of these used in the decoder-only Transformer configs.
Could you clarify whether these existing decoder-only Transformer models are the ones that produced Table 1's Decoder-Only Transformer results?
If not, would it be possible to share the decoder-only X-Block/AdaLN implementations and configs corresponding to the Table 1 X-BC, X-BESO, and X-RF results?
Thanks a lot.