Decoder-only Transformer models missing X-Blocks/AdaLN?

Hi, thanks for this very clean repo!

I had a question while trying to reproduce the decoder-only Transformer results in Table 1.

In the current code, it looks like the provided decoder-only Transformer configs for BC, BESO, and RF all use plain TransformerEncoder blocks rather than AdaLN/X-Blocks:

* [bc_dec_transformer.yaml](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/configs/agents/model/bc/bc_dec_transformer.yaml#L4-L22), [beso_dec_transformer.yaml](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/configs/agents/model/beso/beso_dec_transformer.yaml#L6-L28), and [fm_dec_transformer.yaml](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/configs/agents/model/fm/fm_dec_transformer.yaml#L7-L27) all point to `TransformerEncoder`

* [TransformerEncoder](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/agents/backbones/transformer/blocks.py#L262-L299) is built from plain [Block](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/agents/backbones/transformer/blocks.py#L162-L193) layers, but the AdaLN/X-Block seems to be implemented in [ConditionedBlock](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/agents/backbones/transformer/blocks.py#L219-L259) and used in [TransformerFiLMEncoder](https://github.com/maxrudolph1/X_IL/blob/34bb02eab91b9de317bbb7cb38e7d3bf8f846589/agents/backbones/transformer/blocks.py#L302-L339). I don’t see these used in the decoder-only Transformer configs.
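
For reference, this is roughly how I checked which encoder each config refers to. It is only a sketch: the two class names come from `blocks.py` as linked above, while the `_target_` key and the example fragment are my own assumptions for illustration and are not copied from the repo's configs.

```python
import re

# Class names as they appear in blocks.py (see the links above).
PLAIN = "TransformerEncoder"
CONDITIONED = "TransformerFiLMEncoder"


def encoder_kinds(config_text: str) -> set:
    """Return which of the two encoder class names appear in a config's text."""
    found = set()
    for name in (PLAIN, CONDITIONED):
        # Word boundaries so a dotted path like `pkg.blocks.TransformerEncoder` matches.
        if re.search(rf"\b{name}\b", config_text):
            found.add(name)
    return found


# Hypothetical config fragment (assumed layout, not taken from the repo).
example = """
model:
  _target_: agents.backbones.transformer.blocks.TransformerEncoder
  n_layers: 4
"""

print(encoder_kinds(example))
```

If the same check is run on the text of the three decoder-only configs, a result containing only `TransformerEncoder` would match what I describe above.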

Could you clarify whether these existing decoder-only Transformer models are the ones that produced Table 1's Decoder-Only Transformer results?

If not, would it be possible to share the decoder-only X-Block/AdaLN implementations and configs corresponding to the Table 1 X-BC, X-BESO, and X-RF results?

Thanks a lot.
