
Clarification on effective batch size in TabFlex training setup #23

@schnurrd


Hello,

Thank you for the very interesting paper. I have a question about the training setup of TabFlex-S100, TabFlex-L100, and TabFlex-H1K.

Appendix C.2 (Model Training) states that the models were trained with batch sizes of 1210, 110, and 1410 for 8, 4, and 4 epochs, respectively. While experimenting with pretraining, I found that such batch sizes would seem to require significantly more GPU memory than the 80 GB A100 reported in the paper.

Am I missing something, or does the reported batch size correspond to the effective batch size including gradient accumulation (batch_size × aggregate_k_gradients)? If so, I would be very interested in the concrete values used for batch_size and aggregate_k_gradients, and in the reasoning behind such a large effective batch size.
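To make the question concrete, here is a minimal, illustrative Python sketch of what I mean by "effective batch size". It is not taken from the TabFlex code base: the helper names (`effective_batch_size`, `candidate_splits`, `mse_grad`, `accumulated_grad`) and the toy 1-D linear model are hypothetical, and only batch_size and aggregate_k_gradients come from the setup discussed above. It assumes that each micro-batch loss is divided by k before the backward pass and that all micro-batches have the same size. Under these assumptions, averaging the gradients of k micro-batches of size b yields the same gradient as a single batch of size b × k, while per-step activation memory scales only with b.

```python
from typing import List, Tuple


def effective_batch_size(batch_size: int, aggregate_k_gradients: int) -> int:
    """Number of samples contributing to one optimizer step (hypothetical helper)."""
    return batch_size * aggregate_k_gradients


def candidate_splits(effective: int) -> List[Tuple[int, int]]:
    """All (batch_size, aggregate_k_gradients) pairs whose product equals `effective`."""
    return [(b, effective // b) for b in range(1, effective + 1) if effective % b == 0]


def mse_grad(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient w.r.t. w of the mean squared error for the toy model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w: float, xs: List[float], ys: List[float], batch_size: int) -> float:
    """Sum of micro-batch gradients, each scaled by 1/k (mimics loss / k before backward)."""
    assert len(xs) % batch_size == 0, "sketch assumes equal-size micro-batches"
    k = len(xs) // batch_size
    total = 0.0
    for i in range(k):
        sl = slice(i * batch_size, (i + 1) * batch_size)
        total += mse_grad(w, xs[sl], ys[sl]) / k
    return total


if __name__ == "__main__":
    xs = [float(i) for i in range(12)]
    ys = [3.0 * x + 1.0 for x in xs]
    w = 0.5
    # One batch of 12 vs. 3 accumulated micro-batches of 4: same gradient.
    print(mse_grad(w, xs, ys), accumulated_grad(w, xs, ys, batch_size=4))
    # Possible ways to reach an effective batch size of 110.
    print(candidate_splits(110))
```

Since 110 = 10 × 11, for instance, a split such as batch_size = 10 with aggregate_k_gradients = 11 would be one possibility, but I do not know which values were actually used, hence the question.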

Thank you very much in advance.
