test offloading optimizer state to CPU on GH200 nodes #11

@tomiock

Description

Using the very fast CPU-GPU memory link on the JUPITER (GH200) nodes, we can try offloading the optimizer states to CPU memory. There should be a latency penalty, but not as severe as on regular x86 nodes.

This can perhaps only be done via FSDP.
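
To get a rough feel for the trade-off before testing, here is a minimal back-of-the-envelope sketch. It assumes an Adam-style optimizer that keeps two fp32 moment tensors per parameter (the optimizer is not stated in this issue), and it treats the CPU-GPU bandwidth as an input rather than a known value. The function names `adam_state_bytes` and `transfer_seconds` are hypothetical helpers for illustration, and the bandwidth figures in the usage part are placeholders, not measurements of any node.

```python
def adam_state_bytes(num_params: int, bytes_per_value: int = 4, num_moments: int = 2) -> int:
    """Estimate the size of the optimizer state in bytes.

    Assumption: an Adam-style optimizer with `num_moments` state tensors
    per parameter (first and second moment), each stored with
    `bytes_per_value` bytes (4 for fp32).
    """
    if num_params < 0:
        raise ValueError("num_params must be non-negative")
    return num_params * bytes_per_value * num_moments


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on the time to move `num_bytes` once over a link.

    Ignores launch overhead, overlap with compute, and sharding, so it is
    only a first-order estimate.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    state = adam_state_bytes(1_000_000_000)  # hypothetical 1B-parameter model
    print(f"optimizer state: {state / 1e9:.1f} GB")

    # Placeholder bandwidths in bytes/s; replace with values measured on the nodes.
    for label, bw in [("fast link (placeholder)", 100e9), ("slow link (placeholder)", 25e9)]:
        print(f"{label}: {transfer_seconds(state, bw):.3f} s per full transfer")
```

Plugging in bandwidths measured on a GH200 node and on an x86 node would show how much of the per-step penalty comes from the transfer alone. With FSDP the state is sharded across ranks, so the per-GPU figure would be the total divided by the number of ranks.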
