Hybrid Mamba-2 + Transformer 2.94B LLM (Nemotron-H style) — Korean 3B model pretrained from scratch on 7× NVIDIA B200 GPUs with SFT + DPO alignment
Updated Mar 26, 2026 - Python
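This entry's alignment stage uses DPO after SFT. As a reference point, here is a minimal sketch of the standard DPO objective (Rafailov et al., 2023); the `beta` value and the use of per-sequence summed token log-probabilities are common defaults assumed for illustration, not settings taken from the repo.

```python
# Minimal sketch of the DPO objective (Rafailov et al., 2023).
# Assumption: each tensor holds per-sequence summed token log-probs;
# beta=0.1 is a common default, not the repo's actual setting.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi(y_l | x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,
) -> torch.Tensor:
    # Implicit reward margins relative to the frozen reference model.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin): push chosen sequences above rejected ones.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```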
Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving Mamba-2 and Transformer for efficient inference
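Both entries follow the same Nemotron-H style recipe: a stack that is mostly Mamba-2 mixer blocks with a small number of interleaved self-attention blocks. Below is a minimal PyTorch sketch of that interleaving (the second repo implements the same pattern in MLX); the layer pattern, dimensions, and the `Mamba2` layer from the `mamba-ssm` package are illustrative assumptions, not the repos' actual configs.

```python
# Minimal sketch of a hybrid Mamba-2 / attention stack (Nemotron-H style).
# Assumptions: `Mamba2` from the `mamba-ssm` package (CUDA only), the
# "MMMAMMMA" layer pattern, and all dimensions are illustrative.
import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # pip install mamba-ssm

class AttentionBlock(nn.Module):
    """Pre-norm causal self-attention with a residual connection."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L = x.shape[1]
        # Causal mask: additive -inf above the diagonal blocks future tokens.
        mask = torch.triu(
            torch.full((L, L), float("-inf"), device=x.device), diagonal=1
        )
        h = self.norm(x)
        out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        return x + out

class Mamba2Block(nn.Module):
    """Pre-norm Mamba-2 mixer with a residual connection."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = Mamba2(d_model=d_model, d_state=64, d_conv=4, expand=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mixer(self.norm(x))

def build_hybrid(d_model: int = 2048, n_heads: int = 16,
                 pattern: str = "MMMAMMMA") -> nn.Sequential:
    """'M' = Mamba-2 block, 'A' = attention block. Nemotron-H keeps
    attention layers to a small fraction of the total depth."""
    blocks = [Mamba2Block(d_model) if c == "M" else AttentionBlock(d_model, n_heads)
              for c in pattern]
    return nn.Sequential(*blocks)

model = build_hybrid().cuda()
x = torch.randn(2, 128, 2048, device="cuda")
y = model(x)  # -> (2, 128, 2048)
```

Keeping attention sparse in the pattern is what makes the hybrid efficient at inference: the Mamba-2 blocks carry constant-size recurrent state, so only the few attention layers need a growing KV cache.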