tensor-parallelism

Here are 16 public repositories matching this topic...

vLLM - High-throughput, memory-efficient LLM inference engine with PagedAttention, continuous batching, CUDA/HIP optimization, quantization (GPTQ/AWQ/INT4/INT8/FP8), tensor/pipeline parallelism, OpenAI-compatible API, multi-GPU/TPU/Neuron support, prefix caching, and multi-LoRA capabilities

  • Updated Jan 26, 2026
  • Elixir

GPU Memory Calculator for LLM Training - Calculate GPU memory requirements for training Large Language Models with support for multiple training engines including PyTorch DDP, DeepSpeed ZeRO, Megatron-LM, and FSDP.

  • Updated Jan 26, 2026
  • Python
