LLM Performance Tuning

Reference

Ideas

  • https://x.com/athleticKoder/status/1979163202844754396
    • Techniques I’d master if I wanted to make LLMs faster + cheaper.
      1. Quantization
      2. KV-Cache Quantization
      3. Flash Attention
      4. Speculative Decoding
      5. LoRA
      6. Pruning
      7. Knowledge Distillation
      8. Weight Sharing
      9. Sparse Attention
      10. Batching & Dynamic Batching
      11. Model Serving Optimization
      12. Tensor Parallelism
      13. Pipeline Parallelism
      14. Paged Attention
      15. Mixed Precision Inference
      16. Early Exit / Token-Level Pruning
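A minimal sketch of item 1, quantization: weights are mapped to low-bit integers plus a scale, trading a small, bounded rounding error for smaller memory and faster integer math. This toy uses a single per-tensor symmetric int8 scale; real systems typically use per-channel or group-wise scales.

```python
# Toy symmetric int8 quantization (per-tensor scale; an illustrative
# simplification, not any particular library's scheme).

def quantize_int8(weights):
    """Map float weights to int8 values plus a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights."""
    return [x * scale for x in q]

weights = [0.12, -0.98, 0.33, 0.5]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Per-weight error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

The same idea applied to the cached keys and values rather than the weights is item 2, KV-cache quantization.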
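Items 2 and 14 both revolve around the KV cache, so a sketch of the cache itself helps: at each decoding step the new token's key and value are appended to a cache and attention runs over the cached history, so past keys/values are computed once instead of recomputed per step. This toy uses single-head attention with identity projections (an assumption for brevity).

```python
# Toy autoregressive decoding loop with a KV cache.
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]
    return [sum(wi * v[i] for wi, v in zip(w, values)) for i in range(d)]

cache_k, cache_v = [], []
for token_vec in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    # Identity q/k/v projections stand in for the real learned ones.
    cache_k.append(token_vec)
    cache_v.append(token_vec)
    out = attend(token_vec, cache_k, cache_v)
```

Paged attention (item 14) changes how `cache_k`/`cache_v` are laid out in GPU memory (fixed-size blocks instead of one contiguous buffer per sequence); KV-cache quantization (item 2) stores them in low-bit form.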
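Item 4, speculative decoding, can also be sketched: a cheap draft model proposes several tokens, the target model checks them, and the longest agreeing prefix is accepted, so the expensive model advances multiple tokens per verification. The draft/target "models" below are toy next-token functions (assumptions for illustration), and the greedy token-by-token verification loop stands in for what is really a single batched target forward pass over all drafted positions.

```python
# Toy greedy speculative decoding step.

def speculative_step(prefix, draft_model, target_model, k=4):
    """Draft k tokens cheaply, keep the prefix the target agrees with."""
    ctx_draft = list(prefix)
    proposed = []
    for _ in range(k):
        t = draft_model(ctx_draft)
        proposed.append(t)
        ctx_draft.append(t)
    accepted = []
    ctx = list(prefix)
    for t in proposed:
        if target_model(ctx) == t:       # target verifies the drafted token
            accepted.append(t)
            ctx.append(t)
        else:                            # on mismatch, take the target's token
            accepted.append(target_model(ctx))
            break
    return accepted

# Toy models: the target emits last token + 1; the draft agrees but caps at 3,
# so it diverges after three correct guesses.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: min(ctx[-1] + 1, 3)
out = speculative_step([0], draft, target, k=4)
```

Here the target model accepts the draft's first three tokens and supplies the fourth itself, so one verification pass yields four tokens of progress.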