- https://x.com/athleticKoder/status/1979163202844754396
- Techniques I’d master if I wanted to make LLMs faster + cheaper.
  1. Quantization
  2. KV-Cache Quantization
  3. Flash Attention
  4. Speculative Decoding
  5. LoRA
  6. Pruning
  7. Knowledge Distillation
  8. Weight Sharing
  9. Sparse Attention
  10. Batching & Dynamic Batching
  11. Model Serving Optimization
  12. Tensor Parallelism
  13. Pipeline Parallelism
  14. Paged Attention
  15. Mixed Precision Inference
  16. Early Exit / Token-Level Pruning
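To make item 1 concrete: quantization trades a little precision for a large memory and bandwidth win by storing weights as small integers plus a scale factor. A minimal sketch of symmetric per-tensor int8 quantization (function names here are hypothetical, not from any particular library):

```python
def quantize_int8(weights):
    # Symmetric per-tensor int8: map the largest absolute weight to 127,
    # then round every weight to the nearest integer step of that scale.
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float weights; error is bounded by scale / 2.
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Real systems quantize per-channel or per-group rather than per-tensor, and often asymmetrically, but the scale-and-round core is the same.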
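Item 4, speculative decoding, uses a cheap draft model to propose several tokens, then has the large target model verify them in one pass, keeping the longest agreeing prefix. A toy sketch with stand-in `draft_next` / `target_next` functions (both hypothetical; a real implementation verifies proposals in a single batched forward pass and uses rejection sampling for sampled decoding):

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_len=12):
    out = list(prompt)
    while len(out) < max_len:
        # Draft model proposes k tokens autoregressively (cheap).
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target model checks each proposal; on the first mismatch,
        # keep the target's token and discard the rest of the draft.
        ctx = list(out)
        for t in proposal:
            want = target_next(ctx)
            if want != t:
                out.append(want)
                break
            out.append(t)
            ctx.append(t)
            if len(out) >= max_len:
                break
    return out[:max_len]

# Toy models over integer tokens: the target counts mod 5; the draft
# agrees except after token 3, where it guesses wrong.
target_next = lambda ctx: (ctx[-1] + 1) % 5
draft_next = lambda ctx: 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 5
result = speculative_decode(draft_next, target_next, [0], k=3, max_len=8)
```

Because every mismatch is replaced by the target's own token, the output is identical to greedy decoding with the target alone; the speedup comes from accepting multiple draft tokens per target pass.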
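Item 10, dynamic batching, groups incoming requests so the GPU runs one forward pass over many sequences instead of one pass each, flushing a batch when it fills up or the oldest request has waited too long. A minimal greedy batcher over (arrival_time, request) pairs (the function and its parameters are illustrative, not a serving-framework API):

```python
def dynamic_batches(requests, max_batch=4, max_wait=2):
    # Flush the current batch when it is full, or when the next arrival
    # would make the oldest queued request wait max_wait time units.
    batches, current = [], []
    for arrival, req in requests:
        if current and (len(current) == max_batch
                        or arrival - current[0][0] >= max_wait):
            batches.append([r for _, r in current])
            current = []
        current.append((arrival, req))
    if current:
        batches.append([r for _, r in current])
    return batches

requests = [(0, "a"), (0, "b"), (1, "c"), (5, "d"), (5, "e")]
batches = dynamic_batches(requests, max_batch=4, max_wait=2)
```

Modern servers go further with continuous batching, admitting new requests into a running batch at every decode step rather than waiting for the whole batch to finish.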