diff --git a/README.md b/README.md
index 5ea086b7..85a95a8e 100644
--- a/README.md
+++ b/README.md
@@ -111,17 +111,18 @@ decoding, and `-n 256`. The short prompt is a normal small Italian story prompt.
 The long prompts exercise chunked prefill plus long-context decode. Q4 requires
 the larger-memory machine class, so M3 Max Q4 numbers are `N/A`.
 
-| Machine | Quant | Prompt | Prefill | Generation |
-| --- | ---: | ---: | ---: | ---: |
-| MacBook Pro M3 Max, 128 GB | q2 | short | 58.52 t/s | 26.68 t/s |
-| MacBook Pro M3 Max, 128 GB | q2 | 11709 tokens | 250.11 t/s | 21.47 t/s |
-| MacBook Pro M3 Max, 128 GB | q4 | short | N/A | N/A |
-| MacBook Pro M3 Max, 128 GB | q4 | long | N/A | N/A |
-| Mac Studio M3 Ultra, 512 GB | q2 | short | 84.43 t/s | 36.86 t/s |
-| Mac Studio M3 Ultra, 512 GB | q2 | 11709 tokens | 468.03 t/s | 27.39 t/s |
-| Mac Studio M3 Ultra, 512 GB | q4 | short | 78.95 t/s | 35.50 t/s |
-| Mac Studio M3 Ultra, 512 GB | q4 | 12018 tokens | 448.82 t/s | 26.62 t/s |
-| DGX Spark GB10, 128 GB | q2 | 7047 tokens | 343.81 t/s | 13.75 t/s |
+| Machine                     | Quant |       Prompt |    Prefill | Generation |
+|-----------------------------|------:|-------------:|-----------:|-----------:|
+| MacBook Pro M3 Max, 128 GB  |    q2 |        short |  58.52 t/s |  26.68 t/s |
+| MacBook Pro M3 Max, 128 GB  |    q2 | 11709 tokens | 250.11 t/s |  21.47 t/s |
+| MacBook Pro M3 Max, 128 GB  |    q4 |        short |        N/A |        N/A |
+| MacBook Pro M3 Max, 128 GB  |    q4 |         long |        N/A |        N/A |
+| Mac Studio M3 Ultra, 512 GB |    q2 |        short |  84.43 t/s |  36.86 t/s |
+| Mac Studio M3 Ultra, 512 GB |    q2 | 11709 tokens | 468.03 t/s |  27.39 t/s |
+| Mac Studio M3 Ultra, 512 GB |    q4 |        short |  78.95 t/s |  35.50 t/s |
+| Mac Studio M3 Ultra, 512 GB |    q4 | 12018 tokens | 448.82 t/s |  26.62 t/s |
+| DGX Spark GB10, 128 GB      |    q2 |  7047 tokens | 343.81 t/s |  13.75 t/s |
+| RTX 6000 PRO BW 600W, 96GB  |    q2 |        short | 313.21 t/s |  35.66 t/s |
 
 ![M3 Max t/s](bench/m3_max_ts.svg)