Max Throughput (TpS: tokens per second, higher is better)

| Model | # HPU | Precision | Input Length | Output Length | Batch Size | Throughput (tokens/sec) |
|---|---|---|---|---|---|---|
| LLaMA 2 7B | 1 | fp8 | 128 | 128 | 1230 | 13583 |
| LLaMA 2 7B | 1 | fp8 | 128 | 2048 | 163 | 4802 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 128 | 94 | 1447 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 2048 | 81 | 1956 |
| LLaMA 2 70B | 2 | fp8 | 128 | 128 | 1750 | 2943 |
| LLaMA 2 70B | 2 | fp8 | 128 | 2048 | 327 | 3312 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 128 | 95 | 316 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 2048 | 159 | 1755 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 128 | 2816 | 19875 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 2048 | 512 | 14784 |
| LLaMA 3.1 8B | 1 | fp8 | 2048 | 128 | 179 | 2011 |
| LLaMA 3.1 8B | 1 | fp8 | 2048 | 2048 | 256 | 6083 |
| LLaMA 3.1 70B | 2 | fp8 | 128 | 128 | 1792 | 2895 |
| LLaMA 3.1 70B | 2 | fp8 | 128 | 2048 | 256 | 3816 |
| LLaMA 3.1 70B | 2 | fp8 | 2048 | 128 | 142 | 316 |
| LLaMA 3.1 70B | 2 | fp8 | 2048 | 2048 | 139 | 1648 |
| LLaMA 3.1 70B | 8 | fp8 | 128 | 128 | 4000 | 10012 |
| LLaMA 3.1 70B | 8 | fp8 | 128 | 2048 | 600 | 12538 |
| LLaMA 3.1 70B | 8 | fp8 | 2048 | 128 | 383 | 1083 |
| LLaMA 3.1 70B | 8 | fp8 | 2048 | 2048 | 476 | 6623 |
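Because LLaMA 3.1 70B is reported on both 2 and 8 HPUs, dividing each throughput figure by the HPU count gives a rough view of how per-device throughput changes as the model is spread over more devices. The sketch below does that arithmetic using only the numbers in the table. It is an illustrative helper, not part of any benchmark tooling; the names `per_hpu` and `scaling_ratio` are hypothetical. The rows also use different batch sizes, so the ratio is only a loose indicator and not a controlled scaling measurement.

```python
# Figures copied from the Max Throughput table above (LLaMA 3.1 70B, fp8).
# Each entry: (num_hpus, input_len, output_len, batch_size, tokens_per_sec)
LLAMA31_70B = [
    (2, 128, 128, 1792, 2895),
    (2, 128, 2048, 256, 3816),
    (2, 2048, 128, 142, 316),
    (2, 2048, 2048, 139, 1648),
    (8, 128, 128, 4000, 10012),
    (8, 128, 2048, 600, 12538),
    (8, 2048, 128, 383, 1083),
    (8, 2048, 2048, 476, 6623),
]


def per_hpu(rows):
    """Map (num_hpus, input_len, output_len) -> throughput per HPU (tokens/sec)."""
    return {(h, i, o): tps / h for h, i, o, _bs, tps in rows}


def scaling_ratio(rows, small=2, large=8):
    """Per-HPU throughput at `large` HPUs divided by per-HPU throughput at `small` HPUs.

    1.0 would mean throughput grew linearly with HPU count. Batch sizes differ
    between the rows being compared, so treat this as a rough indicator only.
    """
    p = per_hpu(rows)
    shapes = sorted({(i, o) for _h, i, o, _bs, _tps in rows})
    return {s: p[(large, *s)] / p[(small, *s)] for s in shapes}


if __name__ == "__main__":
    for (i, o), r in scaling_ratio(LLAMA31_70B).items():
        print(f"in={i:5d} out={o:5d} per-HPU ratio (8 vs 2 HPUs): {r:.2f}")
```

Running it prints one line per input/output shape. A value below 1.0 means each HPU delivered fewer tokens/sec in the 8-HPU configuration than in the 2-HPU one.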