| Model | # HPUs | Precision | Input Length (tokens) | Output Length (tokens) | Batch Size | Throughput (tokens/sec) |
|---|---|---|---|---|---|---|
| LLaMA 2 7B | 1 | fp8 | 128 | 128 | 1536 | 20381 |
| LLaMA 2 7B | 1 | fp8 | 128 | 2048 | 217 | 7476 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 128 | 153 | 2137 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 2048 | 117 | 2977 |
| LLaMA 2 70B | 2 | fp8 | 128 | 128 | 1750 | 4562 |
| LLaMA 2 70B | 2 | fp8 | 128 | 2048 | 512 | 6590 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 128 | 242 | 486 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 2048 | 241 | 2736 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 128 | 1536 | 24364 |
| LLaMA 3.1 8B | 1 | fp8 | 128 | 2048 | 768 | 18063 |
| LLaMA 3.1 8B | 1 | fp8 | 2048 | 128 | 256 | 2590 |
| LLaMA 3.1 8B | 1 | fp8 | 2048 | 2048 | 371 | 8335 |
| LLaMA 3.1 70B | 2 | fp8 | 128 | 128 | 2048 | 4562 |
| LLaMA 3.1 70B | 2 | fp8 | 128 | 2048 | 450 | 6278 |
| LLaMA 3.1 70B | 2 | fp8 | 2048 | 128 | 223 | 499 |
| LLaMA 3.1 70B | 2 | fp8 | 2048 | 2048 | 175 | 2796 |
| LLaMA 3.1 70B | 8 | fp8 | 128 | 128 | 4000 | 15377 |
| LLaMA 3.1 70B | 8 | fp8 | 128 | 2048 | 600 | 16891 |
| LLaMA 3.1 70B | 8 | fp8 | 2048 | 128 | 512 | 1594 |
| LLaMA 3.1 70B | 8 | fp8 | 2048 | 2048 | 600 | 9467 |
| LLaMA 3.1 405B | 8 | fp8 | 128 | 128 | 2996 | 3306 |
| LLaMA 3.1 405B | 8 | fp8 | 128 | 2048 | 460 | 4793 |
| LLaMA 3.1 405B | 8 | fp8 | 2048 | 128 | 195 | 371 |
| LLaMA 3.1 405B | 8 | fp8 | 2048 | 2048 | 180 | 2143 |
| Mixtral 8x7B | 2 | bf16 | 128 | 128 | 13722 | 15904 |
| Mixtral 8x7B | 2 | bf16 | 128 | 2048 | 8496 | 7660 |
| Mixtral 8x7B | 2 | bf16 | 2048 | 128 | 1641 | 1896 |
| Mixtral 8x7B | 2 | bf16 | 2048 | 2048 | 4014 | 3643 |