| Falcon-7B | 1 | bf16 | 100 | 8k | 8k | 116.9 token/sec | 8.55 ms | 1 | Optimum Habana 1.10.4 |
| Bloom-7B-Greedy | 1 | bf16 | | | 2k | 721.41 token/sec | 11.08 ms | 8 | |
| Bloom-7B-Greedy | 1 | fp8 | | | 2k | 192.20 token/sec | 5.2 ms | 1 | |
| GPT-J (Text Generation) | 8 | bf16 | | | 100 | 585.2 token/sec | 6.83 ms | 4 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-7B | 1 | fp8 | 2k | 2k | 4k | 1604.44 token/sec | 49.86 ms | 12 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 8 | fp8 | 4k | 4k | 8k | 5313.93 token/sec | 29.35 ms | 6 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 8 | fp8 | 8k | 8k | 16k | 2648.5 token/sec | 21.52 ms | 3 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 1 | bf16 | 1k | 3k | 4k | 411.31 token/sec | 9.72 ms | 4 | Optimum Habana 1.10.4 |
| Falcon-40B | 8 | bf16 | 100 | 2k | 2k | 84.33 token/sec | 11.85 ms | 1 | Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 2k | 2k | 4k | 5000.2 token/sec | 55.39 ms | 277 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 4k | 4k | 8k | 3171.51 token/sec | 33.1 ms | 77 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 8k | 8k | 16k | 1305.04 token/sec | 25.28 ms | 38 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 16k | 16k | 32k | 272.99 token/sec | 21.97 ms | 19 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 2k | 4k | 3348.89 token/sec | 64.49 ms | 216 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 6k | 8k | 1854.93 token/sec | 16.17 ms | 30 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 14k | 16k | 916.03 token/sec | 16.37 ms | 15 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-13B | 1 | bf16 | 2k | 2k | 4k | 139.32 token/sec | 14.35 ms | 2 | Optimum Habana 1.10.4 |
| Bloomz-176B | 8 | bf16 | | | 100 | 36.76 token/sec | 27.2 ms | 1 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| Bloom-176B-Greedy | 8 | fp8 | | | 4k | 201.09 token/sec | 39.78 ms | 8 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 4k | 398.04 token/sec | 52.75 ms | 21 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 8k | 197.78 token/sec | 50.56 ms | 10 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 16k | 84.42 token/sec | 47.38 ms | 4 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 32k | 26.04 token/sec | 38.4 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-Sampling | 8 | bf16 | | | 1k | 19.6 token/sec | 51.02 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-BeamSearch-8 | 8 | bf16 | | | 512 | 30.83 token/sec | 32.43 ms | 1 | DeepSpeed 0.12.4 |
| Falcon 180B | 8 | bf16 | | | | 669.5 token/sec | 59.74 ms | 40 | Optimum Habana 1.10.4 |
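
When reading these rows, a rough sanity check may help: if the latency column is the time per generated token and the second-to-last column is the batch size (both are assumptions here, since the header row is not shown), then throughput should be close to batch size divided by latency. The sketch below applies that check to three rows copied from the table. The helper name `implied_throughput` is hypothetical. Several of the fp8 LLaMA 2 rows do not follow this relation, so treat it as a reading aid rather than a definition of how the numbers were measured.

```python
def implied_throughput(batch_size: int, latency_ms: float) -> float:
    """Tokens per second, assuming each step emits one token per sequence in the batch."""
    return batch_size / (latency_ms / 1000.0)


# (label, reported token/sec, latency in ms, assumed batch size), copied from the table
rows = [
    ("Falcon-7B bf16", 116.9, 8.55, 1),
    ("LLaMA 2-13B bf16", 139.32, 14.35, 2),
    ("Falcon 180B bf16", 669.5, 59.74, 40),
]

for name, reported, latency_ms, batch in rows:
    implied = implied_throughput(batch, latency_ms)
    rel_diff = abs(implied - reported) / reported
    print(f"{name}: reported {reported:.2f}, implied {implied:.2f}, diff {rel_diff:.2%}")
```

For these three rows the implied and reported values agree to within a fraction of a percent. A large gap on another row would suggest that one of the assumed column meanings does not hold for it.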