| Model | # HPU | Precision | Input Length | Output Length | Max Token Sequence Length | Throughput (tokens/sec) | Latency*** (ms) | Batch | Framework Version |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-7B | 1 | bf16 | 100 | 8k | 8k | 116.9 | 8.55 | 1 | Optimum Habana 1.10.4 |
| Bloom-7B-Greedy | 1 | bf16 | | | 2k | 721.41 | 11.08 | 8 | |
| Bloom-7B-Greedy | 1 | fp8 | | | 2k | 192.20 | 5.2 | 1 | |
| GPT-J (Text Generation) | 8 | bf16 | | | 100 | 585.2 | 6.83 | 4 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-7B | 1 | fp8 | 2k | 2k | 4k | 1604.44 | 49.86 | 12 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 8 | fp8 | 4k | 4k | 8k | 5313.93 | 29.35 | 6 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 8 | fp8 | 8k | 8k | 16k | 2648.5 | 21.52 | 3 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 1 | bf16 | 1k | 3k | 4k | 411.31 | 9.72 | 4 | Optimum Habana 1.10.4 |
| Falcon-40B | 8 | bf16 | 100 | 2k | 2k | 84.33 | 11.85 | 1 | Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 2k | 2k | 4k | 5000.2 | 55.39 | 277 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 4k | 4k | 8k | 3171.51 | 33.1 | 77 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 8k | 8k | 16k | 1305.04 | 25.28 | 38 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 16k | 16k | 32k | 272.99 | 21.97 | 19 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 2k | 4k | 3348.89 | 64.49 | 216 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 6k | 8k | 1854.93 | 16.17 | 30 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 14k | 16k | 916.03 | 16.37 | 15 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-13B | 1 | bf16 | 2k | 2k | 4k | 139.32 | 14.35 | 2 | Optimum Habana 1.10.4 |
| Bloomz-176B | 8 | bf16 | | | 100 | 36.76 | 27.2 | 1 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| Bloom-176B-Greedy | 8 | fp8 | | | 4k | 201.09 | 39.78 | 8 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 4k | 398.04 | 52.75 | 21 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 8k | 197.78 | 50.56 | 10 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 16k | 84.42 | 47.38 | 4 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | | | 32k | 26.04 | 38.4 | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-Sampling | 8 | bf16 | | | 1k | 19.6 | 51.02 | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-BeamSearch-8 | 8 | bf16 | | | 512 | 30.83 | 32.43 | 1 | DeepSpeed 0.12.4 |
| Falcon-180B | 8 | bf16 | | | | 669.5 | 59.74 | 40 | Optimum Habana 1.10.4 |