| Model | #HPU | Precision | Input Length | Output Length | Throughput (tokens/sec) | Batch | Framework Version |
|---|---|---|---|---|---|---|---|
| LLaMA 2 7B | 1 | fp8 | 128 | 128 | 12772 | 1230 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 128 | 2048 | 4787 | 163 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 128 | 1318 | 94 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 2048 | 1967 | 81 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 128 | 128 | 17331 | 2429 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 128 | 2048 | 11106 | 289 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 2048 | 128 | 1762 | 179 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 2048 | 2048 | 5379 | 155 | Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 128 | 128 | 2784 | 1750 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 128 | 2048 | 3186 | 750 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 128 | 292 | 95 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 2048 | 1392 | 78 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 128 | 128 | 13112 | 896 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 128 | 2048 | 7947 | 120 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 2048 | 128 | 1360 | 120 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 2048 | 2048 | 3143 | 44 | Optimum Habana 1.12.1 |
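The LLaMA 2 70B rows use two HPUs while every other row uses one, so a rough like-for-like comparison can divide throughput by the HPU count. The following is a minimal Python sketch of that normalization, assuming the reported throughput is the aggregate over all HPUs in a configuration (the table does not state this). The `Result` type and the `per_hpu` and `rank_by_shape` helpers are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One row of the table (all rows use fp8 precision)."""
    model: str
    hpus: int
    input_len: int
    output_len: int
    throughput: float  # tokens/sec, as reported
    batch: int


# Rows copied from the table above.
RESULTS = [
    Result("LLaMA 2 7B", 1, 128, 128, 12772, 1230),
    Result("LLaMA 2 7B", 1, 128, 2048, 4787, 163),
    Result("LLaMA 2 7B", 1, 2048, 128, 1318, 94),
    Result("LLaMA 2 7B", 1, 2048, 2048, 1967, 81),
    Result("LLaMA 3 8B", 1, 128, 128, 17331, 2429),
    Result("LLaMA 3 8B", 1, 128, 2048, 11106, 289),
    Result("LLaMA 3 8B", 1, 2048, 128, 1762, 179),
    Result("LLaMA 3 8B", 1, 2048, 2048, 5379, 155),
    Result("LLaMA 2 70B", 2, 128, 128, 2784, 1750),
    Result("LLaMA 2 70B", 2, 128, 2048, 3186, 750),
    Result("LLaMA 2 70B", 2, 2048, 128, 292, 95),
    Result("LLaMA 2 70B", 2, 2048, 2048, 1392, 78),
    Result("Mistral 7B", 1, 128, 128, 13112, 896),
    Result("Mistral 7B", 1, 128, 2048, 7947, 120),
    Result("Mistral 7B", 1, 2048, 128, 1360, 120),
    Result("Mistral 7B", 1, 2048, 2048, 3143, 44),
]


def per_hpu(r: Result) -> float:
    # Assumption: the reported throughput is the aggregate over all HPUs.
    return r.throughput / r.hpus


def rank_by_shape(results, input_len, output_len):
    """Rows for one input/output length pair, highest per-HPU throughput first."""
    rows = [r for r in results if (r.input_len, r.output_len) == (input_len, output_len)]
    return sorted(rows, key=per_hpu, reverse=True)


if __name__ == "__main__":
    for r in rank_by_shape(RESULTS, 128, 128):
        print(f"{r.model:<12} {per_hpu(r):>9.1f} tokens/sec per HPU")
```

Under that assumption, LLaMA 2 70B comes out at 1392 tokens/sec per HPU for the 128/128 shape. Note that the rows also differ in batch size, so this normalization does not account for every difference between configurations.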