The Small Matrix Library (SML) is an implementation of common matrix and vector operations specifically optimized for the Intel® Pentium® II and Intel® Pentium® III processors. The library includes C++ classes for matrices and vectors with sizes ranging from 1x1 to 6x6; the library's low-level functions may also be called directly from programs written in C. The C++ classes provide overloaded operators for common operations such as addition, subtraction, and multiplication, with very low overhead.
SML combines considerable performance gains with a convenient user interface. In particular, multiplication of two 6x6 matrices executes in 652 processor cycles on the Intel® Pentium® II processor and in 307 cycles on the Intel® Pentium® III processor. Among other examples, inversion of a 6x6 matrix on the Intel® Pentium® III processor requires 625 cycles, and LU decomposition of a 30x30 matrix can be executed in fewer than 70,000 cycles.
The library requires the Microsoft Visual* C++ compiler, version 5.x or later. Building the Intel® Pentium® III version of the library also requires the Intel® C/C++ Compiler, version 4.0 or later.
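
To illustrate the style of interface described above (fixed-size matrix classes with overloaded operators), the following is a minimal portable C++ sketch. It is not SML's actual API: the names SmallMatrix and usage_example are hypothetical, and the loops are plain scalar code with none of the processor-specific optimization the library provides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size matrix type. The name and members are
// illustrative only and are not taken from SML.
template <std::size_t R, std::size_t C>
struct SmallMatrix {
    std::array<float, R * C> v{};  // row-major storage, zero-initialized

    float& operator()(std::size_t r, std::size_t c) { return v[r * C + c]; }
    float operator()(std::size_t r, std::size_t c) const { return v[r * C + c]; }
};

// Element-wise addition of two matrices of the same size.
template <std::size_t R, std::size_t C>
SmallMatrix<R, C> operator+(const SmallMatrix<R, C>& a,
                            const SmallMatrix<R, C>& b) {
    SmallMatrix<R, C> out;
    for (std::size_t i = 0; i < R * C; ++i) out.v[i] = a.v[i] + b.v[i];
    return out;
}

// Matrix product. The inner dimension K must match, and a mismatch is
// rejected at compile time because the sizes are template parameters.
template <std::size_t R, std::size_t K, std::size_t C>
SmallMatrix<R, C> operator*(const SmallMatrix<R, K>& a,
                            const SmallMatrix<K, C>& b) {
    SmallMatrix<R, C> out;
    for (std::size_t r = 0; r < R; ++r) {
        for (std::size_t c = 0; c < C; ++c) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) sum += a(r, k) * b(k, c);
            out(r, c) = sum;
        }
    }
    return out;
}

// Usage: multiply a 2x3 matrix by a 3x2 matrix and check two entries.
inline void usage_example() {
    SmallMatrix<2, 3> a;
    SmallMatrix<3, 2> b;
    float n = 1.0f;
    for (std::size_t r = 0; r < 2; ++r)
        for (std::size_t c = 0; c < 3; ++c) a(r, c) = n++;  // 1..6
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t c = 0; c < 2; ++c) b(r, c) = n++;  // 7..12

    SmallMatrix<2, 2> p = a * b;
    assert(p(0, 0) == 58.0f);   // 1*7 + 2*9 + 3*11
    assert(p(1, 1) == 154.0f);  // 4*8 + 5*10 + 6*12
}
```

Because the dimensions are compile-time constants, a compiler can fully unroll these loops. This is one reason fixed-size classes of this kind can keep operator overhead low, although the cycle counts quoted above apply to SML itself and not to this sketch.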

Application notes
The following set of application notes details how to improve the performance of small-matrix operations using the Intel® Pentium® III processor.

AP928 Streaming SIMD Extensions - Inverse of 4x4 Matrix
AP929 Streaming SIMD Extensions - Inverse of 6x6 Matrix
AP930 Streaming SIMD Extensions - Matrix Multiplication
AP931 Streaming SIMD Extensions - LU Decomposition

Articulated body demo 
Download an articulated body demo (.ZIP file, 10.28MB) and its accompanying video file to view a real-time application that uses the Intel® Pentium® III processor to speed up a physical simulation of a velociraptor exploring its environment.
For more documentation on Streaming SIMD Extensions, visit the Intel® Software College website.

