Tall-and-Skinny and Short-and-Wide Optimizations for QR and LQ Decompositions

ID 660325
Updated 5/21/2017
Version Latest



Intel® oneAPI Math Kernel Library (oneMKL) 2017 Update 3 and later versions provide optimized routines for computing QR decompositions of tall-and-skinny (TS) matrices and LQ decompositions of short-and-wide (SW) matrices.

New routines have been added to oneMKL that compute QR and LQ factorizations using these TS/SW optimizations for appropriate matrix sizes. The routines accept matrices of any size: when a matrix is not sufficiently tall-and-skinny or short-and-wide, they fall back to the generic routines. Details of the new routines and their parameters can be found in the oneMKL Developer Reference. The new routines and their generic counterparts are listed below:

New TS/SW Routines

QR Decomposition

  • ?geqr
  • ?gemqr

LQ Decomposition

  • ?gelq
  • ?gemlq

Generic Routines

QR Decomposition

  • ?geqrf
  • ?ormqr (real)
  • ?unmqr (complex)

LQ Decomposition

  • ?gelqf
  • ?ormlq (real)
  • ?unmlq (complex)


A general overview of the TSQR algorithm is provided in the attached TSKB_QRLQ.pdf file. The PDF also includes example code that computes the QR decomposition of a matrix using the new TSQR routines.
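
The PDF's example code is not reproduced here. As a rough illustration of the general TSQR idea only, the pure-Python sketch below factors row blocks of a tall matrix independently, stacks the small R factors, and factors the stack once more. It is not oneMKL's implementation and does not call ?geqr. The helper names qr_r and tsqr_r, the block size, and the use of modified Gram-Schmidt for the small factorizations are all assumptions made for this example, and every block is assumed to have full column rank.

```python
import math

def qr_r(a):
    """Return the n x n upper-triangular R of a full-column-rank m x n matrix (m >= n).

    Hypothetical helper: uses modified Gram-Schmidt, so diag(R) is positive.
    """
    m, n = len(a), len(a[0])
    # Work on a column-major copy so each column is a plain list.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]
        # Remove the component along q from every later column.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * ci for qi, ci in zip(q, cols[k]))
            cols[k] = [ci - r[j][k] * qi for qi, ci in zip(q, cols[k])]
    return r

def tsqr_r(a, block_rows):
    """Return R of a tall matrix by factoring row blocks, then the stacked R factors.

    Hypothetical helper illustrating the TSQR reduction step only.
    """
    n = len(a[0])
    stacked = []
    for start in range(0, len(a), block_rows):
        block = a[start:start + block_rows]
        if len(block) >= n:
            stacked.extend(qr_r(block))   # replace the block by its small R factor
        else:
            stacked.extend(block)         # too few rows to factor: pass rows through
    return qr_r(stacked)

# An 8 x 2 tall-and-skinny matrix, factored directly and in two blocks of 4 rows.
a = [[1.0, float(i)] for i in range(1, 9)]
r_direct = qr_r(a)
r_blocked = tsqr_r(a, block_rows=4)
max_diff = max(abs(x - y)
               for row_d, row_b in zip(r_direct, r_blocked)
               for x, y in zip(row_d, row_b))
print(max_diff)
```

Both paths should produce the same R up to rounding, because R with a positive diagonal is unique for a full-rank matrix. The blocked form is of interest because the per-block factorizations are independent of one another.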

The following charts show the speedup of DGEQR compared to DGEQRF. The ?GELQ routines show a similar speedup over the ?GELQF routines, so those results are not displayed here.

The first chart shows these speedups on an Intel® Xeon® CPU E5-2699 v4 processor, and the second on an Intel® Xeon Phi™ 7250 processor.