Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 11/07/2023
Public

p?trord

Reorders the Schur factorization of a general matrix.

Syntax

call pstrord( compq, select, para, n, t, it, jt, desct, q, iq, jq, descq, wr, wi, m, work, lwork, iwork, liwork, info )

call pdtrord( compq, select, para, n, t, it, jt, desct, q, iq, jq, descq, wr, wi, m, work, lwork, iwork, liwork, info )

Description

p?trord reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace.

T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2 diagonal blocks.

This subroutine uses a delay and accumulate procedure for performing the off-diagonal updates.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201

Input Parameters

compq

(global) CHARACTER*1

= 'V': update the matrix q of Schur vectors;

= 'N': do not update q.

select

(global) INTEGER array of size n

select specifies the eigenvalues in the selected cluster. To select a real eigenvalue w(j), select(j) must be set to 1. To select a complex conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, either select(j) or select(j+1) or both must be set to 1; a complex conjugate pair of eigenvalues must be either both included in the cluster or both excluded.
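As a minimal sketch of how the selection array might be filled, the program below marks every eigenvalue with a negative real part. The eigenvalue values, the selection criterion, and the name sel (which would be passed as the select argument) are assumptions made for illustration; the eigenvalues are assumed to be known before reordering, in the order of the diagonal blocks of T, with a nonzero imaginary part identifying a 2-by-2 block.

```fortran
program build_select
  implicit none
  integer, parameter :: n = 5
  ! Hypothetical eigenvalues in the order of the diagonal blocks of T:
  ! one real eigenvalue, one complex conjugate pair, two real eigenvalues.
  double precision :: wr(n) = [ 2.0d0, -1.0d0, -1.0d0, -3.0d0, 0.5d0 ]
  double precision :: wi(n) = [ 0.0d0,  4.0d0, -4.0d0,  0.0d0, 0.0d0 ]
  integer :: sel(n)   ! would be passed as the select argument
  integer :: j

  sel = 0
  j = 1
  do while (j <= n)
     if (j < n .and. wi(j) /= 0.0d0) then
        ! 2-by-2 block: include or exclude both members of the pair together.
        if (wr(j) < 0.0d0) sel(j:j+1) = 1
        j = j + 2
     else
        ! 1-by-1 block: a single real eigenvalue.
        if (wr(j) < 0.0d0) sel(j) = 1
        j = j + 1
     end if
  end do

  print '(5i2)', sel
  ! Self-check for this illustrative data set.
  if (any(sel /= [ 0, 1, 1, 1, 0 ])) error stop 'unexpected selection'
end program build_select
```

Marking both entries of a pair is not strictly required, since setting either one selects the pair, but it keeps the array easy to inspect.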

para

(global) INTEGER array of size 6

Block parameters:

para(1)

maximum number of concurrent computational windows allowed in the algorithm; 0 < para(1) ≤ min(nprow, npcol) must hold;

para(2)

number of eigenvalues in each window; 0 < para(2) < para(3) must hold;

para(3)

window size; para(2) < para(3) < mb_t must hold;

para(4)

minimal percentage of FLOPS required for performing matrix-matrix multiplications instead of pipelined orthogonal transformations; 0 ≤ para(4) ≤ 100 must hold;

para(5)

width of block column slabs for row-wise application of pipelined orthogonal transformations in their factorized form; 0 < para(5) ≤ mb_t must hold;

para(6)

the maximum number of eigenvalues moved together over a process border; in practice, this will be approximately half of the cross border window size; 0 < para(6) ≤ para(2) must hold.
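The constraints on para can be checked before the call. The following is a sketch only: the helper first_bad_para is hypothetical (it is not part of the library), and the grid dimensions, block size, and para values are illustrative rather than recommended settings.

```fortran
program check_para
  implicit none
  integer :: para(6), nprow, npcol, mb_t

  ! Hypothetical process grid and distribution block size.
  nprow = 2
  npcol = 2
  mb_t  = 64
  para  = [ 2, 8, 16, 50, 16, 4 ]   ! illustrative values, not tuned defaults

  if (first_bad_para(para, nprow, npcol, mb_t) /= 0) error stop 'para should be valid'

  para(3) = 64                      ! violates para(3) < mb_t
  if (first_bad_para(para, nprow, npcol, mb_t) /= 3) error stop 'para(3) should be rejected'

  print '(a)', 'para checks behaved as expected'

contains

  ! Returns 0 if all documented constraints hold, otherwise the index
  ! of the first entry of para that violates its constraint.
  integer function first_bad_para(para, nprow, npcol, mb_t) result(bad)
    integer, intent(in) :: para(6), nprow, npcol, mb_t
    bad = 0
    if (para(1) <= 0 .or. para(1) > min(nprow, npcol)) then
       bad = 1
    else if (para(2) <= 0 .or. para(2) >= para(3)) then
       bad = 2
    else if (para(3) >= mb_t) then
       bad = 3
    else if (para(4) < 0 .or. para(4) > 100) then
       bad = 4
    else if (para(5) <= 0 .or. para(5) > mb_t) then
       bad = 5
    else if (para(6) <= 0 .or. para(6) > para(2)) then
       bad = 6
    end if
  end function first_bad_para

end program check_para
```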

n

(global) INTEGER

The order of the globally distributed matrix t. n ≥ 0.

t

REAL for pstrord

DOUBLE PRECISION for pdtrord

(local) array of size (lld_t,LOCc(n)).

The local pieces of the global distributed upper quasi-triangular matrix T, in Schur form.

it, jt

(global) INTEGER

The row and column index in the global matrix T indicating the first column of T. it = jt = 1 must hold (see Application Notes).

desct

(global and local) INTEGER array of size dlen_.

The array descriptor for the global distributed matrix T.

q

REAL for pstrord

DOUBLE PRECISION for pdtrord

(local) array of size (lld_q,LOCc(n)).

On entry, if compq = 'V', the local pieces of the global distributed matrix Q of Schur vectors.

If compq = 'N', q is not referenced.

iq, jq

(global) INTEGER

The row and column index in the global matrix Q indicating the first column of Q. iq = jq = 1 must hold (see Application Notes).

descq

(global and local) INTEGER array of size dlen_.

The array descriptor for the global distributed matrix Q.

work

REAL for pstrord

DOUBLE PRECISION for pdtrord

(local workspace) array of size lwork

lwork

(local) INTEGER

The size of the array work.

If lwork = -1, then a workspace query is assumed; the routine only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued by pxerbla.

iwork

(local workspace) INTEGER array of size liwork

liwork

(local) INTEGER

The size of the array iwork.

If liwork = -1, then a workspace query is assumed; the routine only calculates the optimal size of the iwork array, returns this value as the first entry of the iwork array, and no error message related to liwork is issued by pxerbla.
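The workspace query is typically used in a two-call pattern. The subroutine below is a sketch under several assumptions: the wrapper name reorder_with_query is hypothetical, the BLACS grid and both descriptors are assumed to be initialized by the caller, and the code must be linked against a ScaLAPACK implementation such as oneMKL, so it does not run on its own.

```fortran
! Sketch only: requires linking with ScaLAPACK and an initialized BLACS grid.
subroutine reorder_with_query(compq, select, para, n, t, desct, &
                              q, descq, wr, wi, m, info)
  implicit none
  character(len=1), intent(in)    :: compq
  integer,          intent(in)    :: n, para(6), desct(*), descq(*)
  integer,          intent(inout) :: select(n)
  double precision, intent(inout) :: t(*), q(*)
  double precision, intent(out)   :: wr(n), wi(n)
  integer,          intent(out)   :: m, info
  external :: pdtrord

  double precision              :: wquery(1)
  integer                       :: iwquery(1), lwork, liwork
  double precision, allocatable :: work(:)
  integer,          allocatable :: iwork(:)

  ! Step 1: workspace query (lwork = liwork = -1); only sizes are computed.
  call pdtrord(compq, select, para, n, t, 1, 1, desct, q, 1, 1, descq, &
               wr, wi, m, wquery, -1, iwquery, -1, info)
  if (info /= 0) return

  ! Step 2: allocate the sizes reported in work(1) and iwork(1).
  lwork  = max(1, int(wquery(1)))
  liwork = max(1, iwquery(1))
  allocate(work(lwork), iwork(liwork))

  ! Step 3: the actual reordering, with it = jt = iq = jq = 1.
  call pdtrord(compq, select, para, n, t, 1, 1, desct, q, 1, 1, descq, &
               wr, wi, m, work, lwork, iwork, liwork, info)
end subroutine reorder_with_query
```

The allocatable arrays are released automatically when the subroutine returns.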

Output Parameters

select

(global) INTEGER array of size n

On exit, select shows the (partial) reordering that was performed.

t

On exit, t is overwritten by the local pieces of the reordered matrix T, again in Schur form, with the selected eigenvalues in the globally leading diagonal blocks.

q

On exit, if compq = 'V', q has been postmultiplied by the global orthogonal transformation matrix which reorders t; the leading m columns of q form an orthonormal basis for the specified invariant subspace.

If compq = 'N', q is not referenced.

wr, wi

REAL for pstrord

DOUBLE PRECISION for pdtrord

(global) array of size n

The real and imaginary parts, respectively, of the reordered eigenvalues of the matrix T. The eigenvalues are in principle stored in the same order as on the diagonal of T, with wr(i) = t(i,i) and, if t(i:i+1,i:i+1) is a 2-by-2 diagonal block, wi(i) > 0 and wi(i+1) = -wi(i).

Note also that if a complex eigenvalue is sufficiently ill-conditioned, then its value may differ significantly from its value before reordering.

m

(global) INTEGER

The size of the specified invariant subspace.

0 ≤ m ≤ n.

work(1)

On exit, if info = 0, work(1) returns the optimal lwork.

iwork(1)

On exit, if info = 0, iwork(1) returns the optimal liwork.

info

(global) INTEGER

= 0: successful exit

< 0: if the i-th argument is a scalar and had an illegal value, then info = -i; if the i-th argument is an array and its j-th entry had an illegal value, then info = -(i*1000+j).

> 0: one of the following occurred:

  • Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned);

    t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t.

    On exit, info = {the index of t where the swap failed}.

  • A 2-by-2 block to be reordered split into two 1-by-1 blocks and the second block failed to swap with an adjacent block.

    On exit, info = {the index of t where the swap failed}.

  • If info = n+1, there is no valid BLACS context (see the BLACS documentation for details).
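The info conventions above can be decoded mechanically. The program below is a minimal sketch; the subroutine describe_info and the sample values are hypothetical, and the argument positions follow the order given in the Syntax section (for example, para is the third argument and n is the fourth).

```fortran
program decode_info_demo
  implicit none
  integer, parameter :: n = 100   ! hypothetical matrix order

  call describe_info(0, n)
  call describe_info(-4, n)       ! scalar argument 4 (n)
  call describe_info(-3005, n)    ! entry 5 of array argument 3 (para)
  call describe_info(57, n)       ! swap failure at index 57
  call describe_info(n + 1, n)    ! no valid BLACS context

contains

  subroutine describe_info(info, n)
    integer, intent(in) :: info, n
    integer :: iarg, jentry

    if (info == 0) then
       print '(a)', 'successful exit'
    else if (info < 0) then
       if (-info > 1000) then
          ! Array argument: info = -(i*1000 + j)
          iarg   = (-info) / 1000
          jentry = mod(-info, 1000)
          print '(a,i0,a,i0,a)', 'entry ', jentry, ' of argument ', iarg, ' is illegal'
       else
          ! Scalar argument: info = -i
          print '(a,i0,a)', 'argument ', -info, ' is illegal'
       end if
    else if (info == n + 1) then
       print '(a)', 'no valid BLACS context'
    else
       print '(a,i0)', 'reordering failed; swap failed at index ', info
    end if
  end subroutine describe_info

end program decode_info_demo
```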

Application Notes

The following alignment requirements must hold:

  • mb_t = nb_t = mb_q = nb_q

  • rsrc_t = rsrc_q

  • csrc_t = csrc_q

All matrices must be blocked by a block factor larger than or equal to two (3). This is to simplify reordering across processor borders in the presence of 2-by-2 blocks.

This algorithm cannot work on submatrices of t and q; that is, it = jt = iq = jq = 1 must hold. This is not a limitation in practice, since p?lahqr does not compute Schur forms of submatrices anyway.
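The alignment and index requirements can be verified from the two descriptors before calling the routine. This sketch assumes the standard ScaLAPACK descriptor layout (entries 5 to 8 hold mb, nb, rsrc, and csrc, with dlen_ = 9), which is not stated on this page; the helper aligned and the descriptor values are hypothetical.

```fortran
program check_alignment
  implicit none
  ! Assumed standard ScaLAPACK descriptor layout.
  integer, parameter :: dlen_ = 9, mb_ = 5, nb_ = 6, rsrc_ = 7, csrc_ = 8
  integer :: desct(dlen_), descq(dlen_)

  ! Hypothetical descriptors: dtype, ctxt, m, n, mb, nb, rsrc, csrc, lld
  desct = [ 1, 0, 1000, 1000, 64, 64, 0, 0, 500 ]
  descq = desct

  if (.not. aligned(desct, descq, 1, 1, 1, 1)) error stop 'expected aligned'

  descq(nb_) = 32   ! breaks mb_t = nb_t = mb_q = nb_q
  if (aligned(desct, descq, 1, 1, 1, 1)) error stop 'expected misaligned'

  print '(a)', 'alignment checks behaved as expected'

contains

  ! True when the block sizes and source processes of T and Q match
  ! and no submatrix offsets are requested.
  logical function aligned(desct, descq, it, jt, iq, jq) result(ok)
    integer, intent(in) :: desct(dlen_), descq(dlen_), it, jt, iq, jq
    ok = desct(mb_) == desct(nb_)       &
         .and. desct(mb_) == descq(mb_) &
         .and. descq(mb_) == descq(nb_) &
         .and. desct(rsrc_) == descq(rsrc_) &
         .and. desct(csrc_) == descq(csrc_) &
         .and. it == 1 .and. jt == 1 .and. iq == 1 .and. jq == 1
  end function aligned

end program check_alignment
```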

Parallel execution recommendations:

  • Use a square grid, if possible, for maximum performance. The block parameters in para should be kept well below the data distribution block size.

  • In general, the parallel algorithm strives to perform as much work as possible without crossing the block borders on the main block diagonal.
