Recursive Algorithm 2.1. Cholesky recursive algorithm if lower trian-
gular part of A is given (rcholesky):
Do recursion
• if n > 1 then
• L
11
:= rcholesky of A
11
• L
21
T
L
11
= A
21
→ RTRSM
•
22
Aˆ
:= A
22
– L
21
T
L
21
→ RSYRK
176
• L
22
:= rcholesky of
22
Aˆ
• otherwise
• L := A
End recursion
The matrices A
11
, A
12
, A
21
, A
22
, L
11
,
L
21
, L
22
, U
11
, U
12
, and U
22
, are sub-
matrices of A, L and U respectively.
.
,
,
=
=
=
22
12
11
22
21
11
22
21
12
11
U
U
U
U
L
L
L
L
A
A
A
A
A
Recursive Algorithm 2.2. Cholesky recursive algorithm if upper trian-
gular part of A is given (rcholesky):
Do recursion
• if n > 1 then
• U
11
:= rcholesky of A
11
•
T
U
11
U
12
= A
12
→ RTRSM
•
22
Aˆ
:= A
22
–
T
U
11
U
12
→ RSYRK
• U
22
:= rcholesky of
22
Aˆ
• otherwise
• U := A
End recursion
The sizes of the subrnatrices are: for A
11
,
L
11
and U
11
is h × h, for A
21
and L
21
is (n – h) × h. for A
12
and U
12
is h × (n – h), and for A
22
, L
22
and U
22
is (n – h) × (n – h), where h = n/2. Matrices L
11
, L
22
, U
11
and U
22
are lower
and upper triangular respectively. Matrices A
11
, A
12
, A
21
, A
22
, L
21
and U
12
are
rectangular.
The RTRSM and RSYRK are recursive BLAS of _TRSM and _SYRK
respectively. _TRSM solves a triangular system of equations. _SYRK per-
forms the symmetric rank k operations (see Sections 4.1 and 4.2).
The listing of the Recursive Cholesky Factorization Subroutine is at-
tached in the Appendix A.
2.1. Full and Packed Storage Data Format
The Cholesky factorization algorithm can be programmed either in "full
storage" or "packed storage". For example the LAPACK subroutine POTRF
177
works on full storage, while the routine PPTRF is programmed for packed
storage. Here we are interested only in full storage and packed storage hold-
ing data that represents dense symmetric positive definite matrices. We will
compare our recursive algorithms to the LAPACK POTRF and PPTRF sub-
routines.
The POTRF subroutine uses the Cholesky algorithm in full storage. A
storage for the full array A must be declared even if only n × (n + l)/2 ele-
ments of array A are needed and, hence n × (n – l)/2 elements are not
touched. The PPTRF subroutine uses the Cholesky algorithm on packed
storage. It only needs n × (n + l)/2 memory words. Moreover the POTRF
subroutine works fast while the PPTRF subroutine works slow. Why? The
routine POTRF is constructed with BLAS level 3, while the PPTRF uses the
BLAS level 2. These LAPACK data structures are illustrated by the figs. 1
and 2 respectively.
Figure 1. The mapping of 7×7 matrix for the LAPACK Cholesky Algorithm
using the full storage
Figure 2. The mapping of 7×7 matrix for the LAPACK Cholesky Algorithm
using the packed storage
178
2.2 The Recursive Storage Data Format
We introduce a new storage data format, the recursive storage data for-
mat, using the recursive algorithms 2.1 and 2.2. Like packed data format
this recursive storage data format requires n × (n – l)/2 storage for the upper
or lower part of the matrix. The recursive storage data format is illustrated
in fig. 3. A buffer of the size p × (p – l)/2, where p = [n/2] (integer division),
is needed to convert from the LAPACK packed storage data, format to the
recursive packed storage data, format, and back. No buffer is needed if data
is given in recursive format.
Figure 3. The mapping of 7×7 matrix for the LAPACK Cholesky Algorithm
using the packed recursive storage
We can apply the BLAS level 3 using the recursive packed storage data
format. The performance of the recursive Cholesky algorithm with the re-
cursive packed data format reaches the performance of the LAPACK
POTRF algorithm. A graphs with the performance between different stor-
ages is presented in fig. 4.
The Figure 4 has two subfigures. The upper subfigure shows compari-
son curves for Cholesky factorization. The lower subfigure show compari-
son curves of forward and backward substitutions. The captions describe
179
details of the performance figures. Each subfigure presents six curves. From
the bottom: The first two curves represent LAPACK PPTRF performance
results for upper and lower case respectively. The third and fourth curves
give the performance of the recursive algorithms, L and U, respectively. The
conversion time from LAPACK packed data format to the recursive packed
data format and back is included here. The fifth and sixth curves give per-
formance of the L and U variants for the LAPACK POTRF algorithm. The
IBM SMP optimized ESSL DGEMM was used by the last four algorithms.
The same good results were obtained on other parallel supercomputers,
for example on Compaq a DS-20 and SGI Origin 2000.
The paper [1] gives comparison performance results on various com-
puters for the Cholesky factorization and solution.
Достарыңызбен бөлісу: |