Р. Г. Стронгина. Ниж- ний Новгород: Изд-во Нижегородского университета, 2002, 217 с



Pdf көрінісі
бет129/151
Дата26.01.2022
өлшемі1,64 Mb.
#24342
түріСеминар
1   ...   125   126   127   128   129   130   131   132   ...   151
Recursive Algorithm 2.1Cholesky recursive algorithm if lower trian-
gular part of A is given (rcholesky): 
Do recursion 
• if n > 1 then 
• L
11
 := rcholesky of A
11
 
• L
21
 
T
L
11
 = A
21
 → RTRSM 

22
Aˆ
 := A
22
 – L
21
T
L
21
 → RSYRK 


176 
• L
22
 := rcholesky of 
22
Aˆ
 
• otherwise 
• L  :=  A   
End recursion 
The matrices A
11
, A
12
A
21
A
22
, L
11

L
21
L
22
, U
11
, U
12
, and U
22
, are sub-
matrices of Aand respectively. 
 
.
,
,






=






=






=
22
12
11
22
21
11
22
21
12
11
U
U
U
U
L
L
L
L
A
A
A
A
A
 
Recursive Algorithm 2.2Cholesky recursive algorithm if upper trian-
gular part of A is given (rcholesky): 
Do recursion 
• if n > 1 then 
• U
11
 := rcholesky of A
11
 
• 
T
U
11
 U
12
 = A
12
 → RTRSM 

22
Aˆ
 := A
22
 – 
T
U
11
 U
12
 → RSYRK 
• U
22
 := rcholesky of 
22
Aˆ
 
• otherwise 
• U  :=  A   
End recursion 
 
The sizes of the subrnatrices are: for A
11
,
 
L
11
 and U
11
 is h × h, for A
21
 
and L
21
 is (n – h) × h. for A
12
 and U
12
 is h × (n – h), and for A
22
L
22
 and U
22 
is (n – h) × (n – h), where h = n/2. Matrices L
11
, L
22
, U
11
 and U
22
 are lower 
and upper triangular respectively. Matrices A
11
A
12
A
21
A
22
L
21
 and U
12 
are 
rectangular. 
The RTRSM and RSYRK are recursive BLAS of _TRSM and _SYRK 
respectively. _TRSM solves a triangular system of equations. _SYRK per-
forms the symmetric rank k operations (see Sections 4.1 and 4.2). 
The listing of the Recursive Cholesky Factorization Subroutine is at-
tached in the Appendix A. 
 
2.1. Full and Packed Storage Data Format 
The Cholesky factorization algorithm can be programmed either in "full 
storage" or "packed storage". For example the LAPACK subroutine POTRF 


 
177 
works on full storage, while the routine PPTRF is programmed for packed 
storage. Here we are interested only in full storage and packed storage hold-
ing data that represents dense symmetric positive definite matrices. We will 
compare our recursive algorithms to the LAPACK POTRF and PPTRF sub-
routines. 
The POTRF subroutine uses the Cholesky algorithm in full storage. A 
storage for the full array must be declared even if only n × (+ l)/2 ele-
ments of array A are needed and, hence n × (n – l)/2 elements are not 
touched. The PPTRF subroutine uses the Cholesky algorithm on packed 
storage. It only needs n × (+ l)/2 memory words. Moreover the POTRF 
subroutine works fast while the PPTRF subroutine works slow. Why? The 
routine POTRF is constructed with BLAS level 3, while the PPTRF uses the 
BLAS level 2. These LAPACK data structures are illustrated by the figs. 1 
and 2 respectively. 
 
Figure 1. The mapping of 7×7 matrix for the LAPACK Cholesky Algorithm 
using the full storage 
 
 
Figure 2. The mapping of 7×7 matrix for the LAPACK Cholesky Algorithm 
using the packed storage 
 


178 
2.2 The Recursive Storage Data Format 
We introduce a new storage data format, the recursive storage data for-
mat, using the recursive algorithms 2.1 and 2.2. Like packed data format 
this recursive storage data format requires n × (n – l)/2 storage for the upper 
or lower part of the matrix. The recursive storage data format is illustrated 
in fig. 3. A buffer of the size p × (p – l)/2, where p = [n/2] (integer division), 
is needed to convert from the LAPACK packed storage data, format to the 
recursive packed storage data, format, and back. No buffer is needed if data 
is given in recursive format. 
 
Figure 3. The mapping of 7×7 matrix for the LAPACK Cholesky Algorithm 
using the packed recursive storage 
 
We can apply the BLAS level 3 using the recursive packed storage data 
format. The performance of the recursive Cholesky algorithm with the re-
cursive packed data format reaches the performance of the LAPACK 
POTRF algorithm. A graphs with the performance between different stor-
ages is presented in fig. 4. 
The Figure 4 has two subfigures. The upper subfigure shows compari-
son curves for Cholesky factorization. The lower subfigure show compari-
son curves of forward and backward substitutions. The captions describe 


 
179 
details of the performance figures. Each subfigure presents six curves. From 
the bottom: The first two curves represent LAPACK PPTRF performance 
results for upper and lower case respectively. The third and fourth curves 
give the performance of the recursive algorithms, and U, respectively. The 
conversion time from LAPACK packed data format to the recursive packed 
data format and back is included here. The fifth and sixth curves give per-
formance of the and variants for the LAPACK POTRF algorithm. The 
IBM SMP optimized ESSL DGEMM was used by the last four algorithms. 
The same good results were obtained on other parallel supercomputers, 
for example on Compaq DS-20 and SGI Origin 2000. 
The paper [1] gives comparison performance results on various com-
puters for the Cholesky factorization and solution. 


Достарыңызбен бөлісу:
1   ...   125   126   127   128   129   130   131   132   ...   151




©emirsaba.org 2024
әкімшілігінің қараңыз

    Басты бет