R. G. Strongin (ed.). Nizhny Novgorod: Nizhny Novgorod University Press, 2002, 217 pp.



... update_wts, update_parameters and update_approximations. The time spent
in the base_cycle function amounted to about 99.5% of the total time.
Therefore, this function was identified as the one where parallelism
must be exploited to speed up AutoClass. In particular, an analysis of the
time spent in each of the three functions called by base_cycle shows that
update_wts and update_parameters are the most time-consuming, whereas the
time spent in update_approximations is negligible. P-AutoClass is therefore
based on the parallelization of these two functions using the SPMD
approach. To maintain the semantics of the sequential AutoClass algorithm,
the parallel version partitions the data, performs local computation on
each processor of a distributed-memory MIMD computer, and exchanges among
the processors all the local variables that contribute to the global values
of a classification.
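The partition, local computation and exchange pattern described above can be
illustrated with a small sketch. It is not the P-AutoClass code: it assumes a
simplified one-dimensional Gaussian class model instead of the AutoClass
Bayesian model, it simulates the processors inside a single Python process,
and the function names (local_stats, allreduce_sum, parallel_cycle) are
hypothetical. The global sum stands in for the kind of collective operation
that MPI offers (for example MPI_Allreduce); which MPI calls P-AutoClass
actually uses is not stated here.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def local_stats(block, classes):
    """Local step on one simulated processor: compute class weights for its
    own items and reduce them to per-class partial sums
    (sum of w, sum of w*x, sum of w*x*x)."""
    stats = [[0.0, 0.0, 0.0] for _ in classes]
    for x in block:
        likes = [prior * gaussian_pdf(x, mean, std)
                 for prior, mean, std in classes]
        total = sum(likes)
        for j, like in enumerate(likes):
            w = like / total          # membership weight of item x in class j
            stats[j][0] += w
            stats[j][1] += w * x
            stats[j][2] += w * x * x
    return stats

def allreduce_sum(per_rank_stats):
    """Stand-in for a global sum across processors (assumption: plays the
    role an MPI all-reduce would play on a real machine)."""
    n_classes = len(per_rank_stats[0])
    return [[sum(rank[j][k] for rank in per_rank_stats) for k in range(3)]
            for j in range(n_classes)]

def update_parameters(global_stats, n_items):
    """Every processor derives the same (prior, mean, std) from global sums."""
    new_classes = []
    for sw, swx, swxx in global_stats:
        mean = swx / sw
        var = max(swxx / sw - mean * mean, 1e-12)
        new_classes.append((sw / n_items, mean, math.sqrt(var)))
    return new_classes

def partition(data, n_ranks):
    """Split data into n_ranks contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), n_ranks)
    blocks, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks

def parallel_cycle(data, classes, n_ranks):
    """One simulated cycle: local sums per block, global sum, new parameters."""
    per_rank = [local_stats(b, classes) for b in partition(data, n_ranks)]
    return update_parameters(allreduce_sum(per_rank), len(data))
```

Because only per-class sums are exchanged, the result with several simulated
processors matches the single-processor result up to floating-point rounding,
which is the sense in which the sequential semantics are preserved.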
P-AutoClass has been implemented using MPI on parallel computers and
clusters. Experimental results show that P-AutoClass is scalable in terms
of both speedup and scaleup: for a given data set, the execution time
decreases as the number of processors increases, and the execution time
does not increase if more processors are made available as the data set
grows. The out-of-core technique is more effective when large data blocks
are used to transfer data from external to main memory. Finally, the
P-AutoClass algorithm is portable to the various distributed-memory MIMD
parallel computers currently available from a large number of vendors. It
enables efficient clustering of very large data sets, significantly
reducing computation times on several parallel computing platforms.
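The two scalability measures mentioned above can be written down as a short
sketch. The definitions are the usual textbook ones and are an assumption
here, since the text does not give formulas: speedup compares the time on one
processor with the time on p processors for the same data set, and scaleup
compares the time for a base data set on one processor with the time for a
p-times larger data set on p processors. The function names are hypothetical,
and the timings in the comments are made-up illustrative numbers, not
measurements of P-AutoClass.

```python
def speedup(t_one, t_p):
    """Same data set: time on 1 processor divided by time on p processors."""
    return t_one / t_p

def efficiency(t_one, t_p, p):
    """Speedup per processor; 1.0 would be ideal linear speedup."""
    return speedup(t_one, t_p) / p

def scaleup(t_base, t_scaled):
    """Time for the base data set on 1 processor divided by the time for a
    p-times larger data set on p processors; values near 1.0 mean the
    execution time stays flat as data and processors grow together."""
    return t_base / t_scaled

# Illustrative, invented timings in seconds (not results from the paper):
# 100 s on 1 processor and 25 s on 8 processors for the same data set.
s = speedup(100.0, 25.0)        # 4.0
e = efficiency(100.0, 25.0, 8)  # 0.5
# 100 s for the base data on 1 processor, 125 s for 8x data on 8 processors.
u = scaleup(100.0, 125.0)       # 0.8
```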
Other research activities in high-performance data mining at ISI-CNR
concern the design and implementation of a grid-based distributed
architecture for knowledge discovery on computational grids, called the
Knowledge Grid, which is discussed in the next section.

