date_wts, update_parameters and update_approximations. The time spent
in the base_cycle function was measured, and it accounted for about 99.5% of the total time.
Therefore, this function has been identified as the one where parallelism
must be exploited to improve AutoClass performance. In particular, if
we analyze the time spent in each of the three functions called by base_cycle, it appears that the update_wts and update_parameters functions are
the most time-consuming, whereas the time spent in update_approximations is negligible. Therefore, P-AutoClass is based on the
parallelization of these two functions using the SPMD approach. To maintain
the same semantics as the sequential AutoClass algorithm, the parallel
version is based on partitioning the data and performing local computation
on each of the P processors of a distributed-memory MIMD computer, and on exchanging
among the processors all the local variables that contribute to forming the
global values of a classification.
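The partition, local computation, and exchange pattern described above can be illustrated with a minimal sketch. It simulates the P processors inside a single Python process and uses a one-dimensional Gaussian mixture as a stand-in for the AutoClass model. The names `partition`, `local_update_wts`, `all_reduce_sum` and `parallel_cycle` are hypothetical and are not taken from the AutoClass or P-AutoClass sources. In an MPI program, the `all_reduce_sum` step would presumably be a global sum reduction across processors.

```python
import math

def partition(data, num_procs):
    """Split data into num_procs nearly equal contiguous blocks."""
    base, extra = divmod(len(data), num_procs)
    blocks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks

def gaussian(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def local_update_wts(block, classes):
    """Local step (hypothetical): weights of each item in each class,
    accumulated into per-class sums [sum w, sum w*x, sum w*x^2]."""
    stats = [[0.0, 0.0, 0.0] for _ in classes]
    for x in block:
        likes = [pi * gaussian(x, mean, var) for pi, mean, var in classes]
        total = sum(likes)
        for j, like in enumerate(likes):
            w = like / total  # normalized membership weight
            stats[j][0] += w
            stats[j][1] += w * x
            stats[j][2] += w * x * x
    return stats

def all_reduce_sum(per_proc_stats):
    """Stand-in for the exchange step: element-wise sum of all local statistics."""
    n_classes = len(per_proc_stats[0])
    return [[sum(p[j][k] for p in per_proc_stats) for k in range(3)]
            for j in range(n_classes)]

def update_parameters(global_stats, n_items):
    """Recompute (proportion, mean, variance) of each class from global sums."""
    classes = []
    for sw, swx, swxx in global_stats:
        mean = swx / sw
        var = max(swxx / sw - mean * mean, 1e-6)  # guard against zero variance
        classes.append((sw / n_items, mean, var))
    return classes

def parallel_cycle(data, classes, num_procs):
    """One simulated cycle: partition, local computation, exchange, update."""
    blocks = partition(data, num_procs)
    local = [local_update_wts(block, classes) for block in blocks]
    return update_parameters(all_reduce_sum(local), len(data))

if __name__ == "__main__":
    data = [0.1, 0.3, -0.2, 0.0, 5.1, 4.8, 5.3, 4.9, 0.2, 5.0]
    init = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
    print(parallel_cycle(data, init, 1))
    print(parallel_cycle(data, init, 4))
```

Because every processor contributes only sums, the global values agree with the single-processor run up to floating-point rounding regardless of how the data is split, which is the sense in which the sequential semantics are preserved in this sketch.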
P-AutoClass has been implemented using MPI on parallel computers
and clusters. Experimental results show that P-AutoClass is scalable both
in terms of speedup and scaleup. This means that, for a given data set, the
execution time can be reduced as the number of processors increases, and
the execution time does not increase if the number of processors grows
along with the size of the data set. The out-of-core technique is more effective
when large data blocks are used to transfer data from external to main
memory. Finally, the P-AutoClass algorithm is portable to the various MIMD
distributed-memory parallel computers currently available
from a large number of vendors. It makes it possible to perform efficient clustering on
very large data sets, significantly reducing computation times on several
parallel computing platforms.
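The two scalability measures mentioned above are commonly defined as follows. The notation T(P, N), the execution time on P processors for a data set of N items, is assumed here for illustration and does not come from the original text.

```latex
\[
  \mathrm{speedup}(P) = \frac{T(1, N)}{T(P, N)},
  \qquad
  \mathrm{scaleup}(P) = \frac{T(1, N)}{T(P, P \cdot N)}
\]
```

Under these definitions, ideal behavior corresponds to a speedup close to P and a scaleup close to 1, that is, an execution time that stays roughly constant when the data size and the number of processors grow together.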
Other research activities in the area of high-performance data mining
carried out at ISI-CNR concern the design and implementation of a grid-based
distributed architecture for knowledge discovery on computational grids,
called the Knowledge Grid, which is discussed in the next section.