Q > n, are chosen according to the rule:
Step 1. The points of the set

X_k = {x_1, …, x_k} ∪ {a} ∪ {b}   (7)

including the boundaries of the interval [a, b] and the coordinates x_j, 1 ≤ j ≤ k, of the previous trials, where

k = k(Q−1) = p(1) + … + p(Q−1),

are renumbered (by subscripts) in order of increasing coordinates

a = x_0 < x_1 < … < x_τ = b,

where τ + 1 = τ(Q) + 1 is the number of distinct elements of the set X_k from (7);
Step 2. A real number R(i) is assigned to each subinterval (x_{i−1}, x_i), 1 ≤ i ≤ τ, where R(i) is called the characteristic of this subinterval;
Step 3. The characteristics R(i) of the subintervals (x_{i−1}, x_i), 1 ≤ i ≤ τ, are ordered by decreasing values:

R(i_1) ≥ R(i_2) ≥ … ≥ R(i_τ);   (8)
Step 4. The next p trials of the Q-th iteration are performed in parallel on p processors at the points of the set

T(Q) = {x_{k+1}, …, x_{k+p}},

where x_{k+q} = S(i_q) and i_q, 1 ≤ q ≤ p, are the first p indices from the series (8); the function S is such that

x_{k+q} ∈ (x_{i_q − 1}, x_{i_q}).

In this case it is supposed that p = p(Q) ≤ τ, Q > n.
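To make the scheme concrete, the sketch below (in C) implements one iteration of Steps 1–4. It is only an illustration of the scheme, not the authors' code: the characteristic R and the placement function S are passed in as problem-specific callbacks, since different choices of R and S yield the different methods covered by the scheme.

#include <stdlib.h>

/* One trial: a point and the observed value of the objective. */
typedef struct { double x, z; } Trial;

static int by_x(const void *a, const void *b) {
    double d = ((const Trial *)a)->x - ((const Trial *)b)->x;
    return (d > 0) - (d < 0);
}

typedef struct { double r; int i; } Char;  /* characteristic of interval i */

static int by_r_desc(const void *a, const void *b) {
    double d = ((const Char *)b)->r - ((const Char *)a)->r;
    return (d > 0) - (d < 0);
}

/* One iteration of the scheme. t[0..m-1] holds all previous trials
   together with the endpoints a and b; R computes the characteristic of
   a subinterval and S places a new point strictly inside it. The p new
   trial points of the Q-th iteration are written into out[0..p-1]. */
void characteristic_iteration(Trial *t, int m,
                              double (*R)(Trial, Trial),
                              double (*S)(Trial, Trial),
                              int p, double *out)
{
    qsort(t, m, sizeof(Trial), by_x);        /* Step 1: renumber by x */
    int tau = m - 1;                         /* subintervals (x_{i-1}, x_i) */
    Char *c = malloc(tau * sizeof(Char));
    for (int i = 1; i <= tau; i++) {         /* Step 2: characteristics */
        c[i - 1].r = R(t[i - 1], t[i]);
        c[i - 1].i = i;
    }
    qsort(c, tau, sizeof(Char), by_r_desc);  /* Step 3: ordering (8) */
    for (int q = 0; q < p && q < tau; q++) { /* Step 4: p parallel trials */
        int i = c[q].i;
        out[q] = S(t[i - 1], t[i]);          /* point in (x_{i-1}, x_i) */
    }
    free(c);
}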
Many sequential methods proposed for solving the one-dimensional problems (4)–(5) and (4)–(6) can be generalized to the parallel case by the scheme introduced above.

Let us consider a parallel characteristical algorithm (PA) and its sequential analog (SA) obtained by setting p = 1. The efficiency of PA in comparison with SA strongly depends on the number p > 1 of processors used for parallelization. By increasing p we try to obtain a speedup of p times. This is possible only if the number of trials executed by PA is equal to the number of trials done by SA. But these numbers can differ, for the following reason.
In SA the decision about the choice of a new trial point x_{k+1} is based on the complete information

w_k = {x_1, …, x_k; z_1, …, z_k; z′_1, …, z′_k}

available after executing a trial at the point x_k, where z_i = φ(x_i), z′_i = φ′(x_i), 1 ≤ i ≤ k. At the same time, in PA the p points are chosen on the basis of the same information w_k, and the data

{x_{k+1}, …, x_{k+j−1}; z_{k+1}, …, z_{k+j−1}; z′_{k+1}, …, z′_{k+j−1}}

are absent at the moment of the choice of x_{k+j}, 2 ≤ j ≤ p. Thus, the choice of the points x_{k+2}, …, x_{k+p} by PA is not optimal (from the informational viewpoint) in comparison with SA. This fact can cause the appearance of redundant (with respect to the sequential search) points in the sequence {y_u} of trial points produced by PA and can slow down the search. To describe the quantity of such points we introduce the redundancy coefficient

K(u, n) = K′(u, n)/(u − n),

where u > n and

K′(u, n) = card({y_{n+1}, …, y_u} \ {x_k}).
Here, {x_k} is the sequence of trial points produced by SA, and K′(u, n) is the number of redundant points from {y_u} from the (n + 1)-th to the u-th trial. This definition requires fulfillment of the inclusion {x_k} ⊂ {y_u}. The case {x_k} = {y_u} is the best situation, with K(u, n) = 0: no redundant points have been produced. This implies a speedup equal to the number of processors used for parallelization.
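As an illustration, K(u, n) can be computed directly from the two trial sequences; the following C sketch (with illustrative names, and comparison of points up to a tolerance) counts the trials of PA after the n-th one that never occur among the trials of SA.

#include <math.h>

/* K(u, n) = K'(u, n)/(u - n): K'(u, n) counts the trial points
   y_{n+1}, ..., y_u produced by PA (stored in y[n..u-1], with y[0] the
   first trial) that never occur among the k trial points of SA stored
   in x[0..k-1]; points are compared up to the tolerance eps. */
double redundancy(const double *y, int u, int n,
                  const double *x, int k, double eps)
{
    int redundant = 0;
    for (int j = n; j < u; j++) {
        int found = 0;
        for (int i = 0; i < k && !found; i++)
            found = (fabs(y[j] - x[i]) <= eps);
        redundant += !found;
    }
    return (double)redundant / (double)(u - n);
}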
On the basis of additional information about the structure of the objective function it is possible to obtain theoretical estimates of K(u, n) and to establish cases where K(u, n) is bounded, ensuring high levels of speedup. Extensive numerical experiments executed on parallel computers illustrate the performance of the parallel characteristical methods and confirm the theoretical results.
HIGH PERFORMANCE COMPUTING: TOOLS AND APPLICATIONS
G. Spezzano and D. Talia
ISI-CNR, Via P. Bucci cubo 41C, 87036 Rende, Italy
Abstract
Computers are ever faster and more powerful, yet they are used to solve ever larger and more challenging problems, so the available computing power is never sufficient. For this reason, high performance computing architectures have been designed over the past twenty years, from vector supercomputers to massively parallel computers and cluster computing systems. Both in scientific and commercial computing, high performance computers are required to manage complex tasks and to store and analyze the flood of data and information available today. This paper discusses research activities in the area of high performance computing at ISI-CNR. In particular, tools and applications in the areas of parallel cellular automata, parallel data mining, evolutionary algorithms, and grid computing are introduced and discussed.
1. Introduction
Computers are ever faster and more powerful, but their computing power is never considered sufficient, because whenever a new machine faster than the previous ones is developed, it is used to solve larger and more challenging problems. For this reason, high performance computing architectures have been designed over the past twenty years, starting from vector supercomputers and moving to massively parallel computers and cluster computing systems. Both in scientific and commercial computing, high performance computers are effectively used to manage complex tasks and to store and analyze the flood of data and information available today.
The demand for computing, both in terms of volume of information and variety of processing, is ever increasing. Many systems that were previously infeasible are naturally expected today. Basic physical limitations of electronic circuits have started to force high performance computations to be targeted toward the exploitation of parallelism. Just as the fastest cycle times approach the fundamental barriers imposed by physical device limitations, different kinds of parallel and distributed systems are emerging. It is only a matter of time before this trend affects general-purpose computing. A step in that direction is represented by cluster computers built from off-the-shelf components based on PC technology.
Parallel computing systems increasingly demonstrate their effectiveness as tools that deliver high performance in solving complex problems. However, parallel application developers are often concerned with low-level issues, such as synchronization, process mapping, and architectural details, when they use parallel computers. These issues mainly arise from the different architectural features that characterize parallel computers, a class of systems that includes many different architectures, from single-instruction-multiple-data (SIMD) machines to distributed-memory, multiple-instruction-multiple-data (MIMD) computers and workstation clusters. This architectural heterogeneity complicates the task of designers of parallel applications and leads to the development of parallel software that lacks portability. Parallelism, either at the architecture or the algorithm level, is a significant departure from traditional sequential computation. The sequential approach attempts to reduce processing time by carrying out each operation faster and by eliminating redundancies in computation. In parallel computing, we save time by identifying concurrently executable subtasks in a given task and executing them on multiple processors simultaneously. The first step in the development of a parallel program is the identification of the sections that can be executed in parallel and of how they should be parallelized. But, unlike the von Neumann model for sequential computing, there is no general-purpose model for parallel computation, so different approaches must be followed depending on the parallel model and the target architecture.
Languages, tools, and environments for parallel computing are critical for the development of high-performance applications on parallel machines. They allow users to exploit the computational power of scalable computer systems in solving very demanding applications that have large computation and memory requirements. Among the several critical research areas in high-performance computing, we discuss here some that are today very interesting and promising and that in the near future will play a major role in scientific and commercial applications of parallel computers. These areas concern high-performance simulation of complex systems based on cellular automata, knowledge discovery in large data sets using parallel data mining algorithms, parallel genetic programming, and grid computing. These are some of the research topics investigated at ISI-CNR in the area of high performance computing.
2. Cellular Automata and Computational Simulation
Cellular Automata (CA) are discrete dynamical systems and, at the same time, abstract models of parallel computing. For this reason CA can be efficiently implemented on parallel computers. When CA are executed on a parallel computer, they allow a user to exploit their inherent parallelism to support the efficient simulation of complex systems that can be modeled by a very large number of simple elements (cells) with local interactions only. In the last decade, several years after their definition, with the rapid advances in the development of high performance architectures, CA have been increasingly utilized for general computer simulation and modeling. The CA model can be effectively used both as a realistic approach to defining abstract parallel machines and as a programming methodology for computational science on parallel computers.
From the marriage of cellular automata and MIMD parallel computing was born CAMELot (Cellular Automata environment for systems modeling open technology), a software environment based on the cellular automata model that was originally implemented on a transputer-based multicomputer and is now available on a large set of parallel architectures (from massively parallel machines to PC clusters). CAMELot is a tool designed to support the development of high-performance applications in science and engineering. It offers the computing power of a parallel computer while hiding the architectural issues from the user. It provides an interface that allows a user to dynamically change the parameters of the application while it runs, and a graphical interface that shows the application output. CAMELot has been successfully used for solving many problems, such as the simulation of lava flows and other fluid flow models, image processing, freeway traffic simulation, and soil bioremediation. The results achieved in terms of thoroughness of simulation and execution speedup show that CAMELot can be used for the simulation and modeling of real systems in many areas in a simple way while achieving high performance.
The system provides a high-level programming language, called CARPET, for programming cellular algorithms. The main features of CARPET are the possibility to describe the state of a cell as a set of typed substates and the definition of complex neighborhoods, which can also be time-dependent, in a discrete Cartesian space. The portable version of CAMELot has been developed in the COLOMBO (Parallel COmputers improve cLean up of sOils by Modelling BiOremediation) project within the ESPRIT framework [26].
The main goal of CAMELot is to integrate computation, visualization, and control into a single environment that allows the interactive steering of scientific applications. CAMELot consists of:
• a parallel run-time support for the CARPET language;
• a graphical user interface (GUI) for editing, compiling, configuring, executing, and steering the computation;
• a tool for the interactive visualization of the results.
The run-time support is implemented as an SPMD (Single Program, Multiple Data) program. The latest implementation is based on the C language plus the standard MPI library and can be executed on different parallel machines such as the Meiko CS-2, the CRAY T3E, and a cluster of workstations. The parallel program that implements the architecture of the system is composed of a set of macrocell processes, a controller process, and a GUI process. Each macrocell process, which operates on a strip of cells of the CA, runs on a single processing element of the parallel machine and executes the updating of the state of the cells belonging to its partition. The synchronization of the automaton and the execution of the commands, provided by a user through the GUI, are carried out by the controller process. MPI primitives handle all the communications among the processes using MPI communicators.
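The paper does not show the run-time code itself; as a hypothetical illustration of the strip decomposition just described, a macrocell process in C with MPI could exchange one-row halos with its neighboring strips before each update, along the following lines (the lattice size, the diffusion-like update rule, and all names are assumptions for the sketch).

#include <mpi.h>
#include <string.h>

#define W 256                     /* lattice width (illustrative) */

/* One CA step for a macrocell process owning `rows` interior rows of
   the lattice plus two halo rows (row 0 and row rows+1). `up` and
   `down` are the ranks of the neighboring strips (MPI_PROC_NULL at the
   borders when wrap-around is disabled). */
void step(double grid[][W], double next[][W], int rows,
          int up, int down, MPI_Comm comm)
{
    /* exchange halo rows with the neighboring macrocell processes */
    MPI_Sendrecv(grid[1], W, MPI_DOUBLE, up, 0,
                 grid[rows + 1], W, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(grid[rows], W, MPI_DOUBLE, down, 1,
                 grid[0], W, MPI_DOUBLE, up, 1,
                 comm, MPI_STATUS_IGNORE);

    /* update the local strip; a diffusion-like rule stands in for the
       transition function generated from a CARPET program */
    for (int i = 1; i <= rows; i++)
        for (int j = 0; j < W; j++) {
            int l = (j + W - 1) % W, r = (j + 1) % W;  /* wrap-around */
            next[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][l] + grid[i][r]);
        }
    for (int i = 1; i <= rows; i++)
        memcpy(grid[i], next[i], W * sizeof(double));
}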
To improve the performance of applications that have a diffusive behavior, such as CFD, CAMELot implements a load balancing strategy for mapping lattice partitions onto the processing elements [27]. This load balancing strategy is a domain decomposition technique similar to the scattered decomposition technique. Through the CAMELot GUI, a user can assign colors to the values of cell substates and visualize them on the fly, in different windows, so that she/he can follow the evolution of the simulation and steer the computation. CAMELot provides the development environment and the runtime system for the parallel execution of CARPET programs.
CARPET (CellulAR Programming EnvironmenT) is a high-level language related to C, with some additional constructs to describe the rules of the transition function of a single cell of a cellular automaton [28]. CARPET supports researchers in the development of models of discrete dynamic systems through a set of constructs that allow a model expressed in the CA formalism to be translated into a cellular algorithm. A CARPET programmer can define both standard CA and generalized CA. A CARPET program describes the rules of the state transition function of a single cell of a cellular automaton. CARPET defines the structure of a CA as a lattice in a discrete Cartesian space in which the maximum number of dimensions allowed is 3. Each dimension is wrapped around; e.g., a two-dimensional lattice forms a torus. Border functions can be used to disable the wrap-around. The size of the lattice is defined through the graphical user interface (GUI) of the CAMELot system and is only bounded by the amount of available memory. The radius of the neighborhood of a cell is statically defined and is strictly connected to the dimension of an automaton: it defines the maximum number of cells which can compose the neighborhood of a cell.
To improve the readability of a program and to extend the range of applications that can be coded as cellular algorithms, the state of a cell in CARPET is structured as a record in which the C basic types char, short, int, float, double, and one-dimensional arrays of these types can be used to define the substates that represent the physical quantities of a system. Furthermore, the substates can also be used to store values that specify statistical properties of a variable or to hold the history of a substate. The predefined variable cell refers to the current cell in the d-dimensional space under consideration. A substate is referenced by appending the substate's name to the reserved word cell with the underscore symbol '_' (e.g., cell_substate). CARPET generalizes the concept of neighborhood and allows a user to define a logic neighborhood that describes both regular neighborhoods (i.e., von Neumann, Moore) and asymmetrical neighborhoods with different topological properties (e.g., hexagonal) that can be time-dependent. A neighborhood is defined by associating with it the name of a vector whose elements compose the logic neighborhood and by defining a name for each neighboring cell. Both the name of the vector and the names of the neighbor cells can be used to refer to the value of a cell substate (e.g., identifier[i]_substate, North_substate). Finally, global parameters can be defined in CARPET to model global characteristics (e.g., the porosity of a soil, the Reynolds number) of a complex system. As an example, figure 1 shows the declaration part of a CARPET program that defines a two-dimensional automaton. The radius is equal to 1, and the state is composed of two integer substates and one float substate. The Moore neighborhood and two global parameters are defined. These declarations are contained in a cadef section at the beginning of a CARPET program.
cadef
{
  dimension 2;
  radius 1;
  state (int water, humidity; float pollution);
  neighbor Moore[8] ([0,-1] North, [1,-1] NorthEast, [1,0] East,
                     [1,1] SouthEast, [0,1] South, [-1,1] SouthWest,
                     [-1,0] West, [-1,-1] NorthWest);
  parameter (porosity 0.003, permeability 0.4);
}
Fig. 1. The CA declaration part
The new value of a substate takes effect from the next iteration. To allow a user to define spatially inhomogeneous CA, CARPET provides the GetX, GetY, and GetZ operations, which return the X, Y, and Z coordinates of a cell in the automaton. Furthermore, by means of the predefined variable step, which contains the number of iterations that have been executed, the language supports the implementation of neighborhoods and transition functions that are time-dependent. The example in figure 2 shows how the CARPET constructs are used to implement the classical life program in a simple and very expressive way. In particular, the update statement assigns a new value to the cell state.
CAMELot has been used for the implementation of a large number of applications in different areas. In the scientific and engineering area, simulations of lava flows [29], freeway traffic [30], landslides [31], soil bioremediation [32], gas diffusion, and the Ising model have been developed. Applications of artificial life developed with CAMELot are ant colonies, flocking, and the Greenberg-Hastings model [33]. Finally, CAMELot has been used to solve artificial intelligence problems by implementing cellular genetic algorithms. An outline of a few of these applications is given in [34].
#define alive 1
#define dead 0
cadef {
  dimension 2;          /* bidimensional lattice */
  radius 1;
  state (short life);
  neighbor Moore[8] ([0,-1] North, [-1,-1] NorthWest, [-1,0] West,
                     [-1,1] SouthWest, [0,1] South, [1,1] SouthEast,
                     [1,0] East, [1,-1] NorthEast);
}
int i; short sum;
{
  sum = 0;              /* sum must start from zero */
  /* computing the number of alive neighbors of cell (x,y) */
  for (i = 0; i < 8; i++)
    sum = Moore_life[i] + sum;
  /* computing the new state of cell (x,y) */
  if ((sum == 2 && cell_life == 1) || sum == 3)
    update (cell_life, alive);
  else
    update (cell_life, dead);
}
Fig. 2. The game of life algorithm in CARPET.
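For readers unfamiliar with CARPET, a plain C sketch of the same transition rule (without the CAMELot run-time, for an interior cell of an ordinary array; the lattice size and names are illustrative) is:

#define LAT 64            /* illustrative lattice size */
#define ALIVE 1
#define DEAD  0

/* The rule of figure 2 in plain C: count the live Moore neighbors of an
   interior cell (x, y) and apply the 2/3 survival-and-birth rule. */
short life_rule(short grid[LAT][LAT], int x, int y)
{
    int sum = 0;
    for (int dx = -1; dx <= 1; dx++)
        for (int dy = -1; dy <= 1; dy++)
            if (dx != 0 || dy != 0)
                sum += grid[x + dx][y + dy];
    return ((sum == 2 && grid[x][y] == ALIVE) || sum == 3) ? ALIVE : DEAD;
}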
3. Parallel Genetic Programming
Genetic programming (GP) is a powerful tool for solving problems from different application domains such as hard function and combinatorial optimization, the evolution of neural networks, routing, planning and scheduling, management and economics, machine learning, data mining, robotics, and pattern recognition.
GP finds its origin and inspiration in genetic algorithms (GAs) [16], a class of adaptive general-purpose search techniques inspired by natural evolution. A standard GA works on a population of elements (chromosomes), generally encoded as bit strings, representing a randomly chosen sample of candidate solutions to a given problem. A GA is an iterative procedure that maintains a constant-size population of individuals. Each member of the population is evaluated with respect to a fitness function. At each
step a new population is generated by selecting the members that have the best fitness. In order to explore the whole search space, the algorithm alters the selected elements by means of genetic operators to form new elements to be evaluated. The most common genetic operators are crossover and mutation. Crossover combines two strings to produce new strings with bits from both, thereby producing new search points. Through crossover, the search is biased towards promising regions of the search space. Mutation flips string bits at random if a probability test is passed; this ensures that, theoretically, every portion of the search space is explored.
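These ingredients can be summarized in a short sketch; the C code below (with illustrative names and parameters, not taken from the paper) shows roulette-wheel selection, one-point crossover, and bit-flip mutation for a constant-size population of bit strings.

#include <stdlib.h>

#define POP 50            /* population size (constant across generations) */
#define LEN 32            /* chromosome length in bits */
#define PMUT 0.01         /* per-bit mutation probability */

typedef struct { unsigned char bits[LEN]; double fitness; } Individual;

static double frand(void) { return rand() / (RAND_MAX + 1.0); }

/* Roulette-wheel selection: pick an individual with probability
   proportional to its (assumed non-negative) fitness. */
static int select_parent(const Individual *pop)
{
    double total = 0.0;
    for (int i = 0; i < POP; i++) total += pop[i].fitness;
    double r = frand() * total, acc = 0.0;
    for (int i = 0; i < POP; i++) {
        acc += pop[i].fitness;
        if (acc >= r) return i;
    }
    return POP - 1;
}

/* One-point crossover of parents a and b into child c. */
static void crossover(const Individual *a, const Individual *b, Individual *c)
{
    int cut = rand() % LEN;
    for (int i = 0; i < LEN; i++)
        c->bits[i] = (i < cut) ? a->bits[i] : b->bits[i];
}

/* Bit-flip mutation: each bit flips independently with probability PMUT. */
static void mutate(Individual *c)
{
    for (int i = 0; i < LEN; i++)
        if (frand() < PMUT) c->bits[i] ^= 1;
}

/* One generation: evaluate the population, then build the next one. */
static void generation(Individual *pop, Individual *next,
                       double (*fitness)(const Individual *))
{
    for (int i = 0; i < POP; i++) pop[i].fitness = fitness(&pop[i]);
    for (int i = 0; i < POP; i++) {
        crossover(&pop[select_parent(pop)], &pop[select_parent(pop)], &next[i]);
        mutate(&next[i]);
    }
}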
GP is a variation of genetic algorithms in which the evolving individuals are themselves computer programs instead of fixed-length strings from a limited alphabet of symbols [19]. Programs are represented as trees with ordered branches in which the internal nodes are functions and the leaves are the so-called terminals of the problem. Thus, the user of GP has to make an intelligent choice of the sets of functions and terminals for the problem at hand. GP conducts its search in the space of computer programs.
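In code, a GP individual might be represented as an expression tree over the chosen function and terminal sets; the following minimal C sketch (an assumption for illustration, with a toy function set {+, -, *} and terminals {x, constants}) shows the representation and its evaluation.

/* An illustrative GP individual: an expression tree whose internal
   nodes come from the function set and whose leaves are terminals. */
typedef enum { ADD, SUB, MUL, VAR_X, CONST } NodeType;

typedef struct Node {
    NodeType type;
    double value;                 /* used when type == CONST */
    struct Node *left, *right;    /* NULL for terminals */
} Node;

/* Evaluate the program tree at a given input x. */
double eval(const Node *n, double x)
{
    switch (n->type) {
    case ADD:   return eval(n->left, x) + eval(n->right, x);
    case SUB:   return eval(n->left, x) - eval(n->right, x);
    case MUL:   return eval(n->left, x) * eval(n->right, x);
    case VAR_X: return x;
    default:    return n->value;  /* CONST */
    }
}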
The GP approach evolves a population of trees by using the genetic operators of reproduction, recombination, and mutation. Each tree represents a candidate solution to a given problem and is associated with a fitness value that reflects how good it is with respect to the other solutions in the population. The reproduction operator copies individual trees of the current population into the next generation with a probability proportional to their fitness (this strategy is also called the roulette wheel selection scheme). The recombination operator generates two new individuals by crossing two trees at randomly chosen nodes and exchanging the subtrees. The two individuals