R. G. Strongin. Nizhny Novgorod: Nizhny Novgorod University Press, 2002, 217 pp.



5. Grid Computing 
The term Grid refers to an infrastructure that enables the integrated,
collaborative use of high-end computers, networks, databases, and scientific
instruments owned and managed by multiple organizations. Grid applications
often involve large amounts of data and/or computing and often require
secure resource sharing across organizational boundaries, and are thus
not easily handled by today's Internet and Web infrastructures [11].
Grid computing has emerged as an important new field, distinguished
from conventional distributed computing by its focus on large-scale
resource sharing, innovative applications, and, in some cases,
high-performance orientation. The real and specific problem that underlies
the Grid concept is coordinated resource sharing and problem solving in
dynamic, multi-institutional virtual organizations. The sharing that we are
concerned with is not primarily file exchange but rather direct access to
computers, software, data, and other resources, as is required by a range of
collaborative problem-solving and resource-brokering strategies emerging
in industry, science, and engineering. This sharing is, necessarily, highly
controlled, with resource providers and consumers defining clearly and
carefully just what is shared, who is allowed to share, and the conditions
under which sharing occurs. Tools that implement grid services are emerging,
and some of these, such as Globus and Legion, are used by many research
teams in several countries [12].
Grid computing concepts were first explored in the 1995 I-WAY experiment,
in which high-speed networks were used to connect, for a short time,
high-end resources at 17 sites across North America. Out of this activity
grew a number of Grid research projects that developed the core
technologies for "production" Grids in various communities and scientific
disciplines. For example, the US National Science Foundation's National
Technology Grid and NASA's Information Power Grid are both creating Grid
infrastructures to serve university and NASA researchers, respectively.
Across Europe and the United States, the closely related European Data
Grid, Particle Physics Data Grid, and Grid Physics Network (GriPhyN)
projects plan to analyze data from frontier physics experiments. Outside
the specialized world of physics, the Network for Earthquake Engineering
Simulation Grid (NEESgrid) aims to connect US civil engineers with the
experimental facilities, data archives, and computer simulation systems used
to engineer better buildings.
In the grid computing area, at ISI-CNR we are working in two specific 
areas: grid programming and knowledge discovery on grids. In this section 
we describe the research achievements in the latter area, and in particular 
we discuss the Knowledge Grid architecture. 
The Knowledge Grid architecture, designed by Cannataro and Talia
[13], is built on top of a computational grid that provides dependable,
consistent, and pervasive access to high-end computational resources. The
proposed architecture uses the basic grid services (i.e., the Globus services)
and defines a set of additional layers that implement the services of a
distributed knowledge discovery process on worldwide connected computers,
where each node can be a sequential or a parallel machine. The Knowledge Grid
enables collaboration among scientists who must mine data stored in
different research centers, as well as among executive managers who must use
a knowledge management system that operates on several data warehouses
located in different company establishments.
The Knowledge Grid attempts to overcome the difficulties of wide-area,
multi-site operation by exploiting the underlying grid infrastructure, which
provides basic services such as communication, authentication, resource
management, and information. To this end, the Knowledge Grid architecture
is organized so that more specialized data mining tools are compatible with
lower-level grid mechanisms and also with the Data Grid services. This
approach benefits from "standard" grid services, which are increasingly
widely used, and offers an open Parallel and Distributed Knowledge Discovery
(PDKD) architecture that can be configured on top of grid middleware in a
simple way.
 
 
 
Fig. 5. Layers and components of the Knowledge Grid architecture 


The Knowledge Grid services (layers) are organized in two hierarchical
levels: the core K-grid layer and the high-level K-grid layer. The former
refers to services implemented directly on top of generic grid services; the
latter refers to services used to describe, develop, and execute PDKD
computations over the Knowledge Grid (see Fig. 5).
The core K-grid layer supports the definition, composition, and execution
of a PDKD computation over the grid. Its main goals are the management of
all metadata describing characteristics of data sources, third-party
data mining tools, data management, and data visualization tools and
algorithms. Moreover, this layer has to coordinate the PDKD computation
execution, attempting to match the application requirements and the available
grid resources. This layer comprises the following basic services:
•  Knowledge Directory Service (KDS), responsible for maintaining a
description of all the data and tools used in the Knowledge Grid.
•  Resource allocation and execution management (RAEM) services, used
to find a mapping between an execution plan and available resources,
with the goal of satisfying requirements (computing power, storage,
memory, database, network bandwidth and latency) and constraints.
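The matching step performed by the RAEM services can be illustrated with a small sketch. The text does not specify the mapping algorithm, so the code below assumes a simple greedy first-fit strategy; the `Resource` and `Requirements` classes, their field names, and the `map_plan` function are all hypothetical and are not part of the Knowledge Grid prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A grid node as advertised to the scheduler (hypothetical fields)."""
    name: str
    cpu_gflops: float
    storage_gb: float
    memory_gb: float
    bandwidth_mbps: float
    latency_ms: float


@dataclass(frozen=True)
class Requirements:
    """Minimum capabilities a task of the execution plan asks for."""
    cpu_gflops: float = 0.0
    storage_gb: float = 0.0
    memory_gb: float = 0.0
    bandwidth_mbps: float = 0.0
    max_latency_ms: float = float("inf")


def satisfies(res: Resource, req: Requirements) -> bool:
    """True if the resource meets every stated requirement."""
    return (
        res.cpu_gflops >= req.cpu_gflops
        and res.storage_gb >= req.storage_gb
        and res.memory_gb >= req.memory_gb
        and res.bandwidth_mbps >= req.bandwidth_mbps
        and res.latency_ms <= req.max_latency_ms
    )


def map_plan(tasks: dict[str, Requirements],
             resources: list[Resource]) -> dict[str, str]:
    """Greedy first-fit: give each task the first unused resource that fits.

    This is an assumed strategy for illustration only; it is not optimal
    and may fail where a smarter assignment would succeed.
    """
    free = list(resources)
    mapping: dict[str, str] = {}
    for task, req in tasks.items():
        for res in free:
            if satisfies(res, req):
                mapping[task] = res.name
                free.remove(res)  # one task per resource in this sketch
                break
        else:
            raise LookupError(f"no available resource satisfies task {task!r}")
    return mapping
```

A real RAEM service would also have to handle the constraints mentioned above and would query the grid information services for live resource data; the sketch only shows the requirement check and one possible assignment policy.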
The high-level K-grid layer comprises the services used to compose,
validate, and execute a PDKD computation. Moreover, the layer offers
services to store and analyze the knowledge discovered by PDKD
computations. The main services are:
•  Data Access (DA) services that are responsible for the search, selection
(Data search services), extraction, transformation, and delivery (Data
extraction services) of data to be mined.
•  Tools and algorithms access (TAAS) services that are responsible for
the search, selection, and downloading of data mining tools and algorithms.
•  Execution plan management (EPM) that handles execution plans as an
abstract description of a PDKD grid application. An execution plan is a
graph describing the interaction and data flows between data sources,
extraction tools, DM tools, and visualization tools, and the storage of
knowledge results in the Knowledge Base Repository.
•  Results presentation service (RPS) that specifies how to generate,
present, and visualize the PDKD results (rules, associations, models,
classification, etc.). Moreover, it offers the API to store these results
in different formats in the Knowledge Base Repository.
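The execution plan handled by the EPM service can be sketched as a directed graph. The example below is a minimal illustration under stated assumptions: the node names (`sales_db`, `extractor`, `dm_tool`, `visualizer`, `kbr`) and the arc labels are invented, and the ordering is computed with Python's standard `graphlib` module rather than with any Knowledge Grid component.

```python
from graphlib import TopologicalSorter

# Each arc is (source node, target node, operation label).
# All names below are hypothetical examples, not prototype identifiers.
ARCS = [
    ("sales_db", "extractor", "data movement"),
    ("extractor", "dm_tool", "data filtering"),
    ("dm_tool", "visualizer", "program execution"),
    ("dm_tool", "kbr", "store results"),
]


def execution_order(arcs):
    """Return the plan's nodes in an order that respects every data flow.

    graphlib raises CycleError if the plan is not a directed acyclic graph.
    """
    sorter = TopologicalSorter()
    for src, dst, _operation in arcs:
        sorter.add(dst, src)  # dst can only run after src has produced data
    return list(sorter.static_order())


if __name__ == "__main__":
    print(execution_order(ARCS))
```

Any order returned places a data source before the tools that consume it, which is the minimum a scheduler needs before assigning plan nodes to grid resources.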


 
The Knowledge Grid represents a first step in the process of studying
the unification of PDKD and computational grid technologies and defining
an integrating architecture for distributed data mining and knowledge
discovery based on grid services. We hope that the definition of such an
architecture will accelerate progress on very large-scale, geographically
distributed data mining by enabling the integration of currently disjoint
approaches and revealing technology gaps that require further research and
development. A first prototype of the system, built on top of Globus, is
currently available. In particular, Cannataro, Talia, and Trunfio have
implemented the Knowledge Directory Service and the Knowledge Metadata
Repository of the core K-grid layer, and the Data Access Service of the
high-level K-grid layer [14].
The metadata describing relevant objects for PDKD computations, such
as data sources and data mining software, are represented by XML documents
stored in a local repository (the Knowledge Metadata Repository, KMR), and
their availability is published by entries in the Directory Information Tree
maintained by an LDAP server, which is provided by the Grid Information
Service (GIS) of the Globus Toolkit. The main attributes of the LDAP entries
specify the location of the repositories containing the XML metadata, whereas
the XML documents maintain more specific information for the effective use of
resources. The basic tools of the DA service have been implemented; they allow
users to find, retrieve, and select metadata about PDKD objects on the grid on
the basis of different search parameters and selection filters. Moreover, we
are modeling the representation of execution plans as graphs, where nodes
represent computational elements (data sources, software programs, results,
etc.) and arcs represent basic operations (data movements, data filtering,
program execution, etc.). We plan to consider different network parameters,
such as topology, bandwidth, and latency, to optimize PDKD program execution.
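The two-level metadata scheme described above (directory entries that point to XML documents) can be sketched as follows. This is only an illustration: the LDAP directory is modeled as an in-memory list instead of a real server, and the attribute names (`cn`, `objectType`, `metadataLocation`), the `kmr://` locations, and the XML element names are all assumptions, since the text does not give the actual schema.

```python
import xml.etree.ElementTree as ET

# Stand-in for the KMR: location -> XML metadata document (hypothetical schema).
KMR = {
    "kmr://site-a/sales.xml":
        '<DataSource name="sales"><Format>relational</Format></DataSource>',
    "kmr://site-a/logs.xml":
        '<DataSource name="logs"><Format>flat-file</Format></DataSource>',
    "kmr://site-b/c45.xml":
        '<Tool name="c45"><Algorithm>C4.5</Algorithm></Tool>',
}

# Stand-in for the directory entries: each one only says what the object is
# and where its detailed XML metadata lives (hypothetical attribute names).
DIRECTORY = [
    {"cn": "sales", "objectType": "data-source",
     "metadataLocation": "kmr://site-a/sales.xml"},
    {"cn": "logs", "objectType": "data-source",
     "metadataLocation": "kmr://site-a/logs.xml"},
    {"cn": "c45", "objectType": "dm-tool",
     "metadataLocation": "kmr://site-b/c45.xml"},
]


def find_metadata(object_type, **filters):
    """Search the directory by type, then filter on the XML metadata.

    Each keyword filter names a child element and the text it must contain,
    e.g. Format="relational". Returns (name, parsed XML root) pairs.
    """
    results = []
    for entry in DIRECTORY:
        if entry["objectType"] != object_type:
            continue
        root = ET.fromstring(KMR[entry["metadataLocation"]])
        if all(root.findtext(tag) == value for tag, value in filters.items()):
            results.append((entry["cn"], root))
    return results
```

For example, `find_metadata("data-source", Format="relational")` selects only the `sales` entry. The coarse search runs against the directory and the fine-grained selection against the XML documents, mirroring the split between LDAP attributes and XML metadata.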

