of the Thesis «Development of a multilingual thesaurus for the information
technologies including the morphology of the Kazakh language with the
purpose of the information system support of scientific and education activities»
submitted by
MADINA SAMBETBAYEVA ARALBAYEVNA
for the degree of Doctor of Philosophy (PhD) in the specialty 6D070300 – Information
Systems
Relevance of the research. Although most information resources have now been
converted to digital form, they remain inaccessible to much of the scientific
community, while the resources available on the Internet are fragmentary and
inconsistent. Academic work therefore often requires the available information
resources to be systematized and classified.
The development of information technologies in general, and of communication and
information-processing technologies in particular, has opened up entirely new ways of
organizing almost the entire academic process, which in turn has raised the quality of
the participants’ information needs. Modern students, who have a computer and use the
Internet daily, cannot be satisfied with a traditional educational process and the
usual formats of learning materials, such as textbooks, books or flat text files.
Teaching materials can now be presented in many digital formats, and they should be
supported by a range of search and classification services.
In academic work, handling references, assorted materials and documents (searching
for the necessary documents, then systematizing and classifying them according to the
task at hand) takes considerable time and effort. To satisfy modern users’ information
needs, a system must support narrowly targeted search and classification functions, as
well as browsing of resources by category (heading) and by classification dictionaries.
The most important task is the systematization of resources (thematic classification),
which requires a clear definition of the logical and semantic categories (facets) and
the key terms (concepts) covering the chosen, rather narrow, subject area of interest
to the user.
There are currently many powerful information systems that are to some extent
focused on supporting scientific research. Some are close to factographic systems,
e.g. the Integrated System of Information Resources of the Russian Academy of
Sciences, the Integrated Distributed Information System of the Russian Academy of
Sciences and euroCRIS; others are documentary, e.g. eLibrary, Informika and MathNET.
The systems named satisfy the information needs of the academic community to some
extent, though none of them is functionally complete.
The main disadvantage of most of these systems is the limited scope for analytical
work with resources, both within each system and across external systems
(international standards and recommendations are often not taken into account, and
interoperability is low). This is extremely inconvenient for academic work, where one
of the paramount tasks is to establish connections between particular scientific facts
(e.g. ‘what the term ‘cybernetics’ means’ or ‘who the author of a given article is’)
and the entities of the information system (persons, facts, documents, publications,
etc.).
The standard approach to systematizing information is to classify documents by
means of taxonomies. A taxonomy is a subject classification that groups terms into
controlled vocabularies (thesauruses) and arranges these vocabularies as hierarchical
structures. A subject area is usually described by a set of key terms, each of which
denotes or describes some concept of that area. Classification rests on identifying
the concepts (key terms), establishing paradigmatic relations (e.g. parent – child)
between them, and correlating the analyzed document with the identified concepts.
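The three steps named in this paragraph (identifying concepts, linking them by
parent – child relations, and correlating a document with them) can be illustrated by
a minimal sketch. The class name Concept, the function classify, the sample terms and
the naive substring matching are all assumptions made for illustration; they are not
taken from the thesis.

```python
class Concept:
    """A key term of the subject area, linked to its parent and children."""

    def __init__(self, term, parent=None):
        self.term = term
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the terms from the root of the hierarchy down to this concept."""
        node, parts = self, []
        while node is not None:
            parts.append(node.term)
            node = node.parent
        return parts[::-1]


def classify(text, concepts):
    """Return the concepts whose term occurs in the text (naive substring match)."""
    lowered = text.lower()
    return [c for c in concepts if c.term.lower() in lowered]


# Hypothetical fragment of a taxonomy for information technologies.
root = Concept("information technology")
db = Concept("database", parent=root)
rdb = Concept("relational database", parent=db)
ir = Concept("information retrieval", parent=root)

doc = "An introduction to relational database design and query languages."
matched = classify(doc, [root, db, rdb, ir])
paths = [" > ".join(c.path()) for c in matched]
```

Because each concept keeps a link to its parent, a document matched to a narrow
concept can also be reported under every broader heading on its path.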
The most troublesome aspect of providing academic information systems is that the
information classification and systematization technologies developed by libraries
over the last hundred years do not work in narrow subject areas, because the documents
being classified are thematically close. For example, the UDC [10] and MSC2000 [11]
classifications, or the UNESCO thesaurus [12], which are the most convenient for
classifying resources in mathematics and informatics, usually assign all the resources
selected for a given training course to the same category.
The development of specialized thesauruses is relevant in its own right, both for
developing and systematizing the conceptual apparatus of the subject area
(informatics, in this case) and for logical information search in full-text databases
and on the Internet, where a thesaurus serves as a means of forming the search need,
formulating search queries, and performing adequate automatic indexing,
systematization and classification of documents.
The main problem is the high labor intensity and cost of compiling a thesaurus
manually, together with the low flexibility of the construction process. In
thesauruses intended for manual indexing, combinations of closely related concepts are
reduced to a single, most representative concept in order to reduce the subjectivity
of indexing. In automated thesauruses, semantically close concepts are represented as
individual units, which makes it possible to use synonym rows in search. The
difficulty of building a thesaurus that covers the full thematic diversity of the
indexed information is one of the reasons thesauruses are unpopular in modern
information systems. If, however, one considers the efficiency of information systems
in particular areas of knowledge, the creation and use of specialized thesauruses in
such systems is undoubtedly of interest and moves the system into an entirely
different quality class.
[10] Universal Decimal Classification (UDC), supported by the International Federation
for Documentation (Fédération Internationale de Documentation, FID) and the UDC
Consortium (http://www.udcc.org/); the Russian version of the UDC is supported by the
All-Union Institute of Scientific and Technical Information of the Russian Academy of
Sciences.
[11] Mathematics Subject Classification (http://www.ams.org/msc/), supported by the
American Mathematical Society (AMS).
[12] UNESCO Thesaurus, http://databases.unesco.org/thesru/
A further peculiarity of building an information system for academic activities
support in countries such as Kazakhstan and Russia is the need to support search and
classification in several languages simultaneously: mainly two (Russian and English)
for Russia, and at least three (Russian, English and Kazakh) for Kazakhstan. Documents
should therefore be indexed either in three different spaces, one per language, with
equivalence relations established between their elements, or in a single integrated
space defined by a multilingual thesaurus.
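The second option, indexing in one integrated space, can be sketched as follows:
every concept gets a language-independent identifier, and the labels in each language
are equivalent entries for it. The identifiers C1 and C2, the sample labels and
documents, and the function names are hypothetical; matching is by exact substring and
ignores word forms.

```python
# Hypothetical multilingual thesaurus: concept ID -> label per language.
THESAURUS = {
    "C1": {"en": "database", "ru": "база данных", "kk": "деректер қоры"},
    "C2": {"en": "algorithm", "ru": "алгоритм", "kk": "алгоритм"},
}


def build_lookup(thesaurus):
    """Map each (language, term) pair to its language-independent concept ID."""
    return {
        (lang, term): concept_id
        for concept_id, labels in thesaurus.items()
        for lang, term in labels.items()
    }


def index_document(text, lang, lookup):
    """Return the IDs of the concepts whose label in `lang` occurs in the text."""
    lowered = text.lower()
    return {cid for (l, term), cid in lookup.items() if l == lang and term in lowered}


LOOKUP = build_lookup(THESAURUS)

# Illustrative documents in the three languages.
docs = {
    "d_en": ("en", "A database stores structured records."),
    "d_ru": ("ru", "База данных хранит структурированные записи."),
    "d_kk": ("kk", "Деректер қоры туралы дәріс."),
}
index = {doc_id: index_document(text, lang, LOOKUP)
         for doc_id, (lang, text) in docs.items()}


def search(term, lang):
    """Find documents in any language that share the concept behind `term`."""
    concept_id = LOOKUP.get((lang, term.lower()))
    return sorted(d for d, cids in index.items() if concept_id in cids)
```

Under these assumptions, search("database", "en") also finds the Russian and Kazakh
documents, because all three are indexed under the same concept identifier rather than
under language-specific strings.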
It should be noted that the elements of these attribute spaces may appear in a
document in various word forms; the main problem is therefore taking the morphology of
a particular language into account when indexing documents.
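Why word forms matter for indexing can be shown with a deliberately crude sketch: a
suffix stripper that reduces a few Kazakh plural and locative forms to a common stem
before terms are compared. The suffix list, the minimum stem length and the sample
sentence are assumptions chosen for illustration; this is not the morphological model
developed in the thesis, and a real analyzer must handle far more endings and their
ordering.

```python
import re

# A small, illustrative subset of Kazakh endings, longest first.
SUFFIXES = sorted(
    ["лар", "лер", "дар", "дер", "тар", "тер",  # plural
     "да", "де", "та", "те"],                    # locative case
    key=len,
    reverse=True,
)


def naive_stem(word, min_stem=3):
    """Strip known endings repeatedly while at least `min_stem` letters remain."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def match_terms(text, terms):
    """Return the thesaurus terms whose stem equals the stem of some word in the text."""
    stems = {naive_stem(token) for token in re.findall(r"\w+", text.lower())}
    return sorted(t for t in terms if naive_stem(t) in stems)


# Illustrative sentence: "Books are stored in information systems."
text = "Ақпараттық жүйелерде кітаптар сақталады."
found = match_terms(text, ["жүйе", "кітап", "алгоритм"])
```

Exact string matching would find neither term here, since both occur only in
inflected form; the stemmer recovers them, at the cost of possible over-stripping on
words that merely end in the same letters.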
The author is not aware of any attempts to build document classification systems
that work simultaneously in Russian, English and Kazakh.
The above gives grounds to state that creating an information system for academic
activities support, equipped with a narrow search mechanism and adaptive services
designed to satisfy researchers’ information needs and based on a multilingual
thesaurus of information technologies, is a highly relevant task that contributes
significantly to the development of this scientific area.
The tasks set are of national and international significance, because solving them
contributes, both theoretically and practically, to the development of specialized
lexicographic resources for the Turkic languages.
The efficiency of systems for academic activities support depends directly on the
use of specialized thesauruses, which is why the present research is relevant.