Бұл мақалада орыс тілінің диахронды дамуы тұр-
ғысынан эпитеттің функциональды ерекшеліктеріне дәс-
түрлі емес көзқарасы ұсынылады. Петр I -дің хаттары
негізінде эпитеттің аксиологиялық шкаласының қалып-
тасуы көрсетілген, және XVII ғасырдың аяғы мен XVIII
ғасырдың бірінші жарты жылдығында құнды сипатқа ие
бола алатын тілдің жеке тұлғасына ғана қатысты емес,
сонымен қатар орыс тілінің ұлттық тіл ретінде қа-
лыптасуының жалпы эволюциялық үрдісіне де қатысты
болады.
* * *
The article offers an unconventional view of the functional features of the epithet from the standpoint of the diachronic development of the Russian language. Using the correspondence of Peter I as an example, it shows the formation of the axiological scale of the epithet, a set of value characteristics correlated not only with a particular language personality of the late 17th century and the first third of the 18th century, but also with the general evolutionary processes in the Russian language at the initial stage of its formation as a national language.
Overcoming Ambiguity of Verbs in English-Azerbaijani MT System: The Initial Approximation
Ali Agababa Aliyev
Azerbaijan University of Languages, Chair of General Linguistics, Baku, Azerbaijan
Abstract. This paper describes the full process of verb disambiguation in the English-Azerbaijani MT system. It illustrates the creation of a database of formal features and of the algorithms that apply these features during translation. The resulting working mechanism is regarded as an initial approximation to solving the problem of verb sense disambiguation in the English-Azerbaijani MT system.
1. Ambiguity and its effect on translation quality
In natural languages, polysemy is the case in which a linguistic unit has two or more senses that are related in their shade of meaning (e.g. the verb bring can be translated as fetch smth, give rise, deliver, profit, to raise a question, introduce, convince and so on, and in every case the word expresses an action that is fundamentally similar in its shade of meaning). Homonymy, by contrast, is the case in which linguistic units are identical in spelling but different in meaning (e.g. gül - flower (a kind of plant), gül - smile (the imperative form of the verb “gülmәk”)) [1], [2]. Both phenomena, polysemy and homonymy, create obstacles, particularly in formal parsing, and how well they are resolved seriously affects translation quality. Because a word form in the source language may have several equivalents (translations) in the target language, the correct one has to be selected. Word sense disambiguation and part-of-speech disambiguation are therefore both important for the comprehensibility of the text and for the correspondence of the translation to the original.
Although much has been done to develop applied linguistic technologies for Azerbaijani [3]-[8], algorithmic mechanisms for eliminating the ambiguity of lexical units remain one of the unsolved issues in machine translation system development for our language. While several studies address homonymy in Azerbaijani, its types and formal elimination algorithms [8]-[9], the algorithmic elimination of polysemy remains a less investigated field.
Let’s have a look at the following examples:
ISSN 1563-0223 Bulletin KazNU. Filology series.
№ 2(136). 2012
1. Consider the use of the English word fare (yol pulu, sәrnişin vә s.) in some contexts:
1) The conductor gathered the fares midway to the destination (Konduktor yol pullarını mәnzil başına qәdәr olan yolun yarısında yığdı);
2) The taxi driver got no money from the fare (Taksi sürücüsü sәrnişindәn pul almadı).
The word fare has the following translation
variants [10]:
fare [fεә]
1. n
1) yol haqqı, yol pulu
2) sәrnişin
3) yemәk, qida, xörәk
For a human reader, who identifies the context, selecting the correct sense poses no great difficulty (as in the first and second sentences). In formal translation, however, if a sense that does not fit the context is selected (e.g. “yemәk” or “sәrnişin” in place of “yol pulu” in the first sentence), the translation will stray far from the sense of the original.
Disambiguating an ambiguous word during translation determines whether the translation corresponds to the original in meaning, and is therefore one of the factors that determine translation quality. Because not all the processes going on in the human brain can be modeled, identifying the correct sense among all the occurrences of a word is not a trivial task in machine translation. In other words, “repeating” the human capacity to identify the context, that is, developing algorithms and a database of features that ensure these algorithms function properly, remains a topical and open problem for all natural languages without exception [11], [12].
It should be noted that in formal translation even statistical calculation does not always help.
Example:
2. According to the statistics, the English verb “run” is mostly used in the meaning “to move on foot at a rapid pace” (in Azerbaijani “qaçmaq”). If we translate the sentence “This young manager is successfully running the bank” (correct translation: “Bu cavan menecer bankı uğurla idarә edir”) according to the statistics, we get “bu cavan menecer bankı yaxşı qaçır”. This translation has nothing to do with the original sentence (it makes no sense in Azerbaijani).
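The failure of a purely statistical choice can be illustrated with a minimal sketch of a most-frequent-sense baseline. The sense counts below are invented for illustration, and the names SENSE_COUNTS and most_frequent_sense are hypothetical; they are not part of the MT system described here.

```python
# Hypothetical sense inventory; the counts are invented for illustration only.
RUN_MOVE = "qaçmaq"        # "to move on foot at a rapid pace"
RUN_MANAGE = "idarә etmәk"  # "to manage"
SENSE_COUNTS = {"run": {RUN_MOVE: 70, RUN_MANAGE: 30}}


def most_frequent_sense(verb: str) -> str:
    """Return the translation with the highest count, ignoring the context."""
    senses = SENSE_COUNTS[verb]
    return max(senses, key=senses.get)


sentence = "This young manager is successfully running the bank"
# The baseline never looks at `sentence`, so it picks the dominant sense
# even though the object "the bank" calls for the "manage" sense.
chosen = most_frequent_sense("run")
print(chosen)
```

Because the baseline has no access to the sentence, it returns the dominant sense for every occurrence of the verb, which is exactly the error shown in the example.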
2. Existing approaches to Word Sense Disambiguation
A number of approaches have been developed to formally disambiguate ambiguous words: Naive Bayesian, Decision List, Nearest Neighbor, Transformation-Based Learning, Winnow, Boosting, Naive Bayesian Ensemble, etc. [11]-[17].
Many of these approaches require parallel bilingual corpora (a text and its translation into another language). Since no such bilingual corpora are available for Azerbaijani, the problem cannot be solved with these approaches.
A second group of approaches uses characteristic features, namely the words in the immediate surroundings of the ambiguous word and/or other grammatical features, to disambiguate a word [4]. This method is used to solve the problem of word sense disambiguation in the English-Azerbaijani MT system.
In this case, words are analyzed at the sentence level, and formal information sources within the sentence are used. The local surroundings of a polysemous word do much to identify its correct sense. On the basis of these features, formal rules are created, codified and built into the system.
Example:
3. Consider these two sentences: I know you (Mәn sәni tanıyıram) and I know this word (Mәn bu sözü bilirәm).
The English verb to know is used in both sentences. It has to be translated as “tanımaq” (“to be acquainted or familiar with”) in the first case and as “bilmәk” (“to have a familiarity or grasp of, as through study or experience”) in the second. For this purpose, a rule must be entered in the system: if the polysemous verb is followed by an animate noun, a noun of place, a personal pronoun, etc. (this list can be extended), the verb is translated as “tanımaq”; otherwise it is translated as “bilmәk”.
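The rule for know can be sketched as follows. This is only an illustration under simplifying assumptions: the word lists are tiny hand-made samples, the function name translate_know is hypothetical, and the head of the object is approximated by the first word after the verb that is not a determiner.

```python
KNOW_ACQUAINTED = "tanımaq"  # "to be acquainted or familiar with"
KNOW_GRASP = "bilmәk"        # "to have a familiarity or grasp of"

# Illustrative word lists; a real system would rely on dictionary tags.
PERSONAL_PRONOUNS = {"me", "you", "him", "her", "us", "them"}
ANIMATE_NOUNS = {"teacher", "friend", "doctor"}
PLACE_NOUNS = {"city", "village", "street"}
DETERMINERS = {"a", "an", "the", "this", "that", "my", "your"}


def translate_know(tokens: list[str]) -> str:
    """Choose a translation of 'know' from the word that follows it."""
    idx = tokens.index("know")
    following = [t.lower() for t in tokens[idx + 1:]]
    # Approximate the object head: first following word that is not a determiner.
    head = next((t for t in following if t not in DETERMINERS), "")
    if head in PERSONAL_PRONOUNS | ANIMATE_NOUNS | PLACE_NOUNS:
        return KNOW_ACQUAINTED
    return KNOW_GRASP  # the "otherwise" branch of the rule


print(translate_know("I know you".split()))
print(translate_know("I know this word".split()))
```

The sketch ignores adjectives and longer noun phrases; handling them would require the formal parsing that the system performs at the sentence level.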
3. Formal elimination of ambiguity
Investigations in this field show that some of the most frequent errors in synthesizing Azerbaijani sentences are directly related to the polysemy of words [18]. Incorrect disambiguation of polysemous words makes the Azerbaijani translation differ from the original English sentence and consequently becomes one of the factors that decrease translation quality in MT systems.
First, the English verbs that are most frequently used in written sources and have more than two meanings were selected. English-based sources were used for this purpose (http://wordnet.princeton.edu).
Given that some verbs have multiple translation variants (the second column of Table I) and every variant needs formal features, a question arises: is it possible to replace the closest meanings of one and the same verb with a single meaning that can stand in for them with sufficient accuracy? The investigations conducted confirm how significant the answer to this question is from the point of view of formal analysis. Their results are shown in Table I (the number of translation variants of some verbs from English into Azerbaijani).
Table I.
The number of translation variants of some verbs

Verb     Number of translation variants     Reduced number of translation variants
say      6                                  4
get      19                                 14
make     10                                 7
go       21                                 17
see      8                                  7
know     6                                  4
take     26                                 20
think    9                                  5
As seen from the table, after the closest meanings have been omitted, the number of translation variants is reduced by approximately 26% (from 105 to 78 variants in total).
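The overall reduction can be checked by summing the two numeric columns of Table I. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# (verb, number of translation variants, reduced number), copied from Table I
TABLE_I = [
    ("say", 6, 4), ("get", 19, 14), ("make", 10, 7), ("go", 21, 17),
    ("see", 8, 7), ("know", 6, 4), ("take", 26, 20), ("think", 9, 5),
]

total_before = sum(before for _, before, _ in TABLE_I)
total_after = sum(after for _, _, after in TABLE_I)
# Share of translation variants removed, in percent
reduction = 100 * (total_before - total_after) / total_before

print(total_before, total_after, round(reduction, 1))  # 105 78 25.7
```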
Thus, work on the database of features for verb sense disambiguation has to be carried out in two directions:
1. reducing the closest translation variants of polysemous verbs;
2. selecting features that correctly identify the appropriate meaning.
Consider the following illustrative example, which shows how the closest meanings can be reduced.
Example:
4. Let’s have a look at the English verb “subside”. It has the following translation variants in Azerbaijani:
1) azaltmaq, әksilmәk, düşmәk
2) çökmәk, enmәk, yatmaq (torpaq vә s.)
3) sakitlәşmәk, sәngimәk (külәk, hәyәcan vә s.) (“Polyglot” electronic dictionary).
The presented meanings clearly have much in common. The translations in the second case, like those in the first, express a falling (düşәn), weakening (zәiflәyәn) or descending (enәn) shade of meaning. (Note that in place of “külәk sәngiyir” (the wind abates), “külәk zәiflәyir” (the wind weakens) is also formally possible, although it is somewhat removed from the spoken style.)
Thus, before formal features are sought for the senses of a polysemous verb, one variant that can convey the meaning of the text to users without misinterpretation has to be selected for each group of closest meanings. In other words, by limiting the number of shades of meaning, we make it easier to formalize the selection among them.
Sometimes, even though a verb has several translation variants, we enter it in the MT system dictionary as a single variant, since its translations are close enough to be interchangeable. This not only prevents unnecessary expansion of the electronic database but also considerably reduces the search for formal features for the other meanings. After this stage, the number of translation variants of polysemous verbs decreased noticeably (≈26%; Table I, third column).
The second direction, identifying the features for the formal selection of the meaning that fits the context (the “context completing meaning”) among the other variants, is carried out for each variant of every verb in the first group (to handle this problem, verbs are processed in groups). These features must ensure the unequivocal formal selection of the correct sense of a polysemous verb.
Since the sentence is the largest translation unit in the English-Azerbaijani MT system, these features are identified within the sentence. Developing the database of features required analyzing a large number of sentences taken from a corpus created in advance. This corpus, in turn, consists of several files gathered from different sources. The texts were taken from different internet sites (official chronicle, newspapers, everyday life, science, etc.) to ensure the representativeness of the investigation. Some of the features obtained as a result of this work are shown in the following table.
Table II.
The database of features for formal elimination of polysemy

Meaning              Code of the feature    Feature explanation                                                   Order
break - sındırmaq    vt                     If the verb is followed by a word in the function of object           1
break - sınmaq       vi                     If the verb is not followed by a word in the function of object       2
break - sındırmaq    -                      -                                                                     1
Apart from this, it is possible that none of the entered features corresponds to the context in which the word occurs. In that case the meanings without features (untagged data) are output in the order given in the database: if the context of a verb cannot be identified, the first sense is output and the remaining meanings of the verb are given in brackets in order from dominant to less important.
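A lookup in such a feature database, including the fallback for unidentified contexts, might be sketched as below. The Sense class, the context dictionary with its "has_object" key, and the exact bracket format of the fallback output are assumptions made for illustration; only the two break entries and their order come from Table II.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sense:
    translation: str
    order: int  # 1 = dominant sense
    feature: Optional[Callable[[dict], bool]] = None  # None = untagged sense


# Entries mirror Table II; the "has_object" context key is an assumption.
BREAK_SENSES = [
    Sense("sındırmaq", 1, lambda ctx: ctx.get("has_object") is True),   # vt
    Sense("sınmaq", 2, lambda ctx: ctx.get("has_object") is False),     # vi
]


def select_translation(senses: list, ctx: dict) -> str:
    """Return the sense whose feature matches, else fall back to the ordered list."""
    ordered = sorted(senses, key=lambda s: s.order)
    for sense in ordered:
        if sense.feature is not None and sense.feature(ctx):
            return sense.translation
    # No feature matched: dominant sense first, the others in brackets.
    first, rest = ordered[0], ordered[1:]
    if not rest:
        return first.translation
    return f"{first.translation} ({', '.join(s.translation for s in rest)})"


print(select_translation(BREAK_SENSES, {"has_object": True}))
print(select_translation(BREAK_SENSES, {}))  # context not identified
```

Keeping the order in the database entry itself means the fallback needs no extra data: the same list drives both the feature check and the bracketed output.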
4. Schematic illustration of verb sense disambiguation in the English-Azerbaijani MT system
With regard to the formal features mentioned above, let’s consider the following formal feature schematically.
If the subject of the sentence is a word of the writing, letter, text, etc. type:
As the feature shows, the correct meaning of an ambiguous verb is identified from the formal information carried by the subject. Let’s consider a sentence illustrating this case:
Example:
Today’s papers write full page information about the event (Bugünkü qәzetlәrdә hadisә haqqında tam sәhifәlәri ilә yazılır)
In this sentence, the arrow from the ambiguous verb points to the subject. According to the formal rule, the correct sense of the verb write is identified for this case with the help of the information carried by the subject. Now let’s consider the algorithmic sequence of the process:
Feature 1.
If the subject of the sentence is a word of the writing, letter, text, etc. type:
Algorithm 1
1. The ambiguous verb is identified in the sentence.
2. The subject is taken for formal parsing.
3. The subject is checked to see whether it is a word of the writing, letter, text, etc. type.
4. If it is, the corresponding formal feature entered in the database is used for translation.
5. Otherwise, the other formal features are checked.
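The five steps of Algorithm 1 can be sketched in code as follows. This is a hedged illustration, not the system's implementation: the noun list, the feature code "written_source_subject", the FEATURE_DB layout and the passive form “yazılmaq” (inferred from “yazılır” in the example translation) are all assumptions.

```python
# Hypothetical lexicon of subject nouns of the "writing, letter, text" type
WRITTEN_SOURCE_NOUNS = {"paper", "papers", "letter", "letters", "text", "texts"}

# Hypothetical feature database: (verb, feature code) -> translation
FEATURE_DB = {("write", "written_source_subject"): "yazılmaq"}


def disambiguate_by_subject(verb, subject_head, feature_db=FEATURE_DB,
                            other_features=()):
    """Apply Feature 1; `verb` is the ambiguous verb already found (step 1)."""
    # Steps 2-3: take the subject and check its type against the lexicon.
    if subject_head.lower() in WRITTEN_SOURCE_NOUNS:
        # Step 4: use the translation stored for this feature, if any.
        translation = feature_db.get((verb, "written_source_subject"))
        if translation is not None:
            return translation
    # Step 5: otherwise try the remaining formal features in turn.
    for check in other_features:
        result = check(verb, subject_head)
        if result is not None:
            return result
    return None  # no feature matched; the caller falls back to database order


print(disambiguate_by_subject("write", "papers"))
print(disambiguate_by_subject("write", "manager"))
```

Returning None when nothing matches leaves the choice of a default to the caller, which fits the ordered fallback described for untagged meanings.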
The main purpose of this algorithm is to determine the meaning of the verb that fits the context, using the formal information carried by the subject. Let’s consider a schematic description of the algorithm introduced above:
Scheme 1.
[Flowchart of Algorithm 1: Enter -> the ambiguous verb is identified in the sentence -> the subject is taken for formal parsing -> the subject is checked to see whether it is a word of the writing, letter, text, etc. type -> if it is, the corresponding formal feature entered in the database is used for translation; otherwise, the other formal features are checked.]
Conclusion:
Research in this field for Azerbaijani has been conducted only in the last few years, whereas research dedicated to the development of machine translation systems spans several decades.
The one thousand most frequently used English verbs were selected for word sense disambiguation. These verbs were found to have 4852 meanings, some of which overlap completely or to a large extent with another translation variant of the same verb. Such meanings were replaced with the single meaning that best covers the overlapping translation variants, which left 3899 verb meanings. As the second step, the database of features for correctly selecting the meaning was created, and the features were entered in the database and coded. As a result, 387 groups of verb meanings were created, and special algorithms were developed for all of these groups. These algorithms are regarded as the initial approximation to the solution of the problem in the English-Azerbaijani MT system. To obtain better results, work on statistical methods is under way with the aim of developing a hybrid method.
References:
1. Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Ambiguity
2. Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Word_sense_disambiguation
3. Abbasov A., Fatullayev A. 2007. The use of syntactic and semantic valences of the verb for formal delimitation of verb word phrases. In: Proc. of L&TC’07, Poznan, Poland, pp. 468-472.
4. Fatullayev R., Abbasov A., Fatullayev A. 2008. Dilmanc is the 1st MT system for Azerbaijani. In: Proc. of SLTC-08, Stockholm, Sweden, pp. 63-64.
5. Fatullayev R., Abbasov A., Fatullayev A. 2008. Peculiarities of the development of the dictionary for the MT System from Azerbaijani. In: Proc. of EAMT-08, Hamburg, Germany, pp. 35-40.
6. Fatullayev R., Abbasov A., Fatullayev A. 2008. Set of active suffix chains and its role in development of MT system for Azerbaijani. In: Proc. of IMCSIT-08, Wisla, Poland, pp. 363-368.
7. Mahmudov M. 2002. Metnlerin formal tehlili sistemi (Formal processing system of texts). Elm, Baku.
8. Fatullayev A., Shagavatov S. Formal elimination algorithms of some type homonyms in Azerbaijani sentence. Proceedings of the International Scientific Conference “Problems of Cybernetics and Informatics”, Baku, Azerbaijan, October 2006, pp. 108-111.
9. Hasanov H.A. Dictionary of homonyms of Modern Azerbaijani. Baku: Maarif, 1981, pp. 121.
10. Turksever (Musayev) O.İ. et al. English-Azerbaijani dictionary. Baku, Qismet, 2003, pp. 1673.
11. Ahmed F., Nürnberger A. Arabic/English Word Translation Disambiguation Approach based on Naive Bayesian Classifier. Proceedings of the International Multiconference on Computer Science and Information Technology, Wisla, Poland, 2008, pp. 331-338.
12. Yarowsky D. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994, pp. 88-95.
13. Gale W., Church K., Yarowsky D. A Method for Disambiguating Word Senses in a Large Corpus. Computers and the Humanities, vol. 26, 1992, pp. 415-439.
14. Escudero G., Màrquez L., Rigau G. Boosting applied to word sense disambiguation. Proceedings of the 12th European Conference on Machine Learning (ECML), Barcelona, Spain, 2000, pp. 129-141.
15. Pedersen T. A simple approach to building ensembles of Naive Bayesian classifiers for word sense disambiguation. In Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, May 2000, pp. 63-69.
16. Ahlswede T.E. (1995). “Word Sense Disambiguation by Human Informants.” Proceedings of the Sixth Midwest Artificial Intelligence and Cognitive Society Conference, Carbondale, Illinois, April 1995, pp. 73-78.
17. Gale W.A., Church K.W., Yarowsky D. (1993). “A method for disambiguating word senses in a large corpus.” Computers and the Humanities, 26, pp. 415-439.
18. Fatullayev R., Mammadova S., Fatullayev A. Statistical analysis of the factors influencing the translation quality of the Dilmanc MT system. In Proc. of the International Conference on Problems of Cybernetics and Informatics (PCI-2008), Baku, September 2008, pp. 96-99.
* * *
This paper presents the process of resolving the ambiguity of some English verbs in the English-Azerbaijani MT system. It describes the creation of a database of formal features for resolving verb ambiguity; on the basis of this database, algorithms for resolving ambiguity during automatic translation are developed. The database and the algorithms represent an initial approximation to a complete solution of verb ambiguity.