Казахский национальный




 
 
 
 
 
 
 
* * * 
This article proposes a non-traditional view of the functional features of the epithet from the standpoint of the diachronic development of the Russian language. Using the letters of Peter I, it shows the formation of the axiological scale of the epithet, which relates not only to an individual language personality of the end of the 17th century and the first half of the 18th century that can carry a value character, but also to the general evolutionary process of the formation of Russian as a national language.
* * * 
The article offers an unconventional view of the functional features of the epithet from the standpoint of the diachronic development of the Russian language. Using the correspondence of Peter I as an example, it shows the formation of the axiological scale of the epithet, which represents a set of value characteristics correlated not only with a particular language personality of the end of the 17th century and the first third of the 18th century, but also with the general evolutionary processes in the Russian language at the initial stage of its formation as the national language.
 
 
 
 
Overcoming ambiguity of verbs in English-Azerbaijani MT system: the initial approximation
Ali Agababa Aliyev
Azerbaijan University of Languages, Chair of General Linguistics, Baku, Azerbaijan

Abstract. This paper presents the whole process of disambiguating verbs in the English-Azerbaijani MT system. It illustrates the creation of a database of formal features and of the algorithms that describe how these features function in the translation process. The working mechanism is regarded as an initial approximation to verb sense disambiguation and as a basic result towards solving the problem in the English-Azerbaijani MT system.
 
1.  Ambiguity and its effect on the translation quality
In natural languages, polysemy is the case in which a linguistic unit has two or more senses that are related in shade of meaning (e.g. the verb bring can be translated as fetch smth, give rise, deliver, profit, raise a question, introduce, convince and so on, and in every case the word expresses an action that is related in shade of meaning at a deeper level). Homonymy, by contrast, is the case in which linguistic units are identical in spelling but different in meaning (e.g. gül – flower (a kind of plant) and gül – smile (the imperative form of the verb “gülmәk”)) [1], [2]. Both phenomena, polysemy and homonymy, create obstacles, particularly in formal parsing, and how well these problems are solved seriously affects translation quality. Because a word form in the source language may have several equivalents (translations) in the target language, the correct one has to be selected. Word sense disambiguation and part-of-speech disambiguation are both important for the comprehensibility of the text and for the correspondence of the translation to the original.
Although much has been done to develop applied linguistic technologies for Azerbaijani [3]-[8], the creation of algorithmic mechanisms for eliminating the ambiguity of lexical units remains an unsolved issue in machine translation system development for our language. Although several studies deal with homonymy in Azerbaijani, its types and formal elimination algorithms [8]-[9], the algorithmic elimination of polysemy remains a less investigated field.
Let’s have a look at the following examples: 
ISSN 1563-0223                        Bulletin KazNU. Filology series. 
№ 2(136). 2012 
1.  Consider the use of the English word fare (yol pulu, sәrnişin vә s.) in some contexts:
1)  The conductor gathered the fares midway to the destination (Konduktor yol pullarını mәnzil başına qәdәr olan yolun yarısında yığdı);
2)  The taxi driver got no money from the fare (Taksi sürücüsü sәrnişindәn pul almadı).
The word fare has the following translation variants [10]:
fare [fεә]
1. n
1) yol haqqı, yol pulu
2) sәrnişin
3) yemәk, qida, xörәk
A human reader identifies the context, so selecting the correct sense poses no great obstacle (as in the first and second sentences). But in formal translation, if a variant that does not fit the context is selected (e.g. “yemәk” or “sәrnişin” in place of “yol pulu” in the first sentence), the translation of the sentence will be far from the sense of the original.
Disambiguating an ambiguous word during translation determines how well the translation corresponds to the original in meaning, and is therefore one of the factors that determine translation quality. Because not all the processes going on in the human brain can be modeled, identifying the correct sense among all the occurrences of a word is not a straightforward task in machine translation. In other words, “repeating” the human capacity to identify the context, that is, developing algorithms and a database of features that ensure the normal functioning of these algorithms in order to automate this process, remains a topical and open problem for all natural languages without exception [11], [12].
It should be noted that in formal translation even statistical calculation does not always help.
Example:
2.  According to the statistics, the English verb “run” is mostly used in the meaning of “to move on foot at a rapid pace” (in Azerbaijani “qaçmaq”). If we translate the sentence “This young manager is successfully running the bank” (correct translation “Bu cavan menecer bankı uğurla idarә edir”) according to the statistics, we get “bu cavan menecer bankı yaxşı qaçır”. However, this translation has nothing to do with the original sentence (it makes no sense in Azerbaijani).
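The failure described in this example can be sketched as a most-frequent-sense baseline. This is only an illustration: the dictionary name SENSES_BY_FREQUENCY, the function name and the frequency ordering are assumptions made for the sketch, and the infinitive “idarә etmәk” is inferred from the form “idarә edir” used above.

```python
# Hypothetical sense inventory: translations ordered by assumed corpus
# frequency (most frequent first). Only the "run" entry from the example
# above is modelled.
SENSES_BY_FREQUENCY = {
    "run": ["qaçmaq", "idarә etmәk"],
}


def most_frequent_sense(verb: str) -> str:
    """Return the statistically dominant translation, ignoring the context."""
    return SENSES_BY_FREQUENCY[verb][0]


sentence = "This young manager is successfully running the bank"
chosen = most_frequent_sense("run")
# The baseline never looks at `sentence`, so it cannot prefer the
# second sense even though the object "the bank" calls for it.
print(chosen)
```

Because the baseline ignores the sentence entirely, it returns the same translation for every occurrence of the verb, which is exactly why context features are needed.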
2. Existing approaches to Word Sense Disambiguation
A number of approaches have been developed to disambiguate ambiguous words formally: Naïve Bayesian, Decision List, Nearest Neighbor, Transformation Based Learning, Winnow, Boosting, Naive Bayesian Ensemble etc. [11]-[17].
Many of these approaches require parallel bilingual corpora (a text and its translation into another language), but since no such bilingual corpora are available for Azerbaijani, the problem cannot be solved using these approaches.
A second group of approaches uses characteristic features, that is, the words in the immediate surroundings of the ambiguous word and/or other grammatical features, to disambiguate a word [4]. This method is used to solve the problem of word sense disambiguation in the English-Azerbaijani MT system.
In this case, the words are analyzed at the level of the sentence, and formal information sources within the sentence are used. The local surroundings of a polysemous word do much to identify its correct sense. On the basis of these features, formal rules are created, codified and built into the system.
Example:
3.  Consider these two sentences: I know you (Mәn sәni tanıyıram) and I know this word (Mәn bu sözü bilirәm).
The English verb to know is used in both of these sentences. It has to be translated as “tanımaq” – “to be acquainted or familiar with” in the first case, and as “bilmәk” – “to have a familiarity or grasp of, as through study or experience” in the second. For this purpose, the following rule must be entered in the system: if the polysemous verb is followed by an animate noun, a noun of place, a personal pronoun etc. (this list can be enlarged), then the verb has to be translated as “tanımaq” – “to be acquainted or familiar with”, otherwise as “bilmәk” – “to have a familiarity or grasp of, as through study or experience”.
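A minimal sketch of such a rule for the verb to know is given below. It assumes that an earlier parsing step has already tagged the word following the verb; the Token class, the tag names and the function name are hypothetical and are not taken from the MT system itself.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    kind: str  # hypothetical tag assigned by an earlier parsing step


# Kinds of following words that trigger the "tanımaq" sense.
# The rule above says this list can be enlarged.
ACQUAINTANCE_KINDS = {"animate_noun", "place_noun", "personal_pronoun"}


def translate_know(following: Token) -> str:
    """Select the translation of "know" from the word that follows it."""
    if following.kind in ACQUAINTANCE_KINDS:
        return "tanımaq"
    return "bilmәk"


# "I know you" versus "I know this word"
first = translate_know(Token("you", "personal_pronoun"))
second = translate_know(Token("word", "inanimate_noun"))
```

Keeping the triggering kinds in a separate set means the rule can be extended without touching the selection logic.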
3. Formal elimination of ambiguity
Investigations carried out in this field show that some of the most frequent errors in synthesizing sentences in Azerbaijani are directly related to the polysemy of words [18]. Incorrect disambiguation of polysemous words makes the Azerbaijani translation differ from the original English sentence and consequently becomes one of the factors that decrease translation quality in MT systems.
First, the English verbs that are most frequently used in written sources and have more than two meanings were selected. For this purpose, English-based sources were used (http://wordnet.princeton.edu).
Given that some verbs have multiple translation variants (the second column of Table I) and every variant needs to be provided with formal features, a question arises: is it possible to replace the closest meanings of one and the same verb with a single meaning that can stand in for them with sufficient exactness? The investigations conducted confirm how significant the answer to that question is from the point of view of formal analysis. The results of these investigations are shown in Table I (the number of translation variants of some verbs from English into Azerbaijani).
Table I.
The number of translation variants of some verbs

Verb     Number of translation variants     Reduced number of translation variants
say      6
get      19                                 14
make     10
go       21                                 17
see      8
know     6
take     26                                 20
think    9
 
As seen from the table, after the closest meanings have been merged, the number of meanings is reduced by approximately 29%.
 
Thus, the work on creating the database of features for verb sense disambiguation has to be carried out in the following two directions:
1.  the reduction of the closest variants of polysemous verbs;
2.  the selection of features to correctly identify the appropriate meaning.
Consider the following illustrative example to see how the closest meanings can be reduced.
Example:
4.  Let’s have a look at the translation variants of the English verb “subside”. It has the following translation variants in Azerbaijani:
1) azaltmaq, әksilmәk, düşmәk
2) çökmәk, enmәk, yatmaq (torpaq vә s.)
3) sakitlәşmәk, sәngimәk (külәk, hәyәcan vә s.) (“Polyglot” electronic dictionary).
It is clearly seen that the presented meanings have much in common. The translations in the second group, like those in the first, express a falling (düşәn), weakening (zәiflәyәn) and descending (enәn) shade of meaning. (Notice that in place of “külәk sәngiyir” (the wind abates), “külәk zәiflәyir” (the wind weakens) is also formally possible, although it is a bit far from the colloquial style.)
Thus, before formal features are sought for the senses of polysemous verbs, a single variant that can deliver the meaning of the text to users without misinterpretation has to be selected for each set of closest meanings. In other words, by limiting the shades of meaning, we make it easier to formalize the selection among them.
Sometimes, even though a verb has several translation variants, we enter it in the MT system dictionary as a single variant, since its translations are close enough to be substituted for one another. This not only prevents the unnecessary expansion of the electronic database but also considerably reduces the search for formal features for the other meanings. After this stage, the number of variants of polysemous verbs decreased noticeably (≈26%; see the reduced number of translation variants in Table I).
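The reduction step can be sketched as follows. The function names are hypothetical, and the choice of representative (here simply the first variant listed in each dictionary group for “subside”) is a placeholder for the sketch; in the work described it is a linguistic decision made by the dictionary compilers.

```python
def merge_variants(variants, groups):
    """Replace every member of a group by that group's representative.

    variants: translation variants in dictionary order.
    groups:   mapping {representative: [members it stands in for]}.
    """
    replaced = {}
    for representative, members in groups.items():
        for member in members:
            replaced[member] = representative
    merged = []
    for variant in variants:
        target = replaced.get(variant, variant)
        if target not in merged:  # keep first occurrence, preserve order
            merged.append(target)
    return merged


def reduction_percent(before, after):
    """Share of variants removed by merging, in percent."""
    return 100.0 * (before - after) / before


subside = ["azaltmaq", "әksilmәk", "düşmәk", "çökmәk", "enmәk",
           "yatmaq", "sakitlәşmәk", "sәngimәk"]
# Placeholder grouping: one representative per dictionary group.
groups = {
    "azaltmaq": ["azaltmaq", "әksilmәk", "düşmәk"],
    "çökmәk": ["çökmәk", "enmәk", "yatmaq"],
    "sakitlәşmәk": ["sakitlәşmәk", "sәngimәk"],
}
merged = merge_variants(subside, groups)
```

For a row of Table I with both counts available, such as get (19 variants reduced to 14), reduction_percent(19, 14) gives the reduction for that single verb.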
The second direction, the identification of features for the formal selection of the meaning that fits the context among the other variants, is pursued for each variant of every verb of the first group (to solve this problem, we handle the verbs in groups). These features must ensure the unequivocal formal selection of the correct sense of a polysemous verb.
Since the sentence is the largest translation unit in the English-Azerbaijani MT system, these features are identified within the frame of the sentence. Developing the database of features required analyzing a large number of sentences taken from a corpus created in advance. This corpus, in turn, is made up of several files gathered from different sources. The texts were taken from different internet sites (official chronicle, papers, everyday life, science etc.) to ensure the representativeness of the investigation. Some of the features obtained as the result of this work are shown in Table II.
 
Table II.
The database of features for formal elimination of polysemy

Meaning      Code of the feature    Feature explanation                                                Order
sındırmaq    Break vt               If the verb is followed by a word in the function of object
sınmaq       Break vi               If the verb is not followed by a word in the function of object
sındırmaq    Break
 
Apart from this, it is possible that none of the features entered corresponds to the context in which the word occurs. In that case the meanings with no features (untagged data) are presented in accordance with their order in the database: if we cannot identify the context of a verb, then, apart from the first sense, the remaining meanings of that verb are given in brackets, ordered from the dominant to the less important.
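A possible shape for such a feature database and its fallback is sketched below for the verb break from Table II. The data layout, the function name and the output format of the fallback (first sense followed by the others in brackets) are assumptions made for the sketch; the order of the entries stands in for the Order column.

```python
# Hypothetical layout: for each verb, (feature code, translation) pairs in
# database order. A code of None marks a meaning that has no feature.
FEATURE_DB = {
    "break": [
        ("vt", "sındırmaq"),  # verb followed by an object
        ("vi", "sınmaq"),     # verb not followed by an object
        (None, "sındırmaq"),  # meaning without a feature
    ],
}


def select_meaning(verb, detected_feature):
    """Pick the meaning whose feature matches, else list all meanings.

    detected_feature is the code found in the sentence, or None when
    the context could not be identified.
    """
    entries = FEATURE_DB[verb]
    for code, translation in entries:
        if code is not None and code == detected_feature:
            return translation
    # Fallback: first sense, then the remaining ones in brackets,
    # in database order and without repetitions.
    ordered = []
    for _, translation in entries:
        if translation not in ordered:
            ordered.append(translation)
    first, rest = ordered[0], ordered[1:]
    return f"{first} ({', '.join(rest)})" if rest else first
```

With this layout, adding a new feature for a verb is a matter of adding one more pair to its list; the selection code stays unchanged.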
4. Schematic illustration of verb sense disambiguation in the English-Azerbaijani MT system
With regard to the formal features mentioned above, let’s consider the following formal feature schematically.
  If the subject of the sentence is expressed by a word of the writing, letter, text etc. type:
 
As seen from the feature, the correct meaning of an ambiguous verb is identified by the formal information that the subject carries. Let’s consider a sentence illustrating this case:
Example:
  Today’s papers write full page information about the event (Bugünkü qәzetlәrdә hadisә haqqında tam sәhifәlәri ilә yazılır)
In this sentence, the arrow from the ambiguous verb points to the subject. According to the formal rule, the correct variant of the verb write is identified for this case with the help of the information that the subject carries. Now, let’s consider the algorithmic sequence of the process:
Feature 1.
  If the subject of the sentence is expressed by a word of the writing, letter, text etc. type:
Algorithm 1
1.  The ambiguous verb is identified in the sentence.
2.  The subject is taken for formal parsing.
3.  The subject is checked to see whether it is one of the words of the writing, letter, text etc. type.
4.  If the subject is expressed by a word of the writing, letter, text etc. type, the corresponding formal feature entered in the database is used for translation.
5.  Otherwise, other formal features are checked.
The main purpose of this algorithm is to determine the meaning of the verb that fits the context using the formal information that the subject carries.
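The steps of Algorithm 1 can be sketched as below. The word list, the feature key and the function names are hypothetical, the subject is assumed to have been found by an earlier parsing step, and the stored rendering “yazılır” is simply the verb form from the example sentence above.

```python
# Hypothetical list of subject words of the "writing, letter, text" type.
WRITING_TYPE_WORDS = {"writing", "letter", "text", "paper", "papers"}

# Hypothetical database entry: rendering of "write" under Feature 1.
FEATURE_1_DB = {"write": "yazılır"}


def algorithm_1(verb, subject_head, other_checks=()):
    """Apply Feature 1 to an already identified ambiguous verb (step 1).

    subject_head is the head word of the subject (step 2).
    other_checks are callables tried in order when Feature 1 fails.
    """
    # Step 3: is the subject a word of the writing/letter/text type?
    if subject_head.lower() in WRITING_TYPE_WORDS:
        # Step 4: use the feature entered in the database.
        return FEATURE_1_DB.get(verb)
    # Step 5: otherwise check the other formal features.
    for check in other_checks:
        result = check(verb, subject_head)
        if result is not None:
            return result
    return None


# "Today's papers write full page information about the event"
papers_case = algorithm_1("write", "papers")
manager_case = algorithm_1("write", "manager")
```

Returning None when no feature applies leaves room for the fallback to the ordered list of meanings in the database.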
  Let’s consider the schematic description of the algorithm introduced above:

Scheme 1. Flowchart of Algorithm 1: Enter; the ambiguous verb is identified in the sentence; the subject is taken for formal parsing; the subject is checked to see whether it is a word of the writing, letter, text etc. type; if so, the corresponding formal feature entered in the database is used for translation; otherwise, other formal features are checked.

Conclusion:
Research in this field for Azerbaijani has been conducted only in the last few years, although research on the development of machine translation systems spans several decades.
The one thousand most frequently used English verbs were selected for word sense disambiguation. These verbs were found to have 4852 meanings in total, and some of these meanings turned out to overlap, completely or to a large extent, with another translation variant of the same verb. Such meanings were replaced with the single meaning that best covers the overlapping translation variants, and 3899 verb meanings finally remained. As the second step, the database of features for correctly selecting the meaning was created, and the features were entered in the database and coded. As a result, 387 groups of verb meanings were created, and special algorithms were developed for all these meaning groups. These algorithms are considered an initial approximation to the solution of the problem in the English-Azerbaijani MT system. To obtain better results, research on statistics is being carried out in order to develop a hybrid method.
 
References:

1.  Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Ambiguity
2.  Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Word_sense_disambiguation
3.  Abbasov A., Fatullayev A. 2007. The use of syntactic and semantic valences of the verb for formal delimitation of verb word phrases. In: Proc. of L&TC’07, Poznan, Poland, pp. 468-472.
4.  Fatullayev R., Abbasov A., Fatullayev A. 2008. Dilmanc is the 1st MT system for Azerbaijani. In: Proc. of SLTC-08, Stockholm, Sweden, pp. 63-64.
5.  Fatullayev R., Abbasov A., Fatullayev A. 2008. Peculiarities of the development of the dictionary for the MT System from Azerbaijani. In: Proc. of EAMT-08, Hamburg, Germany, pp. 35-40.
6.  Fatullayev R., Abbasov A., Fatullayev A. 2008. Set of active suffix chains and its role in development of MT system for Azerbaijani. In: Proc. of IMCSIT-08, Wisla, Poland, pp. 363-368.
7.  Mahmudov M. 2002. Metnlerin formal tehlili sistemi (Formal processing system of texts). Elm, Baku.
8.  Fatullayev A., Shagavatov S. Formal elimination algorithms of some type homonyms in Azerbaijani sentence. Proceedings of the International Scientific Conference “Problems of Cybernetics and Informatics”, Baku, Azerbaijan, October 2006, pp. 108-111.
9.  Hasanov H.A. Dictionary of homonyms of Modern Azerbaijani. Baku: Maarif, 1981, pp. 121.
10. Turksever (Musayev) O.İ. et al. English-Azerbaijani dictionary. Baku, Qismet, 2003, pp. 1673.
11. Ahmed F. and Nürnberger A. Arabic/English Word Translation Disambiguation Approach based on Naive Bayesian Classifier. Proceedings of the International Multiconference on Computer Science and Information Technology, Wisla, Poland, 2008, pp. 331-338.
12. Yarowsky D. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994, pp. 88-95.
13. W. Gale, K. Church, and D. Yarowsky. A Method for Disambiguating Word Senses in a Large Corpus. Computers and the Humanities, vol. 26, 1992, pp. 415-439.
14. Escudero G., Màrquez L., Rigau G. Boosting applied to word sense disambiguation. Proceedings of the 12th European Conference on Machine Learning (ECML), Barcelona, Spain, 2000, pp. 129-141.
15. Pedersen T. A simple approach to building ensembles of Naive Bayesian classifiers for word sense disambiguation. In Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, May, 2000, pp. 63-69.
16. Ahlswede T.E. (1995). “Word Sense Disambiguation by Human Informants.” Proceedings of the Sixth Midwest Artificial Intelligence and Cognitive Society Conference, Carbondale, Illinois, April 1995, pp. 73-78.
17. Gale W.A., Church K.W., Yarowsky D. (1993). “A method for disambiguating word senses in a large corpus.” Computers and the Humanities, 26, pp. 415-439.
18. Fatullayev R., Mammadova S., Fatullayev A. Statistical analysis of the factors influencing the translation quality of the Dilmanc MT system. In Proc. of the International Conference on Problems of Cybernetics and Informatics (PCI-2008), Baku, September 2008, pp. 96-99.
* * * 
This paper presents the process of resolving the ambiguity of some English verbs in the English-Azerbaijani MT system. It describes the creation of a database of formal features for resolving verb ambiguity, and on the basis of these databases algorithms are developed for resolving ambiguity in the course of automatic translation. These databases and algorithms represent an initial approximation to a complete solution of verb ambiguity.
 
 
 
