Phrase | Description | Examples of English chunks detected | Chunk-level tags
PP | Postpositional phrase (except genitive phrases ending in “-{N}{I}ң”) [12] | preposition–noun; preposition–determiner–noun; preposition–numeral–noun; preposition–adjective–noun; preposition–determiner–adjective–noun | number, person, possessor*, and case
GenP | Genitive postpositional phrase ending in “-{N}{I}ң” | (same as PP, with “of” as preposition) | number, person, possessor*, and case
AdjP | Adjectival phrase (except superlatives with “ең”) | adjective; “more”–adjective | (none)
SupP | Superlative phrase (adjectival phrase with “ең”) | adjective_“-est”; the–“most”–adjective | possessor* [13], case
Translation of noun phrases
Consider the following example: the chunker identifies the English sequence the large book
(determiner–adjective–noun) as a noun-phrase chunk. It translates it into Kazakh and assigns it four
chunk-level tags: number (set to singular), person (set to 3rd), possessor (to be determined, as the
noun кітап ('book') could later receive a 3rd-person possessive ending (кітабы) if the context were,
for instance, the large book of animals, аңдардың үлкен кітабы), and case (to be determined, as it
could be, for instance, accusative in I saw the large book, Мен үлкен кітапты көрдім).
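The chunk-level tags in this example can be pictured as a small record in which some values are fixed by the chunker and others stay open. The Python sketch below is only an illustration: the `Chunk` class, its field names, and the tag values (`sg`, `p3`) are assumptions made here, not Apertium's actual data format (Apertium transfer rules are written in XML).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk produced by the chunker."""
    kind: str                      # e.g. "NP"
    words: List[str]               # target-language lexical units
    tags: Dict[str, Optional[str]] = field(default_factory=dict)

    def undetermined(self) -> List[str]:
        """Names of the tags that later transfer stages still have to fill."""
        return [name for name, value in self.tags.items() if value is None]


# "the large book" -> үлкен кітап, with possessor and case left open
np_chunk = Chunk(
    kind="NP",
    words=["үлкен", "кітап"],
    tags={"number": "sg", "person": "p3", "possessor": None, "case": None},
)

print(np_chunk.undetermined())  # ['possessor', 'case']
```

Using `None` for "to be determined" keeps the open tags easy to list, which is all the later stages need to know in this toy model.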
Translation of prepositional phrases
On encountering an English prepositional phrase, which has to be rendered in Kazakh as a
postpositional phrase, there are three possible outcomes:
(1) The prepositional phrase results in a simple postpositional phrase using the locative
“-{D}{A}” [14], ablative “-{D}{A}н”, etc., but not the genitive “-{N}{I}ң”:
[PP [P in ] [NP the beautiful garden] ] → [PP [NP әдемі бақша] [P -да ] ]
(2) The prepositional phrase results in a simple postpositional phrase using the genitive
“-{N}{I}ң”, which will be marked GenP:
[PP [P of ] [NP the beautiful garden] ] → [GenP [NP әдемі бақша] [P -ның] ]
(3) The prepositional phrase results in a complex postpositional phrase based around a noun
such as аст, үст, etc.:
[PP [P under ] [NP the garden] ] → [PP [NP [GenP [NP бақша ][P -ның]] [NP астын]] [P -да ] ]
In all three cases, the possessor tag of the chunk, which corresponds to the main noun in the PP
or the GenP, has to be left open to be determined in later transfer operations (consider, for
instance, the case in which the PP in the beautiful garden is part of a larger structure, in the
beautiful garden of the city, қаланың әдемі бақшасында, in which the noun бақша 'garden' receives a
possessive ending).
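The three outcomes can be summarised as a lookup on the English preposition. The following Python sketch is a toy model under stated assumptions: the function name `transfer_pp`, the three dictionaries, and the nested-tuple output are hypothetical, only the prepositions from examples (1) to (3) are covered, and the possessive ending on the relational noun (астын-) is not modelled. Case endings are kept in archiphoneme notation rather than realized.

```python
# Hypothetical preposition tables; only the three prepositions used in
# examples (1)-(3) are listed.
SIMPLE = {"in": "-{D}{A}"}                # locative postposition
GENITIVE = {"of": "-{N}{I}ң"}             # genitive, marked GenP
COMPLEX = {"under": ("аст", "-{D}{A}")}   # relational noun + locative


def transfer_pp(preposition, noun_phrase):
    """Return a nested tuple mirroring the bracketed structures above."""
    if preposition in SIMPLE:
        return ("PP", ("NP", noun_phrase), ("P", SIMPLE[preposition]))
    if preposition in GENITIVE:
        return ("GenP", ("NP", noun_phrase), ("P", GENITIVE[preposition]))
    if preposition in COMPLEX:
        noun, case = COMPLEX[preposition]
        # The original noun phrase becomes a genitive modifier of the
        # relational noun, which then carries the case ending.
        genitive = ("GenP", ("NP", noun_phrase), ("P", "-{N}{I}ң"))
        return ("PP", ("NP", genitive, ("NP", noun)), ("P", case))
    raise KeyError(f"no rule for preposition {preposition!r}")


for prep, np in [("in", "әдемі бақша"), ("of", "әдемі бақша"), ("under", "бақша")]:
    print(prep, "->", transfer_pp(prep, np))
```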
Translation of verb phrases
The mapping of English verb tenses onto Kazakh is not completely straightforward and is treated
in the chunker. Just to give a few examples, present simple and future are rendered using the same
tense in Kazakh (I play → Мен ойнаймын; I will play → Мен ойнаймын); tenses expressing
continued activity, such as the English present continuous or past continuous (I am playing, I was
playing), have to be detected and mapped onto sets of two lexical units (Мен ойнап жатырмын,
Мен ойнап отырдым) where the main verb is found in the -п participle form (ойнап), and a
suitable finite form (жатырмын, отырдым) of an auxiliary verb (жатыр, отыр) is used to
express number and person agreement [15] (see Table 3 for details).
12 Upper case letters in braces (such as {N}) represent hypothetical archiphonemes (actually archigraphemes) that are
realized as phonemes (actually graphemes) after morphophonological rules have been applied. For instance, in the
genitive ending “-{N}{I}ң”, the archiphoneme {N} may be realized as т, д, or н and the archiphoneme {I} may be
realized as і or ы depending on the previous phonological context. This is performed during morphological
generation (see section 0).
13 Treated as a noun phrase with an implied noun (“the largest [book]”).
14 {D} can be д or т, and {A} can be е or а, depending on the phonological context.
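To make the archiphoneme notation of footnotes 12 and 14 concrete, the sketch below realizes the locative “-{D}{A}” on a few stems. It is a deliberately simplified approximation written for this illustration, not the morphological generator used by Apertium: the function name and the three character sets are assumptions, stems whose only vowels are и, у, ю or я are not handled, and neither is the н inserted after possessive endings (as in бақшасында).

```python
BACK_VOWELS = set("аоұы")
FRONT_VOWELS = set("әеіөү")
VOICELESS = set("пфкқтсшщхцч")


def realize_locative(stem):
    """Attach the locative "-{D}{A}" to a non-empty stem (simplified rules)."""
    vowels = [ch for ch in stem if ch in BACK_VOWELS or ch in FRONT_VOWELS]
    # {A}: "а" when the last stem vowel is a back vowel, "е" otherwise
    a = "а" if vowels and vowels[-1] in BACK_VOWELS else "е"
    # {D}: "т" after a voiceless consonant, "д" otherwise
    d = "т" if stem[-1] in VOICELESS else "д"
    return stem + d + a


for stem in ["бақша", "қала", "кітап", "мектеп"]:
    print(stem, "->", realize_locative(stem))
```

With these rules the four stems come out as бақшада, қалада, кітапта and мектепте.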
Table 3 Examples of tense mapping operations performed at the chunk level
English tense | Example | Morphemes | Equivalent tense in Kazakh | Translation
Present Simple | I play | ойна + й (а or е) + <1st> | Ауыспалы осы шақ (changing present simple) | Мен ойнаймын
Present Continuous | I am playing | ойна + п (ып or іп) + жатыр + <1st> | Нақ осы шақ (now present tense) | Мен ойнап жатырмын
Past Continuous | I was playing | ойна + п (ып or іп) + отыр + д{I} + <1st> | Бұрынғы өткен шақ (past continuous) | Мен ойнап отырдым
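The tense mapping described above amounts to recognising a small number of English verb-group patterns. The Python sketch below covers only the four patterns mentioned in the text (present simple, future with will, present continuous, past continuous); the function name `map_tense`, the token-list input, and the returned pair are assumptions made for illustration, and any other pattern is rejected rather than guessed.

```python
PRESENT_BE = {"am", "is", "are"}
PAST_BE = {"was", "were"}


def map_tense(vp_tokens):
    """Map an English verb group (lower-case tokens) to a Kazakh pattern.

    Returns (Kazakh tense name as given in Table 3, auxiliary verb or None).
    """
    if len(vp_tokens) == 1:
        # Present simple: "play"
        return ("Ауыспалы осы шақ", None)
    if len(vp_tokens) == 2:
        first, last = vp_tokens
        if first == "will":
            # Future is rendered with the same Kazakh tense as present simple.
            return ("Ауыспалы осы шақ", None)
        if last.endswith("ing") and first in PRESENT_BE:
            return ("Нақ осы шақ", "жатыр")
        if last.endswith("ing") and first in PAST_BE:
            return ("Бұрынғы өткен шақ", "отыр")
    raise ValueError(f"pattern not covered by this sketch: {vp_tokens}")


for vp in (["play"], ["will", "play"], ["am", "playing"], ["was", "playing"]):
    print(" ".join(vp), "->", map_tense(vp))
```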
Verb-phrase chunks are also used to prepare translations not using a finite verb (but a nominal or
adjectival structure, often based on non-finite forms of verbs instead). For instance, for obligatory
English modal constructs (have to, must, need to, should) verb phrases made up of three lexical
units have to be generated, with a verbal noun, an adjective roughly meaning “necessary” (керек) or
“proper” (жөн), and a form of the copula (absent in present tense); the subject receives the genitive
or dative case: I have to go → Менің баруым керек, I need to go → Маған бару керек, etc.; see
these and other modal construction examples in Table 4.
Table 4 Translation of some English modal verbs
Construction | Example | Morphemes | Translation | Gloss
Must, have to | I must go, I have to go | Мен + -{N}{I}ң + бар + -у + -{I}м <1st person possessive> + керек <adjective> | Менің баруым керек | My going necessary [is]
Should | I should go | Мен + -{N}{I}ң + бар + -{G}{A}н + -{I}м <1st person possessive> + жөн | Менің барғаным жөн | My going proper [is]
Need to | I need to go | Мен + -{G}{A}[н] + бар + -у + керек <adjective> | Маған бару керек | To me, going necessary [is]
Want to | I want to go | Мен + -{N}{I}ң + бар + -{G}{I} + -{I}м <1st person possessive> + кел + -{E}д{I} | Менің барғым келеді | My going will come
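The rows of Table 4 share one shape: a case ending on the subject, one or two suffixes on the verb stem, and a predicate. The sketch below stores them as data and rebuilds the morpheme sequence; the dictionary layout and the function name `modal_morphemes` are assumptions made here, the verbal-noun suffix is written with Cyrillic у, and the possessive -{I}м is fixed to the 1st person, so only a 1st-person singular subject is modelled.

```python
# Patterns taken from Table 4: (subject case ending, verb suffixes, predicate).
MODAL_PATTERNS = {
    "must":    ("-{N}{I}ң", ["-у", "-{I}м"], ["керек"]),
    "have to": ("-{N}{I}ң", ["-у", "-{I}м"], ["керек"]),
    "should":  ("-{N}{I}ң", ["-{G}{A}н", "-{I}м"], ["жөн"]),
    "need to": ("-{G}{A}[н]", ["-у"], ["керек"]),
    "want to": ("-{N}{I}ң", ["-{G}{I}", "-{I}м"], ["кел", "-{E}д{I}"]),
}


def modal_morphemes(modal, verb_stem, subject="Мен"):
    """Return the unrealized morpheme sequence for a modal construction."""
    case_suffix, verb_suffixes, predicate = MODAL_PATTERNS[modal]
    parts = [subject, case_suffix, verb_stem, *verb_suffixes, *predicate]
    return " + ".join(parts)


print(modal_morphemes("must", "бар"))
# Мен + -{N}{I}ң + бар + -у + -{I}м + керек
```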
15 Actually, Kazakh uses four auxiliary verbs: жатыр ('lie', used when the activity takes a long time),
отыр ('sit', used when the activity appears to be done in a sitting position), тұр ('stand', when the activity takes a
short time), and жүр (when the activity repeats regularly). Choosing the most adequate auxiliary verb is hard without a
semantic analysis, which is not easily available in Apertium. Our current choice (an approximation) is жатыр
('lie') for the present continuous and отыр ('sit') for the past continuous.
Finally, as negative constructions in English contain more words than their affirmative
counterparts, and may even use an auxiliary verb (as in do not, did not), they have to be
detected separately as verb chunks to generate the appropriate Kazakh negative forms (I play →
мен ойнаймын; I do not play → мен ойнамаймын). For examples of other negative constructs, see
Table 5.
Table 5 Translation of some negative constructions.
Construction | Example | Morphemes | Translation | Note
Present Continuous (negative) | I am not playing | ойна + п (ып or іп) + жатқан жоқ + <1st> | Мен ойнап жатқан жоқпын | In present auxiliary verbs (жатыр/отыр) do not have a synthetic negative form.
Can (negative) | I can not play | Ойна + -{E} + ал + ма + й + мын <1st person> | Мен ойнай алмаймын |
Verb phrases (VP) are marked at the chunk level with person and number, both to be determined
and linked via references to the appropriate morphemes in the appropriate verb lexical forms. The
chunk-level person and number to be determined will be rewritten by the appropriate 2nd-level
(interchunk) transfer rules, and will be propagated to lexical forms at the 3rd-level transfer stage
(postchunk).
Other indicators that have to be made available at the chunk level are negation (for negative
verbs) and conditional (which will be handled as a tense). For instance, negation can be easily
determined at the chunking level when the English VP chunk contains not, as in I don’t play → мен
ойнамаймын, but may need to be determined at the interchunk level in sentences having a non-
negative VP but a negative word like those starting with еш-, like I write nothing → мен ешнәрсе
жазбаймын, which requires a negative form of the verb (-ба- in the example).
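The two places where negation can be detected may be sketched as two checks, one on the English VP chunk and one on the neighbouring target words. This is a simplified illustration with hypothetical function names; in particular, testing for the prefix еш- on surface strings is an assumption that would misfire on unrelated words such as ешкі ('goat'), so a real rule would rely on lexical tags instead.

```python
def vp_is_negative(vp_tokens):
    """Chunk level: the English VP contains "not" or a contracted "n't"."""
    return any(
        token.lower() == "not" or token.lower().endswith(("n't", "n’t"))
        for token in vp_tokens
    )


def needs_negative_verb(vp_tokens, kazakh_words):
    """Interchunk level: also look for negative words starting with еш-."""
    has_negative_word = any(word.lower().startswith("еш") for word in kazakh_words)
    return vp_is_negative(vp_tokens) or has_negative_word


print(vp_is_negative(["do", "not", "play"]))               # True
print(vp_is_negative(["write"]))                           # False
print(needs_negative_verb(["write"], ["мен", "ешнәрсе"]))  # True
```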
Translation of adjectival phrases
In Kazakh noun phrases, adjectives come before nouns and do not show any agreement with
nouns. Adjectives can also appear in separate adjective phrases. Here are some examples:
(4)
The adjective alone, marked AdjP:
[AdjP beautiful ] → [AdjP әдемі]
(5)
Comparative adjective phrases (English more + adjective, or adjective-[e]r); the Kazakh
translation chooses the comparative suffix “-{I}р{A}{K}”:
[AdjP more beautiful ] → [AdjP әдемірек]
(6)
For superlative adjective phrases “the most + adjective” or “adjective-[e]st”, translation is
built using “ең” + adjective:
[SupP the largest ] → [SupP ең үлкен]
[SupP the most beautiful ] → [SupP ең әдемі]
As noted in §0, superlative adjective phrases have some properties of noun phrases (such as
receiving possessive morphemes when modified by a genitive phrase: the most beautiful of people
→ адамдардың ең әдемісі); one could say that they are treated as NPs with an implied noun.
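Cases (4) to (6) can be summarised as a pattern match on the words preceding the adjective. The sketch below is a toy model under stated assumptions: the two-entry `LEXICON`, the function name `transfer_adjective`, and the tuple output are hypothetical, the comparative suffix is left in archiphoneme notation instead of being realized, and synthetic English forms (larger, largest) are not handled because they would first need morphological analysis.

```python
LEXICON = {"beautiful": "әдемі", "large": "үлкен"}   # toy bilingual dictionary
COMPARATIVE_SUFFIX = "-{I}р{A}{K}"


def transfer_adjective(tokens):
    """Translate an English adjectival chunk given as a list of words."""
    *modifiers, adjective = tokens
    kazakh = LEXICON[adjective]
    if modifiers == ["the", "most"]:
        return ("SupP", ["ең", kazakh])                # case (6)
    if modifiers == ["more"]:
        return ("AdjP", [kazakh, COMPARATIVE_SUFFIX])  # case (5)
    if not modifiers:
        return ("AdjP", [kazakh])                      # case (4)
    raise ValueError(f"pattern not covered by this sketch: {tokens}")


print(transfer_adjective(["beautiful"]))
print(transfer_adjective(["more", "beautiful"]))
print(transfer_adjective(["the", "most", "beautiful"]))
```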
English-to-Kazakh inter-chunk processing
The second round of structural transfer (the “interchunk” rules written in the
apertium-eng-kaz.eng-kaz.t2x file) is currently performed by a proof-of-concept set of 18 rules,
representative of the following operations:
Inter-chunk agreement (for instance, number and person agreement between subject noun
phrase and verb phrase): features to be agreed here are left undefined by the chunker; those that are
not defined at the interchunk phase are left for the post-chunk phases.
Assigning case to noun phrases (which are generated without case by the chunker): for
instance, accusative case for objects (I see the sky → Мен аспанды көремін), genitive case for
obligatory constructs (I have to go → менің баруым керек), dative case for the verb to need
(I need a book → Маған кітап керек), locative case for possession (I have a book → Менде кітап
бар), etc.
Reordering: placing of object before verb (I[1] see[2] the sky[3] → Мен[1] аспанды[3]
көремін[2]), placing of prepositional phrases before the verb (They[1] played[2] on top of the
tree[3] → Олар[1] ағаштың үстінде[3] ойнады[2]), etc.
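For the simplest subject, verb, object sequence, the three operations listed above can be combined in one step. The Python sketch below is a hypothetical illustration, not a transcription of any rule in the .t2x file: chunks are plain dictionaries, the tag values (`p1`, `sg`, `acc`) are invented here, and assigning nominative to the subject is an assumption that only holds for plain transitive sentences such as I see the sky.

```python
def interchunk_svo(subject, verb, obj):
    """Apply agreement, case assignment and reordering to an NP-VP-NP sequence."""
    # Agreement: the verb phrase copies person and number from the subject.
    verb = dict(verb, person=subject["person"], number=subject["number"])
    # Case assignment: the direct object receives the accusative.
    subject = dict(subject, case="nom")
    obj = dict(obj, case="acc")
    # Reordering: Kazakh places the object before the verb.
    return [subject, obj, verb]


subject = {"kind": "NP", "lemma": "мен", "person": "p1", "number": "sg", "case": None}
verb = {"kind": "VP", "lemma": "көр", "person": None, "number": None}
obj = {"kind": "NP", "lemma": "аспан", "person": "p3", "number": "sg", "case": None}

result = interchunk_svo(subject, verb, obj)
print([chunk["lemma"] for chunk in result])  # ['мен', 'аспан', 'көр']
```

The function returns new dictionaries and leaves its arguments unchanged, so the chunker output can still be inspected after the rule has applied.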
The set of rules has to be extended, as many combinations of the above phenomena are still not
covered (for instance, there is no rule to obtain the right word order in I have to go to the
university → Менің университетке баруым керек). [16]
Some results, problems and limitations
The system described is not much more than a proof-of-concept system that still needs to be
extended to reasonably cover all transfer operations needed. Therefore, evaluating the output of the
system using customary evaluation measures such as BLEU (Papineni et al. 2002) is still out of the
question.
Instead, Tables 6 and 7 show how our current prototype performs for some
representative structures covered by the transfer rules currently available (some of them discussed
above). As has been said above, there are already at least two MT systems that translate from English to
Kazakh: Sanasoft's and Trident's, both of which can be used online (see Introduction for details);
therefore, we will briefly compare our results to those obtained by the commercial systems.
Table 6 Example machine translation output for some simple phrases and sentences.
Structure/problems | English | Kazakh (Apertium) | Kazakh (Sanasoft) | Kazakh (Trident)
Noun phrases | your two beautiful gardens | сіздің екі әдемі бақшаңыз | Сенің екі әдемі бақтарың. | сендер екі тамаша бақшалар
Prepositional phrases | in the big city | үлкен қалада | Үлкен қала | үлкен қалада
Possessives | the chief of the city | қаланың басшысы | қаланың көсемі | Бас қала
 | On top of the tree of the garden of the city | қала бақшасының ағашының үстінде | Зырылдауық ағаш бақ қала | В алқындыр- қаланың бақшасының ағашының
Adjective phrases | bigger | үлкенірек | Үлкен | үлкен
Modal verbs | I have to go | Менің баруым керек | Мен барып жатырмын жүрмін | Маған бару have
 | I can drive | Мен жүргізе аламын | Мен болып жатырмын жүргізіп жатырмын | Мен жүру білемін
16 As chunks detected by the chunker are finite-length and inter-chunk rules also process finite-length chunk
sequences, it has to be noted that there will always be a limit to the scope of reordering or agreement rules.
Concluding remarks and future work
The current prototype already successfully solves many cases of noun-phrase, verb-phrase,
prepositional-phrase, and adjectival-phrase translation (some actually better than the available
commercial systems), and contains a reasonable vocabulary for testing purposes, which
nevertheless still needs extending for real-world applications.
The following tasks have to be performed in order to have a working machine translation system:
Completing the coverage of structural transfer rules and monolingual and bilingual
vocabularies so that the system produces a translation for at least 90% of the English words and
performs the basic operations to identify and process correctly short constituents (1–6 words).
Releasing the resulting stable system as apertium-eng-kaz and disseminating it to the
interested parties to obtain feedback about its functioning. We can reasonably expect this system to
work better than the existing commercial systems in most aspects.
As a longer-range objective, and when a reasonably complete prototype is available, we will
tackle another interesting goal: the use of feedback from human input (for instance, in an interactive
machine translation system that provides completions to what the translator is typing).
Table 7 Example machine translation output for some simple phrases and examples.
English | Kazakh (Apertium) | Kazakh (Sanasoft) | Kazakh (Trident)
I see the blue sky. | Мен көк аспанды көремін | Менде көк аспан көріп жатырмын | Мен көгілдір аспанды көремін.
You go to school | Сіз мектепке барасыз | Сіз мектепке бардың | сендер үйрету барасыңдар
A book has been given to you | кітап сізге беріліп болған | Кітап барып жатыр сізге берсін | Кітап жібер- сендерге болды
I can go to the three big shop | Мен үш үлкен дүкенге бара аламын | Мен three үлкен магазинге болып жатырмын жүрмін | Мен үшке деген бару үлкен дүкен білемін
The most beautiful of garden is opened | бақшаның ең әдемісі ашылады | Көпшілік әдемі бақ ашық бар | Ең тамаша бақшадан болады ашыл-
I see my car | Мен менің жеңіл автокөлігімді көремін | Мен менің автомобилім көріп жатырмын | Мен өзінің автомобильсын көремін
The famous doctor of the city is going to hospital | қаланың танымал дәрігері емханаға барып жатыр | Атақты дәрігер қала ауруханаға барады | қаланың атайы докторы ауруханаға деген жиналады
She eats chocolates with sugar | Ол шоколадтарды қантпен жейді | Ол eats chocolates қант | Ол шоколадтарды қантпен жейді