Proceedings of the 1st International Conference


8 Kazakh phrase system
The Kazakh phrase recognition system consists of four modules: a training module, an
identification module, a test module, and an auxiliary module. Following a comprehensive analysis of
Kazakh words, the Kazakh shallow parsing process is as follows:
（1）Sentence:
قوڭىر كۇز كەلىپتى، قامبار سول جەرگە، قوي باعىپ كەلسە، ازىناعان كۇزدىڭ جەلى سوعىپ تۇرىپتى.

（2）POS:
قوڭىر/n كۇز/n كەلىپتى/v ، قامبار/n سول/pron جەرگە/n ، قوي/n باعىپ/v كەلسە/v ، ازىناعان/adj كۇزدىڭ/n جەلى/n سوعىپ/v تۇرىپتى/v .

（3）Phrase POS:
[[قوڭىر/n كۇز/n]NP كەلىپتى/v]VP ،
[[قامبار/n [سول/pron جەرگە/n]RP]NP ،
[[قوي/n باعىپ/v]VP كەلسە/v]]VP ،
[[[[ازىناعان/adj [كۇزدىڭ/n جەلى/n]NP]AP سوعىپ/v]VP تۇرىپتى/v]VP .

（4）Tree bank:

Fig.3 Kazakh Tree Bank

Fig.4 Kazakh verb phrase identification system
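
The step from POS-tagged tokens to phrase labels can be illustrated with a minimal Python sketch. It is not the rule set used in this system: the tokens w1, w2, w3 are hypothetical placeholders, the mapping from tags to phrase labels is an assumption made for illustration, and the output is flat rather than nested as in the tree bank.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # (word, POS tag)
Chunk = Tuple[str, List[str]]    # (phrase label, words)


def parse_tagged(line: str) -> List[Token]:
    """Split whitespace-separated 'word/tag' items into (word, tag) pairs."""
    tokens: List[Token] = []
    for item in line.split():
        word, sep, tag = item.rpartition("/")
        if sep:
            tokens.append((word, tag))
        else:
            # An item without a slash (e.g. punctuation) gets a dummy tag.
            tokens.append((item, "punct"))
    return tokens


def chunk(tokens: List[Token]) -> List[Chunk]:
    """Toy rule (assumed): adjacent nominal tags form an NP, adjacent verbs a VP."""
    label_of = {"n": "NP", "adj": "NP", "pron": "NP", "v": "VP"}
    chunks: List[Chunk] = []
    for word, tag in tokens:
        label = label_of.get(tag, "O")  # "O" marks tokens outside any phrase
        if chunks and label != "O" and chunks[-1][0] == label:
            chunks[-1][1].append(word)  # extend the current phrase
        else:
            chunks.append((label, [word]))
    return chunks


if __name__ == "__main__":
    # Placeholder tokens with the tag pattern n n v followed by a comma.
    print(chunk(parse_tagged("w1/n w2/n w3/v ,")))
```

A trained model or a fuller rule set would replace the `label_of` table; the sketch only shows the data flow from step (2) to step (3).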

9 Experiment results and analysis
9.1 Dataset
The data set used in this paper is the Xinjiang Daily corpus for the 31 days of January 2008.
The corpus consists of raw texts and POS-tagged texts in XML format.
The experiments were carried out on phrase extraction.

Fig. 5 Verb phrase Annotated corpus

9.2 Experiment results
Accuracy is evaluated with the following standard measures:

recall = a/(a+b)*100%
precision = a/(a+c)*100%
leakage = b/(a+b)*100%
error = c/(a+c)*100%

Note: recall + leakage = 100% and precision + error = 100%, where a is the number of correctly
identified phrases, b is the number of missed phrases, and c is the number of wrongly identified phrases.
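
As a minimal sketch of how the four measures relate, the following Python function computes them from the counts a, b, and c defined above. The function name `phrase_metrics` and the example counts are illustrative assumptions, not values from the experiments.

```python
from typing import Dict


def phrase_metrics(a: int, b: int, c: int) -> Dict[str, float]:
    """Return recall, precision, leakage and error as percentages.

    a: correctly identified phrases
    b: missed phrases
    c: wrongly identified phrases
    """
    if a + b == 0 or a + c == 0:
        raise ValueError("a+b and a+c must both be positive")
    return {
        "recall": 100.0 * a / (a + b),
        "precision": 100.0 * a / (a + c),
        "leakage": 100.0 * b / (a + b),
        "error": 100.0 * c / (a + c),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    for name, value in phrase_metrics(a=80, b=20, c=10).items():
        print(f"{name}: {value:.2f}%")
```

By construction, recall and leakage share the denominator a+b, and precision and error share a+c, so each pair sums to 100%.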
The test corpus contains 3000 correctly tagged sentences that serve as training data and are used
for the closed test, and a further 1000 sentences for the open test.
Table 3. Phrase identification test
Method  Test type    Precision (%)  Recall (%)  Error (%)  Leakage (%)
Rule    Closed test  81.58          72.51       18.42      27.49
Rule    Open test    78.22          70.01       21.78      29.99
ME      Closed test  91.62          87.33       8.81       15.67
ME      Open test    87.89          83.13       12.11      16.87
10 Conclusion
This paper identified Kazakh phrases using rules and the maximum entropy method. Kazakh words,
parts of speech, and affix context information were used to design the feature templates for the
maximum entropy model. The GIS (generalized iterative scaling) algorithm was used to estimate the
parameters over the feature set, and the system outputs the optimal phrase recognition result.
With the statistical method, accuracy is higher in the closed test than in the open test;
improving the open-test result requires training on more corpora.
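
To make the idea of a feature template concrete, the following Python sketch generates context features from the word, its part of speech, and a word-final suffix in a window of one token on each side. The window size, the feature names, and the use of the last two characters as a stand-in for affix information are all assumptions made for illustration; the templates actually used in the system are not listed in this section.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)
PAD: Token = ("<PAD>", "<PAD>")  # filler for positions outside the sentence


def features_at(tokens: List[Token], i: int, suffix_len: int = 2) -> List[str]:
    """Instantiate the (assumed) templates for the token at position i."""
    feats: List[str] = []
    for name, j in (("prev", i - 1), ("cur", i), ("next", i + 1)):
        inside = 0 <= j < len(tokens)
        word, pos = tokens[j] if inside else PAD
        feats.append(f"{name}_word={word}")
        feats.append(f"{name}_pos={pos}")
        if inside:
            # Crude affix proxy: the last suffix_len characters of the word.
            feats.append(f"{name}_suffix={word[-suffix_len:]}")
    # One combined template: POS of the previous and the current token.
    prev_pos = tokens[i - 1][1] if i > 0 else PAD[1]
    feats.append(f"prev_pos+cur_pos={prev_pos}+{tokens[i][1]}")
    return feats


if __name__ == "__main__":
    # Hypothetical placeholder tokens.
    sentence = [("w1", "n"), ("w2", "n"), ("w3", "v")]
    for feature in features_at(sentence, 0):
        print(feature)
```

Each string would correspond to one binary feature of the maximum entropy model, with its weight estimated by an algorithm such as GIS.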
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant
No. 61063025.

Attachment 1 :
