Proceedings of the I International Conference

Date: 07.01.2022
Language Modeling
For the language model, we used our text materials to build a standard trigram model with
Good-Turing smoothing [15], compiled into ARPA format with the CMU-Cambridge Language
Model Toolkit 0.7 [16]. The format of the language model file is as follows:
\data\
ngram 1=nr    # number of 1-grams
ngram 2=nr    # number of 2-grams
ngram 3=nr    # number of 3-grams

\1-grams:
p_1 wd_1 bo_wt_1

\2-grams:
p_2 wd_1 wd_2 bo_wt_2

\3-grams:
p_3 wd_1 wd_2 wd_3

\end\
where nr in each "ngram k=nr" line is the number of k-grams in the model, p_k is the base-10
logarithm of the conditional probability of a k-gram, wd_i is the i-th word of the n-gram, and
bo_wt_k is the base-10 logarithm of the backoff weight for the k-gram. Entries of the highest
order (here the 3-grams) carry no backoff weight.
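To make the role of the p_k and bo_wt_k fields concrete, the following is a minimal sketch of reading such a file and scoring a word with the usual backoff rule: use the longest matching n-gram, otherwise add the backoff weight of the context and retry with a shorter context. The function names parse_arpa and log10_prob and the toy model in SAMPLE are hypothetical illustrations, not part of the toolkit or of the experiments described here. The sketch also assumes that every scored word appears among the 1-grams.

```python
from collections import defaultdict

def parse_arpa(text):
    """Parse ARPA text into {order: {ngram_tuple: (log10_prob, log10_backoff)}}."""
    tables = defaultdict(dict)
    order = None  # order of the section being read; None while in the header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if order is None:
            continue  # header lines: \data\ and "ngram k=nr"
        fields = line.split()
        logp = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; a missing weight means log10(1) = 0.
        bow = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        tables[order][words] = (logp, bow)
    return tables

def log10_prob(tables, context, word):
    """Return log10 P(word | context) using backoff to shorter contexts."""
    context = tuple(context)
    ngram = context + (word,)
    table = tables.get(len(ngram), {})
    if ngram in table:
        return table[ngram][0]
    if not context:
        raise KeyError(word)  # assumption: no unknown-word handling in this sketch
    # Backoff weight of the context; 0.0 if the context itself was never seen.
    bow = tables.get(len(context), {}).get(context, (0.0, 0.0))[1]
    return bow + log10_prob(tables, context[1:], word)

# Toy model with made-up values, for illustration only.
SAMPLE = "\n".join([
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "ngram 3=1",
    "",
    "\\1-grams:",
    "-0.5 a -0.3",
    "-0.7 b -0.2",
    "-1.0 c -0.1",
    "",
    "\\2-grams:",
    "-0.4 a b -0.25",
    "-0.6 b c -0.15",
    "",
    "\\3-grams:",
    "-0.2 a b c",
    "",
    "\\end\\",
])

tables = parse_arpa(SAMPLE)
seen_trigram = log10_prob(tables, ("a", "b"), "c")    # direct 3-gram hit
backed_off = log10_prob(tables, ("b", "c"), "a")      # backs off twice to the 1-gram
```

In the toy model the second query finds neither "b c a" nor "c a", so its score is the sum of the backoff weights of "b c" and "c" plus the 1-gram log probability of "a".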
Our experiments use over 12500 sentences in total, which yield 29586 unigrams, 100354 bigrams
and 120755 trigrams.
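As a rough illustration of where such counts come from, the sketch below counts the distinct n-grams of each order in a list of sentences. The function name count_distinct_ngrams, the whitespace tokenization and the <s> and </s> sentence markers are assumptions made for this example. The toolkit's own counting (vocabulary limits, cutoffs, context cues) may differ, so the sketch is not expected to reproduce the figures above.

```python
def count_distinct_ngrams(sentences, max_order=3):
    """Count distinct n-grams of orders 1..max_order over whitespace-tokenized sentences."""
    seen = {n: set() for n in range(1, max_order + 1)}
    for sentence in sentences:
        # Assumption: each sentence is wrapped in begin/end markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                seen[n].add(tuple(tokens[i:i + n]))
    return {n: len(ngrams) for n, ngrams in seen.items()}

# Two toy sentences, for illustration only.
counts = count_distinct_ngrams(["a b", "a c"])
```

The values in counts correspond to the nr fields of the "ngram k=nr" header lines for a model built without any cutoffs.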
