ISSN 2518-198X, Index 74623, Bulletin of Karaganda University






Table 3
The relationship between the frequency and length of words

Length   Frequency, %   Length   Frequency, %   Length   Frequency, %   Length   Frequency, %
1        2.0054         5        13.4878        9        8.9305         13       1.6849
2        2.8539         6        13.8508        10       5.7958         23       0.0355
3        7.5870         7        13.1275        11       5.0928         26       0.0257
4        8.8262         8        10.9275        12       2.8579         30       0.0146
As shown in Table 3, words of three to nine letters make up 76.7373 % of all Kazakh words, and five- to seven-letter words form the largest part (40.4661 %). Figure 4 plots frequency against length and shows that three- to nine-letter words are the majority. They are followed by ten- to eleven-letter words (10.8886 %). One- to two-letter words and twelve-letter words together account for 7.7172 %, and the last group, thirteen- to thirty-letter words, accounts for 4.9989 %.
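The range percentages quoted above are sums of the per-length shares in Table 3. A small check, transcribing the table by hand (the dictionary name `TABLE3` and the helper `range_share` are illustrative, not from the paper), reproduces them:

```python
# Per-length word frequencies (%) transcribed from Table 3.
# Only the lengths printed in the table are included; for the longest group
# the table lists lengths 13, 23, 26 and 30 only.
TABLE3 = {
    1: 2.0054, 2: 2.8539, 3: 7.5870, 4: 8.8262,
    5: 13.4878, 6: 13.8508, 7: 13.1275, 8: 10.9275,
    9: 8.9305, 10: 5.7958, 11: 5.0928, 12: 2.8579,
    13: 1.6849, 23: 0.0355, 26: 0.0257, 30: 0.0146,
}


def range_share(table, lengths):
    """Sum the shares (%) of the given word lengths; unlisted lengths count as 0."""
    return round(sum(table.get(n, 0.0) for n in lengths), 4)


if __name__ == "__main__":
    print(range_share(TABLE3, range(3, 10)))   # lengths 3..9     -> 76.7373
    print(range_share(TABLE3, range(5, 8)))    # lengths 5..7     -> 40.4661
    print(range_share(TABLE3, range(10, 12)))  # lengths 10..11   -> 10.8886
    print(range_share(TABLE3, [1, 2, 12]))     # lengths 1, 2, 12 -> 7.7172
```

Lengths 13 to 30 cannot be checked this way, since the table lists only four of them.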
This experiment characterizes the relationship between word length and frequency and indicates that the frequency of Kazakh words follows Zipf's power law. The result also indicates that most words in the Kazakh version of the Xinjiang Daily news are three to nine letters long. Some long words also occur, which matches readers' reading habits and vocabulary level.
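Zipf's law is usually stated for rank and frequency: the r-th most frequent word occurs roughly C / r^s times, with s close to 1. The section does not say how the fit was made, so the following is only a common rough check and not the authors' procedure: count the tokens, rank them by frequency, and fit a straight line to log frequency against log rank by ordinary least squares. The function name `zipf_exponent` and the synthetic corpus are hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in f(r) ~ C * r**(-s) by least squares on (log r, log f).

    A rough diagnostic only: least squares on log-log data is biased, and
    maximum-likelihood estimators are preferred for careful work.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]  # log rank
    ys = [math.log(f) for f in freqs]                     # log frequency
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf exponent is the negated slope


if __name__ == "__main__":
    # Synthetic corpus: the word of rank r occurs 1200 // r times, so s should be close to 1.
    corpus = [f"w{r}" for r in range(1, 51) for _ in range(1200 // r)]
    print(round(zipf_exponent(corpus), 2))
```

Note that a length-frequency table like Table 3 is a different view of the data; the rank-frequency fit above is the form in which Zipf's law is normally tested.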


A corpus-based frequency statistic…
Philology Series. No. 2(86)/2017
Figure 4. The relationship between word length and word frequency
Figure 4 shows the relationship between length and frequency in the textbook corpus data. A remarkable characteristic of the length-frequency relationship for all words is that it is concentrated toward the left, which means that most Kazakh words are short. For example, words of length 5 to 8 account for 51.3936 % of all words, whereas words of length 13 to 30 account for only 4.9289 %. On the other hand, the distribution drags a long «tail», which is another major characteristic of a power law.
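Both observations can be read off Table 3. A brief sketch (the list `SHARES` and both helpers are illustrative) finds the modal length and the cumulative share of words up to a given length, using the contiguous rows for lengths 1 to 12:

```python
from itertools import accumulate

# Table 3 shares (%) for word lengths 1..12, in order.
SHARES = [2.0054, 2.8539, 7.5870, 8.8262, 13.4878, 13.8508,
          13.1275, 10.9275, 8.9305, 5.7958, 5.0928, 2.8579]


def mode_length(shares):
    """Word length (1-based) with the largest share."""
    return max(range(len(shares)), key=shares.__getitem__) + 1


def cumulative_shares(shares):
    """Share (%) of words of length <= n, for n = 1, 2, ..."""
    return [round(total, 4) for total in accumulate(shares)]


if __name__ == "__main__":
    print(mode_length(SHARES))              # 6
    print(cumulative_shares(SHARES)[7])     # length <= 8: 72.6661
    print(round(sum(SHARES[4:8]), 4))       # lengths 5..8: 51.3936
```

A mode at length 6 and roughly 72.7 % of words having at most eight letters are what "concentrated toward the left" means numerically.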
Statistical analysis of Kazakh words, stems and suffixes. We use the second experimental dataset for the following statistical analysis of Kazakh words, stems and suffixes in primary school, junior high school and high school textbooks. Here, words, stems and suffixes are each treated as a word segmentation unit of our corpus and are counted as non-recurring terms, that is, as the numbers of distinct Kazakh words, stems and suffixes, excluding punctuation marks and English. The results for words are shown in Figure 5.
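Counting non-recurring terms amounts to counting each distinct item once after filtering. A minimal sketch under assumptions that are ours rather than the paper's: tokens are separated by whitespace, punctuation is any Unicode character in a "P" category, and any token containing an ASCII letter is treated as English and dropped. The helper names are hypothetical, and the sample words are written in Cyrillic only for readability; the script of the corpus is not stated in this section.

```python
import unicodedata
from collections import Counter


def is_punct(ch):
    """True for Unicode punctuation (categories Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return unicodedata.category(ch).startswith("P")


def unique_types(text):
    """Distinct lower-cased word types, without punctuation or Latin-script tokens."""
    types = set()
    for raw in text.split():
        token = "".join(ch for ch in raw if not is_punct(ch)).lower()
        if not token:
            continue  # the token was pure punctuation
        if any(ch.isascii() and ch.isalpha() for ch in token):
            continue  # assumed to be English
        types.add(token)
    return types


def length_counts(types):
    """Number of distinct types for each length, measured in characters."""
    return Counter(len(t) for t in types)


if __name__ == "__main__":
    sample = "Мектеп, мектептер мен кітаптар. Мектеп … Xinjiang daily!"
    types = unique_types(sample)
    print(sorted(types))
    print(sorted(length_counts(types).items()))  # [(3, 1), (6, 1), (8, 1), (9, 1)]
```

Because `len` counts Unicode code points, a corpus that uses combining marks would need normalization before lengths are compared.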
 
Figure 5. The relationship between word length and frequency in Kazakh textbooks
Figure 5 shows that the words in the three textbooks consist of 1 to 20 characters. Words of fewer than 3 or more than 15 characters are used less commonly: words of fewer than 3 and more than 15 characters account for 2.07 % and 0.7 % in the primary school textbooks, 1.91 % and 0.67 % in the junior high school textbooks, and 1.69 % and 0.9 % in the high school textbooks, respectively.
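Each pair of percentages is the share of word types below and above fixed length thresholds. A minimal sketch, assuming the input maps each length to its number of distinct types; the function name `outside_share` and the toy counts are hypothetical, not data from the paper:

```python
def outside_share(length_counts, short_below=3, long_above=15):
    """Return (% of types shorter than short_below, % of types longer than long_above)."""
    total = sum(length_counts.values())
    if total == 0:
        raise ValueError("no types to count")
    short = sum(c for n, c in length_counts.items() if n < short_below)
    long_ = sum(c for n, c in length_counts.items() if n > long_above)
    return round(100 * short / total, 2), round(100 * long_ / total, 2)


if __name__ == "__main__":
    toy = {2: 3, 5: 40, 7: 50, 16: 7}  # hypothetical counts, 100 types in total
    print(outside_share(toy))          # (3.0, 7.0)
```

Lengths exactly 3 and exactly 15 fall in neither group, matching "fewer than 3" and "more than 15".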
To analyze Kazakh stems as non-recurring terms, we examine the length of stems in the primary school, junior high school and high school textbooks, as shown in Figure 6.
Figure 6. The relationship between stem length and frequency in Kazakh textbooks
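[Chart images not recoverable. Figure 4 panel "Relationship between Kazakh word frequency and length": x-axis Kazakh word length (1 to 29), y-axis Kazakh word frequency (%). Figure 5 panel: x-axis word length (1 to 20, >20), y-axis word frequency; series: primary school, junior high school, high school. Figure 6 panel: x-axis stem length (1 to 11, 12 to 17), y-axis frequency; series: primary school, junior high school, high school.]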


G. Altenbek, X.L. Wang
Bulletin of Karaganda University
Figure 6 shows that the commonly used word stems in the three textbooks consist of 4 to 8 characters. In the primary school Kazakh language textbook, 96.67 % (1,505) of the word stems consist of 4 to 8 characters; in the junior high school textbook, 94.84 % (11,167) consist of 3 to 9 characters; and in the high school textbook, 93.13 % (14,976) consist of 3 to 9 characters. Word stems of fewer than 3 or more than 9 characters are used less commonly.
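Each percentage and its parenthesized count together fix the approximate total number of stem types per textbook, if one assumes (the section does not say so explicitly) that the count is the number of stem types inside the stated length range. Because the percentages are rounded to two decimals, the totals are only accurate to about ±1. The names below are illustrative:

```python
# (percentage, count) pairs quoted in the text for stem types inside the stated length range.
REPORTED = {
    "primary school (4 to 8 chars)": (96.67, 1505),
    "junior high school (3 to 9 chars)": (94.84, 11167),
    "high school (3 to 9 chars)": (93.13, 14976),
}


def implied_total(percent, count):
    """Total number of stem types implied by 'percent % ... (count)'."""
    return count / (percent / 100)


if __name__ == "__main__":
    for level, (pct, cnt) in REPORTED.items():
        print(level, round(implied_total(pct, cnt)))  # about 1557, 11775, 16081
```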
The lengths of the word suffixes (counted as non-recurring terms) used in the three textbooks are summarized in Figure 7.
Figure 7. The relationship between ending length and frequency in Kazakh textbooks
Figure 7 shows that the common suffixes of words in the three textbooks usually consist of 1 to 5 characters: such suffixes make up 95.49 %, 94.97 % and 93.92 % of the suffix types in the primary school, junior high school and high school textbooks, respectively. The number of suffix types differs little across the textbooks for different levels.
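The section does not describe how words were split into stems and suffixes. Purely to illustrate how suffix lengths could be obtained, here is a greedy longest-match stripper over a tiny, hypothetical inventory of Kazakh plural and case endings (spelled in Cyrillic). Treating everything stripped from one word as a single "suffix" string is also our assumption; a real segmenter would need a full morphological lexicon and the paper's own conventions. `split_word`, `suffix_length_share` and `SUFFIXES` are illustrative names.

```python
# Hypothetical, deliberately tiny suffix inventory (Cyrillic spelling).
# It is NOT the segmentation lexicon used in the paper.
SUFFIXES = {
    "лар", "лер", "дар", "дер", "тар", "тер",   # plural
    "ның", "нің", "дың", "дің", "тың", "тің",   # genitive
    "ға", "ге", "қа", "ке",                     # dative
    "да", "де", "та", "те",                     # locative
}


def split_word(word, suffixes=SUFFIXES, min_stem=3):
    """Greedily strip the longest known suffix from the end of `word`.

    Stripping repeats until no suffix matches or the stem would become
    shorter than `min_stem`. Returns (stem, suffix_string).
    """
    longest = max(len(s) for s in suffixes)
    stem, tail = word, ""
    while True:
        for k in range(min(longest, len(stem) - min_stem), 0, -1):
            if stem[-k:] in suffixes:
                stem, tail = stem[:-k], stem[-k:] + tail
                break
        else:  # no suffix matched in this pass
            return stem, tail


def suffix_length_share(words, lo=1, hi=5):
    """% of distinct non-empty suffix strings whose length is in [lo, hi]."""
    tails = {split_word(w)[1] for w in words} - {""}
    if not tails:
        return 0.0
    return 100 * sum(lo <= len(t) <= hi for t in tails) / len(tails)


if __name__ == "__main__":
    for w in ["мектептерде", "кітаптар", "ата"]:
        print(w, split_word(w))
```

The `min_stem` guard keeps short words such as "ата" from being over-stripped; greedy matching can still mis-segment real text.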
Figure 7 also shows that the distributions of word endings in the three Kazakh textbooks are similar, which reflects the closed set of endings of Kazakh words and the relatively stable word choice in the Kazakh language. Finally, an analysis of the longest Kazakh words found in the three textbooks shows that the lengths of the longest words differ little, as shown in Table 4.
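The section does not say how similarity between the distributions was judged. One simple way to quantify it, offered here as an illustrative choice rather than the authors' method, is the total variation distance between normalized length distributions (0 for identical distributions, 1 for disjoint ones); finding the longest words is a maximum over the distinct types. Function names and toy inputs are hypothetical.

```python
from collections import Counter


def length_distribution(items):
    """Normalized {length: share} over the distinct items."""
    counts = Counter(len(x) for x in set(items))
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def longest_words(words):
    """All distinct words of maximal length, sorted."""
    types = set(words)
    m = max(len(w) for w in types)
    return sorted(w for w in types if len(w) == m)


if __name__ == "__main__":
    a = ["да", "дар", "тер", "ның"]  # hypothetical ending types, textbook A
    b = ["де", "лар", "тың", "ке"]   # hypothetical ending types, textbook B
    print(total_variation(length_distribution(a), length_distribution(b)))  # 0.25
    print(longest_words(["ab", "abcd", "wxyz", "a"]))                       # ['abcd', 'wxyz']
```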
Table 4

