Proceedings of the I International Conference

Acoustic Database
Most modern speech processing systems require large amounts of audio and text data to train their acoustic and language models. Depending on the type of application, the data needed ranges from high-quality microphone read speech (WSJ0 [1]) to conversational telephone speech (Switchboard [2] or CALLHOME [3]), and from continuous speech (TIMIT [4]) to connected (TIDIGITS [5]) and isolated words (PhoneBook [6]). In the current work, we collected a corpus of 28 hours of high-quality microphone read Kazakh speech from 169 native speakers for large-vocabulary continuous speech recognition tasks. The acoustic database was initiated as part of the Kazakh Language Corpus compiled in [7].
Text materials
The text materials to be read aloud were carefully selected from the primary text corpus and divided into two parts: short sentences and stories.
The “sentences” part contains more than 12K distinct sentences, extracted randomly and in equal proportions from the five stylistic genres mentioned above. The sentences were chosen so that they contain more than 120K words, all drawn from the list of the most frequent words that covers 95% of all texts in the primary corpus. Additionally, the sentences were grouped by their length in words, giving 10 groups with sentence lengths from 6 to 15 words.
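The selection step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' tool: sentences are assumed to be already tokenized into whitespace-separated, normalized words; the frequency list is taken to be the smallest set of most frequent words whose occurrences cover 95% of corpus tokens; and a sentence is kept only if every word is in that list. The names `frequent_vocabulary` and `select_and_group` are hypothetical, and genre balancing is omitted.

```python
from collections import Counter


def frequent_vocabulary(tokens, coverage=0.95):
    """Smallest set of most frequent word types whose occurrences
    cover at least `coverage` of all running tokens (assumed reading)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab, covered = set(), 0
    for word, n in counts.most_common():
        if covered >= coverage * total:
            break  # the coverage target is already reached
        vocab.add(word)
        covered += n
    return vocab


def select_and_group(sentences, vocab, min_len=6, max_len=15):
    """Keep sentences whose words are all in `vocab` and whose length is
    within [min_len, max_len]; group the survivors by length in words."""
    groups = {n: [] for n in range(min_len, max_len + 1)}
    for sentence in sentences:
        words = sentence.split()
        if min_len <= len(words) <= max_len and all(w in vocab for w in words):
            groups[len(words)].append(sentence)
    return groups
```

With the default bounds this yields exactly the 10 length groups (6 to 15 words) described in the text.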
The “stories” part contains short online news items extracted from the mass-media section of the primary text corpus. Each story has no more than 300 words.
All the text materials were subdivided into small, numbered, non-intersecting sets to be read by the speakers. A standard set for one speaker has exactly 75 sentences (10 sentences from each of the five shorter groups and 5 sentences from each of the five longer groups) and 1 story.
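One possible way to cut the selected material into disjoint per-speaker sets is sketched below. It assumes the "five shorter groups" are the 6-10-word groups and the "five longer groups" are the 11-15-word groups; `make_speaker_sets`, the dictionary layout, and the seeded shuffling are hypothetical choices for illustration, not the authors' procedure. Each set receives 5 × 10 + 5 × 5 = 75 sentences plus one story.

```python
import random

SHORT_LENGTHS = range(6, 11)  # assumed: the five "shorter" groups (6-10 words)
LONG_LENGTHS = range(11, 16)  # assumed: the five "longer" groups (11-15 words)


def make_speaker_sets(groups, stories, seed=0):
    """Split length groups and stories into disjoint, numbered speaker sets:
    10 sentences from each shorter group, 5 from each longer group, 1 story."""
    rng = random.Random(seed)
    # Shuffled copies; popping from them guarantees that sets never overlap.
    pools = {n: rng.sample(groups[n], len(groups[n])) for n in groups}
    stories = rng.sample(stories, len(stories))
    sets = []
    while stories:
        # Stop as soon as any pool can no longer fill a complete set.
        if any(len(pools[n]) < 10 for n in SHORT_LENGTHS) or any(
            len(pools[n]) < 5 for n in LONG_LENGTHS
        ):
            break
        sentences = []
        for n in SHORT_LENGTHS:
            sentences += [pools[n].pop() for _ in range(10)]
        for n in LONG_LENGTHS:
            sentences += [pools[n].pop() for _ in range(5)]
        sets.append({"id": len(sets) + 1, "sentences": sentences, "story": stories.pop()})
    return sets
```

Drawing without replacement from shuffled pools keeps the sets non-intersecting while still mixing sentences from different parts of each group.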
Speakers
The speakers who took part in the recordings are volunteers recruited through advertisements in local newspapers and through personal referrals. The main speaker selection criteria were the region where the speaker learned Kazakh or spent most of his/her life, age, gender, and the ability to read Kazakh.
The first criterion helped to capture the variability in speech related to where the speakers have lived, both within Kazakhstan and abroad. In total there are 15 region groups: the 14 official regions (“oblast”) of Kazakhstan and one group for those who lived outside the country.
The speakers are divided into four age groups, excluding children and school students:
I group – 18-27 years;
II group – 28-37 years;
III group – 38-47 years;
IV group – 48 years and above.
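The age grouping above amounts to a simple binning rule. The sketch assumes integer ages in full years and treats anyone under 18 as outside the recruited population; `age_group` is a hypothetical helper.

```python
def age_group(age):
    """Map an adult speaker's age (in full years) to one of the four groups."""
    if age < 18:
        # Assumption: under-18s correspond to the excluded children/school students.
        raise ValueError("children and school students were not recorded")
    if age <= 27:
        return "I"
    if age <= 37:
        return "II"
    if age <= 47:
        return "III"
    return "IV"
```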
We did not strictly balance the speakers by gender because of the difficulty of finding volunteers, but we still tried to keep the number of speakers of each gender per profile at no more than 3. The speakers are 57% female and 43% male.
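The gender cap can be expressed as a check applied before recording a new volunteer. This sketch assumes a profile is a (region, age group) pair, so the cap limits how many speakers of one gender fall into that pair; the text does not define "profile" precisely, and `can_recruit` and the tuple layout are hypothetical.

```python
from collections import Counter

MAX_PER_GENDER = 3  # cap on speakers of one gender per profile, as stated in the text


def can_recruit(recorded, region, age_grp, gender):
    """Return True if recording one more speaker with this region, age group,
    and gender keeps the profile within the cap.

    `recorded` is a list of (region, age_group, gender) tuples for speakers
    already recorded (assumed representation)."""
    counts = Counter(recorded)
    return counts[(region, age_grp, gender)] < MAX_PER_GENDER
```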
The other important criterion was the ability to read Kazakh, since not all interviewees could read Kazakh fluently enough, which is a common issue in a bilingual country such as Kazakhstan. Additionally, we recorded each speaker's education level, i.e. whether they last graduated from school, college, or university.
In total, we recorded 169 speakers. Table 1 presents the distribution of the speakers across regions, gender, and age groups. The blank cells show the speaker profiles that we could not recruit; these mostly correspond to distant regions and older male groups.
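A table like Table 1 can be derived by counting speakers per (region, gender, age group) cell; cells with a zero count correspond to the blank spots. This is a sketch with hypothetical names, assuming each speaker is stored as a (region, gender, age group) tuple. With the 15 region groups, 2 genders and 4 age groups described above, the grid has 15 × 2 × 4 = 120 cells.

```python
from collections import Counter
from itertools import product

AGE_GROUPS = ["I", "II", "III", "IV"]
GENDERS = ["F", "M"]


def distribution(speakers, regions):
    """Count speakers in every (region, gender, age group) cell and list the
    empty cells, i.e. the profiles that could not be recruited."""
    counts = Counter(speakers)
    table = {cell: counts[cell] for cell in product(regions, GENDERS, AGE_GROUPS)}
    blanks = [cell for cell, n in table.items() if n == 0]
    return table, blanks
```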