Introduction
Speech recognition is the automatic conversion of human speech into the corresponding
text. Modern automatic speech recognition (ASR) systems have advanced significantly, from simple
speaker-dependent word recognition to speaker-independent large vocabulary continuous speech
recognition for broadcast news and telephone conversation transcription. Despite the widespread
use of such systems in daily life, most of them target languages such as English,
German, Japanese, and Russian. The Kazakh language remains underrepresented in speech
recognition research. Thus, the primary goal of this work is to build a baseline large vocabulary
continuous speech recognition system for Kazakh.
Fig. 1 outlines the standard architecture of a modern ASR system, which includes feature
extraction and pre-processing, acoustic and language modeling, system combination, and decoding.
The first step in building such a system for Kazakh is to collect enough audio data and to create the
acoustic and language models. This is the approach we take.
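For reference, the way the acoustic and language models interact during decoding is commonly written as a maximum a posteriori rule. The notation below is the standard textbook formulation and is an assumption here, not a formula taken from Fig. 1: O denotes the sequence of acoustic feature vectors produced by feature extraction, and W denotes a candidate word sequence.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid O)
        \;=\; \arg\max_{W} \frac{p(O \mid W)\, P(W)}{p(O)}
        \;=\; \arg\max_{W} \; p(O \mid W)\, P(W)
```

In this formulation, p(O | W) is supplied by the acoustic model and P(W) by the language model; p(O) does not depend on W and can be dropped from the maximization, which the decoder then carries out over candidate word sequences.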
Section 2 presents an acoustic database of Kazakh speech, and Sections 3 and 4 give the
experiments and conclusions, respectively.