Speech corpus
The main data for our acoustic modeling and speech recognition experiments is the KazBNT acoustic corpus (database). The KazBNT corpus consists of two independent sub-corpora: KazSpeechDB and KazMedia.
The KazSpeechDB corpus, part of the Kazakh Language Corpus [5], is a body of 12675 Kazakh sentences recorded in a sound recording studio and uttered by speakers of different ages and genders from different regions of Kazakhstan. The corpus contains 22 hours of speech sampled at 16 kHz. There are 169 speakers in total, 73 men and 96 women, and each speaker uttered 75 sentences. Every audio file is accompanied by a text file containing the transcription of the utterance.
The KazMedia corpus is a body of text and audio data collected from official websites of
broadcast news channels “Khabar” [6], “Astana TV” [7] and “Channel 31” [8]. The text data is a
«ҚОҒАМДЫ АҚПАРАТТАНДЫРУ» V ХАЛЫҚАРАЛЫҚ ҒЫЛЫМИ-ПРАКТИКАЛЫҚ КОНФЕРЕНЦИЯ
collection of all Kazakh news in plain text published on the official websites of these 3 media channels from 2013 to 2015. The audio data comprises 518 wav-files, which are audio tracks extracted from a number of video news items in Kazakh. The total duration of these audio files is 11 hours of speech; the sampling rate is 16 kHz. Every wav-file is accompanied by a txt-file containing a detailed transcription of the news item and a time-aligned annotation file with labels for speaker gender, language and noise.
It is worth mentioning that the KazMedia corpus in fact contains more than 400 hours of audio news in Kazakh published from 2013 to 2015. However, this data initially had no orthographic transcriptions or other accompanying annotations. Therefore, we have preprocessed only a limited number of video news items for our preliminary experiments: as stated above, this amounts to 11 hours of Kazakh speech in total.
Preparation of the experiments
The dictionary and the language model of the KazBNT system were built from the combined text data of the KazSpeechDB and KazMedia sub-corpora. We used the IRSTLM Toolkit [9] for language modeling.
A train set, a validation set and 3 independent test sets of the KazBNT system were formed from the audio data of the KazSpeechDB and KazMedia sub-corpora, as described in Table 1. A series of interdependent acoustic modeling experiments was then carried out with this audio data and the accompanying txt-files. We used the Kaldi speech recognition toolkit [10] for acoustic modeling. The experiments started with training a simple monophone model and ended with training a deep neural network. Each experiment builds on the result of the previous one and generally improves on it.
Table 1
List and characteristics of experiment sets

| Set type | Set name | Number and source of files in the set | Total duration of audio data |
|---|---|---|---|
| Train set | kazbnt.train | 11175 wav-files from KazSpeechDB + 406 wav-files from KazMedia | 29 hours |
| Validation set | kazbnt.dev | 750 wav-files from KazSpeechDB + 49 wav-files from KazMedia | 2.4 hours |
| Test set 1: Khabar | kazbnt.test_khabar | 30 wav-files of audio news from the “Khabar” channel | 20 minutes |
| Test set 2: Astana TV | kazbnt.test_astanatv | 14 wav-files of audio news from the “Astana TV” channel | 20 minutes |
| Test set 3: Channel 31 | kazbnt.test_channel31 | 19 wav-files of audio news from the “Channel 31” channel | 20 minutes |
Experimental results
Experiment 1 (Monophones: Delta-Deltas) – a monophone model using delta-delta features
and cepstral mean and variance normalization on a per-speaker basis.
Experiment 2 (Triphones: LDA + MLLT + SAT) – a triphone model using linear
discriminant analysis, maximum likelihood linear transform, and speaker adaptive training.
Experiment 3 (DNN1) – a deep neural network with 2 hidden layers each having 300
neurons.
Experiment 4 (DNN2) – a deep neural network with 4 hidden layers each having 2000
neurons.
A common metric for evaluating speech recognition models is the word error rate (WER), computed as the ratio of erroneously recognized words (substitutions, deletions and insertions) to the total number of words in the reference text. The lower the WER, the more accurate the recognition system.
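The definition above can be illustrated with a short sketch. It assumes the usual convention that the number of erroneous words is the word-level edit distance (substitutions, deletions and insertions) between the reference and the recognized text. The function name `word_error_rate` and the sample sentences are hypothetical and are not part of the KazBNT system or of Kaldi's own scoring scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion in four words.
    print(word_error_rate("one two three four", "one too three"))
```

Because insertions are counted as errors, the value can exceed 1 when the hypothesis is much longer than the reference.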
A summary of the experimental results for all available sets is shown in Table 2.
Table 2
Minimum value of WER on the train, validation and test sets

| Experiment \ Set | kazbnt.train | kazbnt.dev | kazbnt.test_khabar | kazbnt.test_astanatv | kazbnt.test_channel31 |
|---|---|---|---|---|---|
| Monophones | 9.50 % | 9.84 % | 14.56 % | 18.75 % | 29.71 % |
| Triphones | 5.70 % | 6.32 % | 6.36 % | 9.88 % | 17.13 % |
| DNN1 | 5.15 % | 5.38 % | 5.44 % | 8.68 % | 17.25 % |
| DNN2 | **3.86 %** | **4.54 %** | **4.06 %** | **7.52 %** | **14.54 %** |
Bold font indicates the best WER results. In all cases the best WER was achieved with the DNN2 acoustic model. There is a marked difference between the results for “Khabar” (WER 4.06%), “Astana TV” (WER 7.52%) and “Channel 31” (WER 14.54%). This can probably be explained by differences in the quality of the audio data, in terms of background noise and interfering sounds.
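To make the comparison between models more tangible, the relative WER reduction can be computed directly from the values in Table 2. The sketch below only re-uses the numbers reported in the table. The dictionary layout and the helper name `relative_reduction` are illustrative assumptions.

```python
# WER values in percent, copied from Table 2.
SETS = ["train", "dev", "test_khabar", "test_astanatv", "test_channel31"]
WER = {
    "Monophones": [9.50, 9.84, 14.56, 18.75, 29.71],
    "Triphones": [5.70, 6.32, 6.36, 9.88, 17.13],
    "DNN1": [5.15, 5.38, 5.44, 8.68, 17.25],
    "DNN2": [3.86, 4.54, 4.06, 7.52, 14.54],
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved model."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    for index, set_name in enumerate(SETS):
        gain = relative_reduction(WER["Triphones"][index], WER["DNN2"][index])
        print(f"{set_name}: DNN2 vs Triphones, relative WER reduction {gain:.1%}")
```

Relative reductions are easier to compare across test sets than absolute differences, because the baseline error differs strongly between the three channels.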
The obtained results are comparable with results reported for other languages. For example, for Arabic the WER is 8.61% (KACST v1.10 [1]), for English the WER is 11.6% (CU-HTK 2006 [2]), and for Mandarin Chinese the CER is 15.9% (based on LIMSI [3]).
It should be mentioned that although the DNN2 model shows the best results, it is still very slow in operation. This is due to its large size and high resource demands: the model needs a great deal of time to load into RAM and initialize itself. To solve this problem, we will need to optimize model loading at the system level.
Conclusion and Future Work
In this work we presented a baseline Kazakh broadcast news transcription system built on the Kaldi platform, which demonstrates acceptable recognition accuracy when using deep neural networks. We have also collected and prepared speech data containing real TV news, which we used for acoustic modelling.
Although the results are promising, there are several directions for improving the recognition accuracy of the system, among them segmentation and clustering of speech data into homogeneous intervals. Another important issue to address is the speed of speech recognition.
References:
1. Mansour Alghamdi, Moustafa Elshafei, Husni Al-Muhtaseb. “Arabic broadcast news transcription
system”. International Journal of Speech Technology, Volume 10, Issue 4, pp. 183–195.
2. M. J. F. Gales, Do Yeong Kim; P. C. Woodland; Ho Yin Chan; D. Mrva; R. Sinha; S. E. Tranter.
"Progress in the CU-HTK broadcast news transcription system". In the IEEE Transactions on Audio, Speech,
and Language Processing, vol. 14, no. 5, September 2006, pp. 1513–1525.
3. R. Sinha, M.J.F. Gales, D.Y. Kim, X.A. Liu, K.C. Sim, P.C. Woodland. “The CU-HTK Mandarin
broadcast news transcription system”. In the Proc. ICASSP, 2006. IV 1280.
4. Jean-luc Gauvain , Lori Lamel , Gilles Adda. “The LIMSI Broadcast News Transcription System”.
Speech Communication, vol. 37, iss. 1–2, pp. 89–108.
5. O. Makhambetov, A. Makazhanov, Zh. Yessenbayev, B. Matkarimov, I. Sabyrgaliyev, and A.
Sharafudinov. 2013. “Assembling the Kazakh Language Corpus”. In Proceedings of the 2013 Conference on
Empirical Methods in Natural Language Processing, pp. 1022–1031. Association for Computational
Linguistics.
6. “Khabar” TV channel, official site. URL: khabar.kz [Access date: 18.04.2016].
7. “Astana TV” channel, official site. URL: astanatv.kz [Access date: 18.04.2016].
8. “Channel 31” TV channel, official site. URL: 31.kz [Access date: 18.04.2016].
9. IRSTLM Toolkit version 5.80.08. URL: https://sourceforge.net/projects/irstlm/ [Access date:
18.04.2016]
10. Povey, Daniel and Ghoshal, Arnab and Boulianne, Gilles and Burget, Lukas and Glembek,
Ondrej and Goel, Nagendra and Hannemann, Mirko and Motlicek, Petr and Qian, Yanmin and Schwarz, Petr
and Silovsky, Jan and Stemmer, Georg and Vesely, Karel. “The Kaldi Speech Recognition Toolkit”. IEEE
2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011.
UDC 681.5
YUNICHEVA N.R., BEREKE M.B.
BUILDING THE SOLUTION SET OF A SYSTEM OF INTERVAL ALGEBRAIC EQUATIONS
IN THE PROBLEM OF SYNTHESIZING CONTROL SYSTEMS
FOR OBJECTS WITH INACCURATE DATA
(Institute of Information and Computational Technologies,
Al-Farabi Kazakh National University, Almaty, Kazakhstan)
The article presents a procedure for solving the problem of parametric control synthesis, which is reduced to the solvability of a system of interval algebraic equations. A solution of the resulting system is sought in the class of "controlled" solutions.
It has been noted repeatedly that real technical objects function under conditions of parametric uncertainty. Such uncertainty results from uncontrolled disturbances affecting the control objects, from lack of knowledge of the true parameter values of the control objects due to the complexity of the process, and sometimes from their unpredictable variation in time. In almost all cases, this parametric uncertainty is characterized by the real parameter values of the technical object belonging to some intervals whose limits are known a priori. Their mathematical models can be represented by systems of integro-differential and difference equations using the rules and notation of interval analysis [1], and the class of such control objects is commonly known as interval-based.
Thus, we face the problem of controlling not a single object but a family, or set, of objects.
It has been shown that the formulated problem reduces to the solvability of a system of linear interval algebraic inclusions [2]:
$$\mathbf{P} K \subseteq \mathbf{H}, \qquad (1)$$
The meaning of the term "solution" of an interval system of inclusions of type (1) requires special clarification, because the interval uncertainty of the system data can be interpreted in two ways, in accordance with the dual understanding of intervals themselves. In the first case, the interval $[\underline{x}, \overline{x}]$ is the set of all real numbers from $\underline{x}$ to $\overline{x}$; in the second case it stands for only a single value between $\underline{x}$ and $\overline{x}$. In mathematical terms, this difference is expressed by the universal quantifier $\forall$ and the existential quantifier $\exists$: in the first case we write $\forall x \in [\underline{x}, \overline{x}]$, and in the second case $\exists x \in [\underline{x}, \overline{x}]$. For the parameters $p_{ij}$ of the system of linear interval equations, which are known only to belong to some intervals, the essential difference between the two types of interval uncertainty manifests itself as the difference between parameters that can change within the indicated intervals as a result of unpredictable external disturbances and parameters that we can deliberately vary within the given intervals, i.e., control.
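Interval analysis offers several different definitions of a solution of a system of interval algebraic equations [1]. Here bold symbols denote interval matrices and vectors, and plain symbols denote their point (real) members.
The joint (united) set of solutions,
$$\Xi_{uni}(\mathbf{P}, \mathbf{H}) = \{ K \in \mathbb{R}^n \mid (\exists P \in \mathbf{P})(\exists H \in \mathbf{H})(PK = H) \}, \qquad (2)$$
which is formed by the solutions of all point systems $PK = H$ with $P \in \mathbf{P}$ and $H \in \mathbf{H}$. The problem of building a set of type (2) is commonly known as the identification problem.
The allowable (tolerable) set of solutions,
$$\Xi_{tol}(\mathbf{P}, \mathbf{H}) = \{ K \in \mathbb{R}^n \mid (\forall P \in \mathbf{P})(\exists H \in \mathbf{H})(PK = H) \}, \qquad (3)$$
which is formed by all vectors $K \in \mathbb{R}^n$ such that the product $PK$ falls into $\mathbf{H}$ for any $P \in \mathbf{P}$. The problem of building a set of type (3) is commonly known as the linear tolerance problem.
The controlled set of solutions,
$$\Xi_{ctl}(\mathbf{P}, \mathbf{H}) = \{ K \in \mathbb{R}^n \mid (\forall H \in \mathbf{H})(\exists P \in \mathbf{P})(PK = H) \}, \qquad (4)$$
which is formed by all vectors $K \in \mathbb{R}^n$ such that for any desired $H \in \mathbf{H}$ a matrix $P \in \mathbf{P}$ satisfying $PK = H$ can be selected. The problem of building a set of type (4) is the control problem.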
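The three definitions (2)-(4) can be checked numerically for a given point vector K. The sketch below relies on one fact that is not spelled out in the text: for a fixed real K, the set of all products PK with P taken from the interval matrix is itself a box, obtained by ordinary interval arithmetic. Membership in the joint, allowable and controlled sets then reduces to comparing this box with the interval vector H (intersection, inclusion in H, and coverage of H, respectively). The names `product_box` and `classify` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def scale(p: Interval, k: float) -> Interval:
    """Range of p * k when p runs over the interval and k is a fixed real."""
    a, b = p[0] * k, p[1] * k
    return (min(a, b), max(a, b))


def product_box(P: List[List[Interval]], K: List[float]) -> List[Interval]:
    """Box of all products P K; exact because K is a point vector."""
    box = []
    for row in P:
        terms = [scale(p, k) for p, k in zip(row, K)]
        box.append((sum(t[0] for t in terms), sum(t[1] for t in terms)))
    return box


def classify(P: List[List[Interval]], H: List[Interval],
             K: List[float]) -> Dict[str, bool]:
    """Test K against definitions (2), (3) and (4)."""
    pairs = list(zip(product_box(P, K), H))
    return {
        # (2): some P K coincides with some H, i.e. the boxes intersect
        "joint": all(lo <= h_hi and h_lo <= hi
                     for (lo, hi), (h_lo, h_hi) in pairs),
        # (3): every P K lies inside H
        "allowable": all(h_lo <= lo and hi <= h_hi
                         for (lo, hi), (h_lo, h_hi) in pairs),
        # (4): every H is reached by some P K
        "controlled": all(lo <= h_lo and h_hi <= hi
                          for (lo, hi), (h_lo, h_hi) in pairs),
    }


if __name__ == "__main__":
    # Hypothetical one-dimensional system: P = [1, 3], H = [2, 4].
    P = [[(1.0, 3.0)]]
    H = [(2.0, 4.0)]
    for K in ([1.0], [2.0]):
        print(K, classify(P, H, K))
```

This test only classifies a given candidate K. Describing the whole solution set is a separate and harder task.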
In the work presented here, the problem of parametric synthesis of control is, by analogy with [2, 3], reduced to the solvability of a system of interval algebraic inclusions.
Finding a solution of the resulting system is an NP-hard problem. To reduce the computational complexity, we can select a point vector (for example, the midpoint vector) from the interval vector of adjustable parameters and use it as an initial approximation. The solution can then be sought in the class of "controlled" solutions.
In 1992, S.P. Shary introduced the concept of "controlled solutions". The name is explained by the fact that each vector $H \in \mathbf{H}$ can be reached by the product $PK$ as a result of appropriate control, or adjustment, of the matrix coefficients $P$ within $\mathbf{P}$.
A vector $K \in \mathbb{R}^n$ is called a controlled solution of the system $\mathbf{P}K = \mathbf{H}$ provided that for each $H \in \mathbf{H}$ there is a matrix $P \in \mathbf{P}$ such that $PK = H$, or, equivalently,
$$\mathbf{H} \subseteq \{ PK \mid P \in \mathbf{P} \}.$$
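We will use the proof presented in [4]. The following characterization holds for controlled solutions.
Let $K \in \mathbb{R}^n$ be a controlled solution of the system $\mathbf{P}K = \mathbf{H}$. Then $K$ satisfies the inequality
$$|P_c K - H_c| \le \Delta |K| - \delta,$$
where $P_c$ and $H_c$ are the midpoints of $\mathbf{P}$ and $\mathbf{H}$, $\Delta$ is the nonnegative matrix of radii of $\mathbf{P}$, and $\delta = \frac{1}{2}(\overline{h} - \underline{h})$ is the nonnegative vector of radii of $\mathbf{H}$.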
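Proof: if the vector $K \in \mathbb{R}^n$ is a controlled solution, it satisfies the inclusion $\mathbf{H} \subseteq \mathbf{P}K$, which, written through midpoints and radii, results in
$$P_c K - \Delta |K| \le H_c - \delta \le H_c + \delta \le P_c K + \Delta |K|$$
and
$$-(\Delta |K| - \delta) \le P_c K - H_c \le \Delta |K| - \delta.$$
It follows that
$$|P_c K - H_c| \le \Delta |K| - \delta.$$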
Thus, the obtained controlled solutions can be used as an initial approximation when
building the interval vector of adjustable parameters.
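A small worked example may help to show how the inequality above acts as a test for controlled solutions. The numbers are assumptions made purely for illustration and are not taken from the article. Take the scalar case $n = 1$ with $\mathbf{P} = [1, 3]$ and $\mathbf{H} = [2, 4]$, so that the midpoints are $P_c = 2$, $H_c = 3$ and the radii are $\Delta = 1$, $\delta = 1$. The inequality then reads

$$|2K - 3| \le |K| - 1 \iff \tfrac{4}{3} \le K \le 2.$$

For instance, $K = 2$ gives $\mathbf{P}K = [2, 6] \supseteq [2, 4]$, so every $H \in \mathbf{H}$ is reachable by a suitable $P \in \mathbf{P}$. In contrast, $K = 1$ gives $\mathbf{P}K = [1, 3]$, which does not cover $\mathbf{H}$, and indeed $|2 \cdot 1 - 3| = 1 > |1| - 1 = 0$.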
References:
1. Shary S.P. Linear static systems with interval uncertainty: effective algorithms for solving the problems of control and stabilization // Computational Technologies, 1995. Vol. 4. P. 331-356.
2. Yunicheva N.R. Questions of the analysis and synthesis of control systems for objects under uncertainty conditions. Almaty: Printing house «Classics», 2011. 95 p.
3. Khlebalin N.A. Modal Control of Plants with Uncertain Interval Parameters // Proc. Intern. Workshop «Control System Synthesis: Theory and Application», Novosibirsk, 1991. P. 168-173.
4. Jaulin L., Kieffer M., Didrit O., Walter E. Applied Interval Analysis. M.: Institute for Computing Research, 2007. 467 p.
UDC 004.822:514
ALKHANOV A.A., OMARBEKOVA A.S.
GENERATING A TEST ITEM FROM AN RDF FILE USING PYTHON AND SPARQL
(L.N. Gumilyov Eurasian National University, Astana, Kazakhstan)
This article considers the development of a system for the automatic generation of questions from a knowledge base. The work is applied in nature and provides examples of developing an ontology and executing queries against RDF documents. It also shows work in the Python programming language, including the libraries required for working with RDF files and with ontologies stored in other formats. The article covers in detail the notion of an RDF document and its purpose. Considerable attention is paid to such concepts as triples, statements, subjects, predicates and objects.
The knowledge base is an ontology of computer science terms stored in the RDF format. The Resource Description Framework (RDF) is a framework for describing resources on the web. RDF is not the only format for storing an ontology. Other formats include, for example, Turtle, which yields a document that is easier for a human to read, and the Web Ontology Language (OWL), which gives an ontology extended capabilities for logical inference.
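The idea of turning statements into test items can be sketched as follows. This is only an illustration of the triple model (subject, predicate, object) and does not reproduce the authors' system. No RDF library or SPARQL query is used, the triples are written by hand instead of being read from an RDF file, and the predicate name `hasDefinition`, the sample terms and the function `make_question` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements standing in for triples read from an RDF file.
TRIPLES: List[Triple] = [
    ("Algorithm", "hasDefinition",
     "a finite sequence of instructions for solving a problem"),
    ("Compiler", "hasDefinition",
     "a program that translates source code into machine code"),
    ("Database", "hasDefinition",
     "an organized collection of structured data"),
    ("Protocol", "hasDefinition",
     "a set of rules for exchanging data between devices"),
]


def make_question(triples: List[Triple], subject: str,
                  rng: random.Random, n_options: int = 3) -> Dict[str, object]:
    """Build a multiple-choice item: one correct object plus distractors."""
    definitions = {s: o for s, p, o in triples if p == "hasDefinition"}
    correct = definitions[subject]
    # Objects of the other subjects serve as wrong answers.
    distractors = [o for s, o in definitions.items() if s != subject]
    options = rng.sample(distractors, n_options - 1) + [correct]
    rng.shuffle(options)
    return {
        "question": f"Which statement defines the term '{subject}'?",
        "options": options,
        "answer": correct,
    }


if __name__ == "__main__":
    item = make_question(TRIPLES, "Compiler", random.Random(0))
    print(item["question"])
    for number, option in enumerate(item["options"], start=1):
        print(f"  {number}. {option}")
```

In a real setting the list of triples would come from parsing the ontology file, and the selection of statements would typically be expressed as a SPARQL query.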