BBK 76.0 Қ 54. Editorial board

Date: 03.03.2017

Speech corpus

The main data for our acoustic modeling and speech recognition experiments is the KazBNT acoustic corpus (database). The KazBNT corpus consists of two independent sub-corpora: KazSpeechDB and KazMedia.

The KazSpeechDB corpus, part of the Kazakh Language Corpus [5], is a body of 12675 Kazakh sentences recorded in a sound recording studio and uttered by speakers of different ages and genders from different regions of Kazakhstan. The corpus contains 22 hours of speech sampled at 16 kHz. There are 169 speakers in total, 73 men and 96 women, and each speaker uttered 75 sentences. Every audio file is accompanied by a text file containing the transcription of the utterance.

The KazMedia corpus is a body of text and audio data collected from the official websites of the broadcast news channels “Khabar” [6], “Astana TV” [7] and “Channel 31” [8]. The text data is a

«ҚОҒАМДЫ АҚПАРАТТАНДЫРУ» V ХАЛЫҚАРАЛЫҚ ҒЫЛЫМИ-ПРАКТИКАЛЫҚ КОНФЕРЕНЦИЯ

collection of all Kazakh news in plain text published on the official websites of these 3 media channels from 2013 to 2015. The audio data comprises 518 wav-files, which are audio tracks extracted from Kazakh news videos. The total duration of these audio files is 11 hours of speech, sampled at 16 kHz. Every wav-file is accompanied by a txt-file containing a detailed transcription of the news item and by a time-aligned annotation file with labels for speaker gender, language and noise.

It is worth mentioning that the KazMedia corpus in fact contains more than 400 hours of audio news in Kazakh published from 2013 to 2015. However, this data initially had no orthographic transcriptions or other accompanying annotations. We therefore preprocessed only part of the video news for our preliminary experiments; as stated above, this amounts to 11 hours of Kazakh speech in total.

Preparations of the experiment

The dictionary and the language model of the KazBNT system were built from the combined text data of the KazSpeechDB and KazMedia sub-corpora. We used the IRSTLM Toolkit [9] for language modeling.

A train set, a validation set and 3 independent test sets of the KazBNT system were formed from the audio data of the KazSpeechDB and KazMedia sub-corpora, as described in Table 1. We then carried out a series of interdependent acoustic modeling experiments with this audio data and the accompanying txt-files. We used the Kaldi speech recognition toolkit [10] for acoustic modeling. The experiments started with training a simple monophone model and ended with training a deep neural network. Each experiment builds on the result of the previous one and generally improves on it.

Table 1

List and characteristics of experiment sets

Set type | Set name | Number and source of files in the set | Total duration of audio data
Train set | kazbnt.train | 11175 wav-files from KazSpeechDB + 406 wav-files from KazMedia | 29 hours
Validation set | kazbnt.dev | 750 wav-files from KazSpeechDB + 49 wav-files from KazMedia | 2.4 hours
Test set 1: Khabar | kazbnt.test_khabar | 30 wav-files of audio news from the “Khabar” channel | 20 minutes
Test set 2: Astana TV | kazbnt.test_astanatv | 14 wav-files of audio news from the “Astana TV” channel | 20 minutes
Test set 3: Channel 31 | kazbnt.test_channel31 | 19 wav-files of audio news from the “Channel 31” channel | 20 minutes
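
The file counts in Table 1 can be cross-checked against the corpus description. The following is a minimal sketch in Python; the figures are copied from the text, while the variable and function names are illustrative, and attributing the three test sets to KazMedia is an assumption based on their channel names.

```python
# Figures copied from the corpus description and Table 1.
SPEAKERS = 169
SENTENCES_PER_SPEAKER = 75
KAZSPEECHDB_UTTERANCES = 12675
KAZMEDIA_WAV_FILES = 518

# wav-file counts per experiment set, split by source sub-corpus.
# Assumption: the three test sets are drawn from KazMedia.
SETS = {
    "kazbnt.train": {"KazSpeechDB": 11175, "KazMedia": 406},
    "kazbnt.dev": {"KazSpeechDB": 750, "KazMedia": 49},
    "kazbnt.test_khabar": {"KazMedia": 30},
    "kazbnt.test_astanatv": {"KazMedia": 14},
    "kazbnt.test_channel31": {"KazMedia": 19},
}


def files_used(source):
    """Total number of wav-files taken from one sub-corpus across all sets."""
    return sum(counts.get(source, 0) for counts in SETS.values())


if __name__ == "__main__":
    # Speakers times sentences per speaker should give the utterance count.
    print(SPEAKERS * SENTENCES_PER_SPEAKER == KAZSPEECHDB_UTTERANCES)
    for source, total in (("KazSpeechDB", KAZSPEECHDB_UTTERANCES),
                          ("KazMedia", KAZMEDIA_WAV_FILES)):
        used = files_used(source)
        print(source, "used:", used, "not listed in any set:", total - used)
```

Under these figures all 518 KazMedia files are assigned to a set, while 750 of the 12675 KazSpeechDB utterances do not appear in any listed set; the text does not say how they were used.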

Experimental results

Experiment 1 (Monophones: Delta-Deltas) – a monophone model using delta-delta features

and cepstral mean and variance normalization on a per-speaker basis.

Experiment 2 (Triphones: LDA + MLLT + SAT) – a triphone model using linear

discriminant analysis, maximum likelihood linear transform, and speaker adaptive training.

Experiment 3 (DNN1) – a deep neural network with 2 hidden layers each having 300

neurons.

Experiment 4 (DNN2) – a deep neural network with 4 hidden layers each having 2000

neurons.


A common metric to evaluate the performance of speech recognition models is WER (word

error rate), which is computed as the ratio of erroneously recognized words to the total number of

words in the reference text. The lower the WER, the better the accuracy of the recognition system

is.
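
As an illustration of the metric, the sketch below computes WER in Python. It assumes the usual convention that "erroneously recognized words" are counted as the minimum number of word substitutions, deletions and insertions (word-level edit distance) needed to turn the reference into the hypothesis; the text does not spell out the scoring procedure, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, a WER computed this way can exceed 100 % when the hypothesis is much longer than the reference.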

A summary of the experimental results for all available sets is shown in Table 2.

Table 2

Minimum value of WER on the train, validation and test sets

Experiment \ Set | kazbnt.train | kazbnt.dev | kazbnt.test_khabar | kazbnt.test_astanatv | kazbnt.test_channel31
Monophones | 9.50 % | 9.84 % | 14.56 % | 18.75 % | 29.71 %
Triphones | 5.70 % | 6.32 % | 6.36 % | 9.88 % | 17.13 %
DNN1 | 5.15 % | 5.38 % | 5.44 % | 8.68 % | 17.25 %
DNN2 | 3.86 % | 4.54 % | 4.06 % | 7.52 % | 14.54 %

The best WER results (set in bold in the original table) all belong to the DNN2 acoustic model. There is a marked difference between the results for the “Khabar” channel (WER 4.06%), “Astana TV” (WER 7.52%) and “Channel 31” (WER 14.54%). This can apparently be explained by differences in the quality of the audio data in terms of background noise and interfering sounds.

The results obtained are comparable with results reported for other languages. For example, for Arabic the WER is 8.61% (KACST v1.10 [1]), for English the WER is 11.6% (CU-HTK 2006 [2]), and for Mandarin Chinese the CER is 15.9% (based on LIMSI [3]).

It should be mentioned that although the DNN2 model shows the best results, it is still very slow in operation. This is due to its large size and high resource demands: the model needs a great deal of time to load into RAM and initialize itself. To solve this problem, we will need to optimize model loading at the system level.

Conclusion and Future Work

In this work we presented a baseline Kazakh broadcast news transcription system built on the Kaldi platform, which demonstrates acceptable recognition accuracy when using deep neural networks. We have also collected and prepared speech data containing real TV news, which we used for acoustic modeling.

Although the results are promising, there are several directions for improving the recognition accuracy of the system, namely segmentation and clustering of the speech data into homogeneous intervals. Another important issue to address is the speed of speech recognition.

References:

1. Mansour Alghamdi, Moustafa Elshafei, Husni Al-Muhtaseb. “Arabic broadcast news transcription

system”. International Journal of Speech Technology, Volume 10, Issue 4, pp. 183–195.

2. M. J. F. Gales, D. Y. Kim, P. C. Woodland, H. Y. Chan, D. Mrva, R. Sinha, S. E. Tranter. “Progress in the CU-HTK broadcast news transcription system”. IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 5, September 2006, pp. 1513–1525.

3. R. Sinha, M. J. F. Gales, D. Y. Kim, X. A. Liu, K. C. Sim, P. C. Woodland. “The CU-HTK Mandarin broadcast news transcription system”. In Proc. ICASSP, 2006. IV 1280.

4. Jean-Luc Gauvain, Lori Lamel, Gilles Adda. “The LIMSI Broadcast News Transcription System”. Speech Communication, vol. 37, iss. 1–2, pp. 89–108.

5. O. Makhambetov, A. Makazhanov, Zh. Yessenbayev, B. Matkarimov, I. Sabyrgaliyev, and A.

Sharafudinov. 2013. “Assembling the Kazakh Language Corpus”. In Proceedings of the 2013 Conference on


Empirical Methods in Natural Language Processing, pp. 1022–1031. Association for Computational

Linguistics.

6. “Khabar” TV channel, official site. URL: khabar.kz [Access date: 18.04.2016].

7. “Astana TV” channel, official site. URL: astanatv.kz [Access date: 18.04.2016].

8. “Channel 31” TV channel, official site. URL: 31.kz [Access date: 18.04.2016].

9. IRSTLM Toolkit version 5.80.08. URL: https://sourceforge.net/projects/irstlm/ [Access date:

18.04.2016]

10. D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, K. Vesely. “The Kaldi Speech Recognition Toolkit”. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011.

UDC 681.5

YUNICHEVA N.R., BEREKE M.B.

BUILDING THE SOLUTION SET OF A SYSTEM OF INTERVAL ALGEBRAIC EQUATIONS IN THE PROBLEM OF SYNTHESIZING CONTROL SYSTEMS FOR OBJECTS WITH INACCURATE DATA

(Institute of Information and Computational Technologies,

Al-Farabi Kazakh National University, Almaty, Kazakhstan)

The article presents a procedure for solving the problem of parametric control synthesis, which is reduced to the solvability of a system of interval algebraic equations. A solution of the resulting system is sought in the class of "controlled" solutions.

It has been noted repeatedly that real technical objects operate under conditions of parametric uncertainty. This uncertainty results from uncontrolled disturbances that affect the control objects, from lack of knowledge of the true parameter values of the control objects due to the complexity of the process, and sometimes from unpredictable variation of these parameters in time. In almost all cases, such parametric uncertainty means that the real parameter values of the technical object belong to some intervals whose limits are known a priori. The mathematical models of such objects can be represented by systems of integro-differential and difference equations using the rules and notation of interval analysis [1], and the class of such control objects is commonly known as interval objects.

Thus, we face the problem of controlling not a single object but a family, or set, of objects.

It has been noted that the formulated problem is reduced to the solvability of a system of linear interval algebraic inclusions of the following form [2]:

$[P]\,K = [H]$, (1)

The meaning of the term "solution" for an interval system of type (1) requires special clarification, because interval uncertainty in the system data can be interpreted in two ways, in accordance with the dual understanding of intervals themselves. In the first case, the interval $[\underline{x}, \overline{x}]$ is the set of all real numbers from $\underline{x}$ to $\overline{x}$; in the second case it stands for just one value between $\underline{x}$ and $\overline{x}$. In mathematical terms, this difference is expressed by the universal quantifier $\forall$ and the existential quantifier $\exists$: the first case is written $(\forall x \in [\underline{x}, \overline{x}])$ and the second $(\exists x \in [\underline{x}, \overline{x}])$. For the parameters $p_{ij}$ of a system of linear interval equations, which are known only to belong to some intervals, the essential difference between the two types of interval uncertainty is the difference between parameters that may change within the indicated intervals as a result of unpredictable external disturbances and parameters that we can vary deliberately within the set intervals, i.e., control.


Interval analysis uses the following definitions of solutions of a system of interval algebraic equations [1]:

United solution set,

$\Xi_{uni}([P],[H]) = \{ K \in \mathbb{R}^n \mid (\exists P \in [P])(\exists H \in [H])(PK = H) \}$, (2)

which is formed by the solutions of all systems $PK = H$ with $P \in [P]$ and $H \in [H]$.

The problem of building the set of type (2) is commonly known as the identification problem.

Tolerable (allowable) solution set,

$\Xi_{tol}([P],[H]) = \{ K \in \mathbb{R}^n \mid (\forall P \in [P])(\exists H \in [H])(PK = H) \}$, (3)

which is formed by all vectors $K \in \mathbb{R}^n$ such that the product $PK$ falls into $[H]$ for any $P \in [P]$.

The problem of building the set of type (3) is commonly known as the linear tolerance problem.

Controlled solution set,

$\Xi_{ctl}([P],[H]) = \{ K \in \mathbb{R}^n \mid (\forall H \in [H])(\exists P \in [P])(PK = H) \}$, (4)

which is formed by all vectors $K \in \mathbb{R}^n$ such that for any desired $H \in [H]$ a suitable $P \in [P]$ can be selected satisfying $PK = H$.

The problem of building the set of type (4) is the control problem.
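
The difference between the three solution sets is easiest to see in a scalar case. The following worked example is only an illustration: the intervals $[P] = [1, 2]$ and $[H] = [3, 4]$ are assumptions chosen for this sketch and do not come from the paper.

```latex
% Scalar illustration (n = 1); the intervals below are assumed for the example.
\[
  [P] = [1, 2], \qquad [H] = [3, 4], \qquad K \in \mathbb{R}.
\]
% United set: some P and some H satisfy PK = H, i.e. K = H / P.
\[
  \{ K \mid (\exists P \in [P])(\exists H \in [H])\; PK = H \} = [3/2,\ 4].
\]
% Tolerable set: PK must stay inside [H] for every P.
% P = 1 requires K in [3, 4], while P = 2 requires K in [3/2, 2]; no K satisfies both.
\[
  \{ K \mid (\forall P \in [P])(\exists H \in [H])\; PK = H \} = \varnothing.
\]
% Controlled set: for K > 0 the reachable products form [K, 2K], which must cover [3, 4],
% so K <= 3 and 2K >= 4.
\[
  \{ K \mid (\forall H \in [H])(\exists P \in [P])\; PK = H \} = [2,\ 3].
\]
```

In this example the tolerable set is empty while the controlled set is not, which shows that the choice of solution concept changes the answer and not just its interpretation.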

In the work presented here, the problem of parametric synthesis of control is reduced, by analogy with [2,3], to the solvability of a system of interval algebraic inclusions.

Finding a solution of the resulting system is an NP-hard problem. To simplify the problem and reduce its computational complexity, we can select a point vector (for example, the mean vector) from the interval vector of adjustable parameters and use it as an initial approximation. The solution can be sought in the class of "controlled" solutions.

In 1992, S.P. Shary introduced the concept of "controlled solutions". The name is explained by the fact that each vector $H \in [H]$ can be reached by the product $PK$ through appropriate control, or adjustment, of the matrix coefficients $P$ within $[P]$. A vector $K \in \mathbb{R}^n$ is called a controlled solution of the system $[P]K = [H]$ provided that for each $H \in [H]$ there is a matrix $P \in [P]$ such that $PK = H$, that is,

$[H] \subseteq \{ PK \mid P \in [P] \}$.

We will use the proof presented in [4]. The following statement holds for controlled solutions.

Let $K \in \mathbb{R}^n$ be a controlled solution of the system $[P]K = [H]$. Then $K$ satisfies the inequality

$|P_c K - H_c| \le \Delta |K| - \delta$,

where $P_c$ and $H_c$ are the midpoints of $[P]$ and $[H]$, $\Delta$ is the matrix of radii of $[P]$, and $\delta = (\overline{h} - \underline{h})/2$ is the nonnegative vector of radii of $[H]$.


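Proof: if the vector $K \in \mathbb{R}^n$ is a controlled solution, it satisfies the inclusion $[P]K \supseteq [H]$, which, written through the midpoints $P_c$, $H_c$ and the radii $\Delta$, $\delta$, results in

$P_c K - \Delta |K| \le H_c - \delta \le H_c + \delta \le P_c K + \Delta |K|$

and

$-(\Delta |K| - \delta) \le P_c K - H_c \le \Delta |K| - \delta$.

It follows that

$|P_c K - H_c| \le \Delta |K| - \delta$.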

Thus, the obtained controlled solutions can be used as an initial approximation when

building the interval vector of adjustable parameters.
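
The inequality for controlled solutions is straightforward to test numerically for a candidate vector. The sketch below is a minimal Python illustration, not the authors' implementation: it assumes the inequality has the midpoint-radius form $|P_c K - H_c| \le \Delta|K| - \delta$ (with $P_c$, $H_c$ the midpoints and $\Delta$, $\delta$ the radii of $[P]$ and $[H]$), represents intervals by lists of lower and upper bounds, and uses illustrative names throughout. The text states the inequality as a condition that every controlled solution satisfies, so a failed check rules a candidate out.

```python
def satisfies_controlled_inequality(P_lo, P_hi, H_lo, H_hi, K, tol=1e-12):
    """Check |P_c K - H_c| <= Delta |K| - delta row by row.

    P_lo, P_hi: lower and upper bound matrices of [P] (lists of rows).
    H_lo, H_hi: lower and upper bound vectors of [H].
    K: candidate point vector.
    tol: small slack for floating-point rounding (an assumption of this sketch).
    """
    for i in range(len(H_lo)):
        # Row i of the midpoint matrix times K.
        mid_product = sum(0.5 * (P_lo[i][j] + P_hi[i][j]) * K[j]
                          for j in range(len(K)))
        # Row i of the radius matrix times |K|.
        rad_product = sum(0.5 * (P_hi[i][j] - P_lo[i][j]) * abs(K[j])
                          for j in range(len(K)))
        h_mid = 0.5 * (H_lo[i] + H_hi[i])
        h_rad = 0.5 * (H_hi[i] - H_lo[i])
        if abs(mid_product - h_mid) > rad_product - h_rad + tol:
            return False
    return True


if __name__ == "__main__":
    # Assumed scalar example: [P] = [1, 2], [H] = [3, 4].
    for k in (1.9, 2.5, 3.1):
        print(k, satisfies_controlled_inequality([[1.0]], [[2.0]], [3.0], [4.0], [k]))
```

For the assumed scalar data the check passes for K = 2.5 and fails for K = 1.9 and K = 3.1, in line with the products PK for P in [1, 2] covering [3, 4] only when K lies between 2 and 3.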

References:

1. Shary S.P. Linear static systems with interval uncertainty: Effective algorithms for solving the problems of control and stabilization // Computational Technologies, 1995. Vol. 4. P. 331-356.

2. Yunicheva N.R. Questions of the analysis and synthesis of control systems by objects in uncertainty conditions. Almaty: Printing house «Classics», 2011. 95 p.

3. Khlebalin N.A. Modal Control of Plants with Uncertain Interval Parameters // Proc. Intern. Workshop «Control System Synthesis: Theory and Application», Novosibirsk, 1991. P. 168-173.

4. Jaulin L., Kieffer M., Didrit O., Walter E. Applied Interval Analysis. Moscow: Institute for Computing Research, 2007. 467 p.

UDC 004.822:514

ALKHANOV A.A., OMARBEKOVA A.S.

GENERATING A TEST ITEM FROM AN RDF FILE USING PYTHON AND SPARQL

(L.N. Gumilyov Eurasian National University, Astana, Kazakhstan)

This article describes the development of a system for automatically generating questions from a knowledge base. The work is applied in nature and gives examples of developing an ontology and running queries against RDF documents. It also shows work in the Python programming language, including the libraries needed to handle RDF files and ontologies stored in other formats. The article covers in detail the notion of an RDF document and its purpose. Considerable attention is paid to such concepts as triples, statements, subjects, predicates and objects.

The knowledge base is an ontology of computer science terms stored in RDF format. The Resource Description Framework (RDF) is a framework for describing resources on the web. RDF is not the only format for storing an ontology; there are others, for example Turtle, which produces a document that is easier for a human to read, and the Web Ontology Language (OWL) format, which gives an ontology richer capabilities for drawing logical inferences.
