Recording setup. The recording sessions took place in a soundproof studio at the university with the
assistance of a sound operator. Before recording, the speakers were given instructions and documentation,
allowed some time to prepare, and asked to fill in a form transferring the copyright of the audio data
containing their voice. No constraints were placed on reading manner, speed, or duration; the only
requirement was that the text be read correctly. A recording session lasted about 40-45 minutes per
speaker on average, although some sessions lasted up to 2 hours.
Audio data were captured with a Neumann TLM 49 professional vocal microphone and digitized with a
LEXICON I-ONIX U82S sound card. The recordings are stored as mono WAVE files with 16-bit PCM
encoding at a sampling rate of 44.1 kHz. All recorded audio files were manually post-processed so
that each utterance (sentence or story) is stored in a separate file in the corresponding directory.
The speech corpus occupies about 8.5 GB on disk. The total duration of the audio is about 28 hours:
23 hours in the “sentences” part and 5 hours in the “stories” part.
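The storage figure follows from the stated format: 44.1 kHz × 2 bytes × 1 channel gives 88,200 bytes of raw PCM per second. The sketch below, using Python's standard `wave` module, estimates the payload size from the stated durations and checks whether a given file matches the stated format. The helper names `pcm_bytes` and `has_expected_format` are illustrative assumptions, not part of the corpus distribution, and the estimate ignores WAVE header overhead.

```python
import wave

# Format parameters as stated in the text: 44.1 kHz, 16-bit PCM, mono.
SAMPLE_RATE_HZ = 44_100
SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
CHANNELS = 1


def pcm_bytes(duration_s: float,
              rate: int = SAMPLE_RATE_HZ,
              width: int = SAMPLE_WIDTH_BYTES,
              channels: int = CHANNELS) -> float:
    """Raw PCM payload size in bytes; per-file WAVE headers are ignored."""
    return duration_s * rate * width * channels


def has_expected_format(path: str) -> bool:
    """Return True if the WAVE file at `path` is uncompressed 44.1 kHz, 16-bit, mono PCM."""
    with wave.open(path, "rb") as wf:
        return (wf.getframerate() == SAMPLE_RATE_HZ
                and wf.getsampwidth() == SAMPLE_WIDTH_BYTES
                and wf.getnchannels() == CHANNELS
                and wf.getcomptype() == "NONE")


if __name__ == "__main__":
    # Approximate durations of the two parts, as reported in the text.
    hours = {"sentences": 23, "stories": 5}
    total_bytes = sum(pcm_bytes(h * 3600) for h in hours.values())
    print(f"bytes per second: {pcm_bytes(1):.0f}")
    print(f"total: {total_bytes / 1e9:.2f} GB ({total_bytes / 2**30:.2f} GiB)")
```

Running the script prints 88200 bytes per second and a total of 8.89 GB (8.28 GiB) for 28 hours. The reported size of about 8.5 GB lies between these two values, which is plausible given that the durations are rounded and that "GB" may denote either decimal or binary units.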