Database description: Russian fricatives recorded in 198 real speech sentences from 59 speakers
This speech dataset is designed primarily to investigate linguistic and speaker information in Russian fricative sounds. Acoustic recordings were obtained from 59 students (30 female and 29 male) aged between 18 and 30 years. Eighteen participants were recorded in a second session. The participants...
Main author: | Ulrich, Natalja |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | Data Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10293980/ https://www.ncbi.nlm.nih.gov/pubmed/37383770 http://dx.doi.org/10.1016/j.dib.2023.109205 |
_version_ | 1785063100854042624 |
---|---|
author | Ulrich, Natalja |
collection | PubMed |
description | This speech dataset is designed primarily to investigate linguistic and speaker information in Russian fricative sounds. Acoustic recordings were obtained from 59 students (30 female and 29 male) aged between 18 and 30 years. Eighteen participants were recorded in a second session. The participants were born in St. Petersburg and had lived there since early childhood. None reported any speech or hearing impairment. The recording sessions were conducted at the phonetic laboratory of the Phonetic Institute in St. Petersburg, in an audiometric booth, using the recording program Speech-Recorder version 3.28.0 at a sample rate of 44.1 kHz (16-bit encoding). During the recordings, a clip-on microphone (Sennheiser MKE 2-P) was placed 15 cm from the speaker's mouth and connected through an audio interface (Zoom U-22) to a laptop computer. The participants were instructed to read 198 randomized sentences from a computer screen. The fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ], [vʲ], [zʲ] were embedded in those sentences. Two sentence structures were designed so that each real-word lexeme was produced in three different contexts. The first type is a so-called carrier sentence with the structure “She said ‘X’ and not ‘Y’”. Minimal pairs of real words, each containing one of the 11 tested fricatives, were placed in both the “X” and “Y” positions. The second type was a natural-language sentence including each of the lexemes. All raw audio files were first pre-processed automatically with the online tool Munich Automatic Segmentation System. Then the files of the first recording session were filtered to remove frequencies below 80 Hz and above 20,050 Hz, and the segment boundaries were manually corrected using Praat. The dataset consists of 22,561 fricative tokens. The number of observations per sound differs across categories because of their natural distribution. The dataset is made available as a collection of audio files in WAV format along with a companion Praat TextGrid file for each sentence. Target fricatives are also available as individual WAV files. The whole dataset can be accessed via the DOI https://doi.org/10.48656/4q9c-gz16. Additionally, the experimental design allows the investigation of other sound categories. The number of speakers recorded offers further possibilities for phonetically oriented speaker identification studies. |
format | Online Article Text |
id | pubmed-10293980 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10293980 2023-06-28 Database description: Russian fricatives recorded in 198 real speech sentences from 59 speakers Ulrich, Natalja Data Brief Data Article Elsevier 2023-05-11 /pmc/articles/PMC10293980/ /pubmed/37383770 http://dx.doi.org/10.1016/j.dib.2023.109205 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Database description: Russian fricatives recorded in 198 real speech sentences from 59 speakers |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10293980/ https://www.ncbi.nlm.nih.gov/pubmed/37383770 http://dx.doi.org/10.1016/j.dib.2023.109205 |