A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description
Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides information about the exact timing of each of th...
Main Authors: | Häusler, Christian Olaf; Hanke, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7921887/ https://www.ncbi.nlm.nih.gov/pubmed/33732435 http://dx.doi.org/10.12688/f1000research.27621.1 |
_version_ | 1783658563486351360 |
---|---|
author | Häusler, Christian Olaf Hanke, Michael |
author_facet | Häusler, Christian Olaf Hanke, Michael |
author_sort | Häusler, Christian Olaf |
collection | PubMed |
description | Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides the exact timing of each of the more than 2,500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), and 66,000 phonemes, together with the corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embeddings that represents each word in a 300-dimensional semantic space. To validate the dataset’s quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation’s content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity. |
format | Online Article Text |
id | pubmed-7921887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-7921887 2021-03-16 A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description Häusler, Christian Olaf Hanke, Michael F1000Res Data Note Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides the exact timing of each of the more than 2,500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), and 66,000 phonemes, together with the corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embeddings that represents each word in a 300-dimensional semantic space. To validate the dataset’s quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation’s content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity. F1000 Research Limited 2021-01-28 /pmc/articles/PMC7921887/ /pubmed/33732435 http://dx.doi.org/10.12688/f1000research.27621.1 Text en Copyright: © 2021 Häusler CO and Hanke M http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Data Note Häusler, Christian Olaf Hanke, Michael A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description |
title | A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description |
topic | Data Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7921887/ https://www.ncbi.nlm.nih.gov/pubmed/33732435 http://dx.doi.org/10.12688/f1000research.27621.1 |