A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description

Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides information about the exact timing of each of th...

Bibliographic Details
Main Authors: Häusler, Christian Olaf; Hanke, Michael
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7921887/
https://www.ncbi.nlm.nih.gov/pubmed/33732435
http://dx.doi.org/10.12688/f1000research.27621.1
_version_ 1783658563486351360
author Häusler, Christian Olaf
Hanke, Michael
author_facet Häusler, Christian Olaf
Hanke, Michael
author_sort Häusler, Christian Olaf
collection PubMed
description Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides information about the exact timing of each of the more than 2500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), 66,000 phonemes, and their corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embedding which represents each word in a 300-dimensional semantic space. To validate the dataset’s quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation’s content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.
format Online
Article
Text
id pubmed-7921887
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7921887 2021-03-16 A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description Häusler, Christian Olaf Hanke, Michael F1000Res Data Note Here we present an annotation of speech in the audio-visual movie “Forrest Gump” and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides information about the exact timing of each of the more than 2500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), 66,000 phonemes, and their corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embedding which represents each word in a 300-dimensional semantic space. To validate the dataset’s quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation’s content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity. F1000 Research Limited 2021-01-28 /pmc/articles/PMC7921887/ /pubmed/33732435 http://dx.doi.org/10.12688/f1000research.27621.1 Text en Copyright: © 2021 Häusler CO and Hanke M http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Data Note
Häusler, Christian Olaf
Hanke, Michael
A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description
title A studyforrest extension, an annotation of spoken language in the German dubbed movie “Forrest Gump” and its audio-description
title_sort studyforrest extension, an annotation of spoken language in the german dubbed movie “forrest gump” and its audio-description
topic Data Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7921887/
https://www.ncbi.nlm.nih.gov/pubmed/33732435
http://dx.doi.org/10.12688/f1000research.27621.1