The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation

People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus t...

Bibliographic Details
Main authors: Reece, Andrew, Cooney, Gus, Bull, Peter, Chung, Christine, Dawson, Bryn, Fitzpatrick, Casey, Glazer, Tamara, Knox, Dean, Liebscher, Alex, Marin, Sebastian
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065445/
https://www.ncbi.nlm.nih.gov/pubmed/37000886
http://dx.doi.org/10.1126/sciadv.adf3197
_version_ 1785018113722417152
author Reece, Andrew
Cooney, Gus
Bull, Peter
Chung, Christine
Dawson, Bryn
Fitzpatrick, Casey
Glazer, Tamara
Knox, Dean
Liebscher, Alex
Marin, Sebastian
author_facet Reece, Andrew
Cooney, Gus
Bull, Peter
Chung, Christine
Dawson, Bryn
Fitzpatrick, Casey
Glazer, Tamara
Knox, Dean
Liebscher, Alex
Marin, Sebastian
author_sort Reece, Andrew
collection PubMed
description People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus totals more than 1 terabyte of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, together with an extensive survey of speakers’ postconversation reflections. By taking advantage of the considerable scope of the corpus, we explore many examples of how this large-scale public dataset may catalyze future research, particularly across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation.
format Online
Article
Text
id pubmed-10065445
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-10065445 2023-04-01 The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation Reece, Andrew Cooney, Gus Bull, Peter Chung, Christine Dawson, Bryn Fitzpatrick, Casey Glazer, Tamara Knox, Dean Liebscher, Alex Marin, Sebastian Sci Adv Social and Interdisciplinary Sciences People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus totals more than 1 terabyte of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, together with an extensive survey of speakers’ postconversation reflections. By taking advantage of the considerable scope of the corpus, we explore many examples of how this large-scale public dataset may catalyze future research, particularly across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation. American Association for the Advancement of Science 2023-03-31 /pmc/articles/PMC10065445/ /pubmed/37000886 http://dx.doi.org/10.1126/sciadv.adf3197 Text en Copyright © 2023 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC). https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.
spellingShingle Social and Interdisciplinary Sciences
Reece, Andrew
Cooney, Gus
Bull, Peter
Chung, Christine
Dawson, Bryn
Fitzpatrick, Casey
Glazer, Tamara
Knox, Dean
Liebscher, Alex
Marin, Sebastian
The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
title The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
title_full The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
title_fullStr The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
title_full_unstemmed The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
title_short The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
title_sort candor corpus: insights from a large multimodal dataset of naturalistic conversation
topic Social and Interdisciplinary Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065445/
https://www.ncbi.nlm.nih.gov/pubmed/37000886
http://dx.doi.org/10.1126/sciadv.adf3197