The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus t...
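Main authors: | Reece, Andrew; Cooney, Gus; Bull, Peter; Chung, Christine; Dawson, Bryn; Fitzpatrick, Casey; Glazer, Tamara; Knox, Dean; Liebscher, Alex; Marin, Sebastian |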
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Association for the Advancement of Science, 2023 |
Subjects: | Social and Interdisciplinary Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065445/ https://www.ncbi.nlm.nih.gov/pubmed/37000886 http://dx.doi.org/10.1126/sciadv.adf3197 |
_version_ | 1785018113722417152 |
---|---|
author | Reece, Andrew Cooney, Gus Bull, Peter Chung, Christine Dawson, Bryn Fitzpatrick, Casey Glazer, Tamara Knox, Dean Liebscher, Alex Marin, Sebastian |
author_facet | Reece, Andrew Cooney, Gus Bull, Peter Chung, Christine Dawson, Bryn Fitzpatrick, Casey Glazer, Tamara Knox, Dean Liebscher, Alex Marin, Sebastian |
author_sort | Reece, Andrew |
collection | PubMed |
description | People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus totals more than 1 terabyte of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, together with an extensive survey of speakers’ postconversation reflections. By taking advantage of the considerable scope of the corpus, we explore many examples of how this large-scale public dataset may catalyze future research, particularly across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation. |
format | Online Article Text |
id | pubmed-10065445 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Association for the Advancement of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10065445 2023-04-01 The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation Reece, Andrew Cooney, Gus Bull, Peter Chung, Christine Dawson, Bryn Fitzpatrick, Casey Glazer, Tamara Knox, Dean Liebscher, Alex Marin, Sebastian Sci Adv Social and Interdisciplinary Sciences People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus totals more than 1 terabyte of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, together with an extensive survey of speakers’ postconversation reflections. By taking advantage of the considerable scope of the corpus, we explore many examples of how this large-scale public dataset may catalyze future research, particularly across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation. American Association for the Advancement of Science 2023-03-31 /pmc/articles/PMC10065445/ /pubmed/37000886 http://dx.doi.org/10.1126/sciadv.adf3197 Text en Copyright © 2023 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC). https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited. |
spellingShingle | Social and Interdisciplinary Sciences Reece, Andrew Cooney, Gus Bull, Peter Chung, Christine Dawson, Bryn Fitzpatrick, Casey Glazer, Tamara Knox, Dean Liebscher, Alex Marin, Sebastian The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation |
title | The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation |
title_full | The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation |
title_fullStr | The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation |
title_full_unstemmed | The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation |
title_short | The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation |
title_sort | candor corpus: insights from a large multimodal dataset of naturalistic conversation |
topic | Social and Interdisciplinary Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065445/ https://www.ncbi.nlm.nih.gov/pubmed/37000886 http://dx.doi.org/10.1126/sciadv.adf3197 |