The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments

We present the SFU Opinion and Comments Corpus (SOCC), a collection of opinion articles and the comments posted in response to the articles. The articles include all the opinion pieces published in the Canadian newspaper The Globe and Mail in the 5-year period between 2012 and 2016, a total of 10,3...

Bibliographic Details
Main Authors: Kolhatkar, Varada; Wu, Hanhan; Cavasso, Luca; Francis, Emilie; Shukla, Kavan; Taboada, Maite
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7357677/
https://www.ncbi.nlm.nih.gov/pubmed/32685909
http://dx.doi.org/10.1007/s41701-019-00065-w
_version_ 1783558714419052544
author Kolhatkar, Varada
Wu, Hanhan
Cavasso, Luca
Francis, Emilie
Shukla, Kavan
Taboada, Maite
collection PubMed
description We present the SFU Opinion and Comments Corpus (SOCC), a collection of opinion articles and the comments posted in response to the articles. The articles include all the opinion pieces published in the Canadian newspaper The Globe and Mail in the 5-year period between 2012 and 2016, a total of 10,339 articles and 663,173 comments. SOCC is part of a project that investigates the linguistic characteristics of online comments. The corpus can be used to study a host of pragmatic phenomena. Among other aspects, researchers can explore: the connections between articles and comments; the connections of comments to each other; the types of topics discussed in comments; the nice (constructive) or mean (toxic) ways in which commenters respond to each other; how language is used to convey very specific types of evaluation; and how negation affects the interpretation of evaluative meaning in discourse. Our current focus is the study of constructiveness and evaluation in the comments. To that end, we have annotated a subset of the large corpus (1043 comments) with four layers of annotations: constructiveness, toxicity, negation and Appraisal (Martin and White, The language of evaluation, Palgrave, New York, 2005). This paper details our corpus, the data collection process, the characteristics of the corpus and describes the annotations. While our focus is comments posted in response to opinion news articles, the phenomena in this corpus are likely to be present in many commenting platforms: other news comments, comments and replies in fora such as Reddit, feedback on blogs, or YouTube comments.
format Online
Article
Text
id pubmed-7357677
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7357677 2020-07-16 The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments Kolhatkar, Varada; Wu, Hanhan; Cavasso, Luca; Francis, Emilie; Shukla, Kavan; Taboada, Maite Corpus Pragmat Original Paper
Springer International Publishing 2019-11-02 2020 /pmc/articles/PMC7357677/ /pubmed/32685909 http://dx.doi.org/10.1007/s41701-019-00065-w Text en © The Author(s) 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments
topic Original Paper