Evaluating named entity recognition tools for extracting social networks from novels
The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and...
Main authors: | Dekker, Niels; Kuhn, Tobias; van Erp, Marieke |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2019 |
Subjects: | Computational Linguistics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924459/ https://www.ncbi.nlm.nih.gov/pubmed/33816842 http://dx.doi.org/10.7717/peerj-cs.189 |
_version_ | 1783659094299639808 |
---|---|
author | Dekker, Niels Kuhn, Tobias van Erp, Marieke |
author_facet | Dekker, Niels Kuhn, Tobias van Erp, Marieke |
author_sort | Dekker, Niels |
collection | PubMed |
description | The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems. |
format | Online Article Text |
id | pubmed-7924459 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-79244592021-04-02 Evaluating named entity recognition tools for extracting social networks from novels Dekker, Niels Kuhn, Tobias van Erp, Marieke PeerJ Comput Sci Computational Linguistics The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems. PeerJ Inc. 2019-04-18 /pmc/articles/PMC7924459/ /pubmed/33816842 http://dx.doi.org/10.7717/peerj-cs.189 Text en © 2019 Dekker et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Computational Linguistics Dekker, Niels Kuhn, Tobias van Erp, Marieke Evaluating named entity recognition tools for extracting social networks from novels |
title | Evaluating named entity recognition tools for extracting social networks from novels |
title_full | Evaluating named entity recognition tools for extracting social networks from novels |
title_fullStr | Evaluating named entity recognition tools for extracting social networks from novels |
title_full_unstemmed | Evaluating named entity recognition tools for extracting social networks from novels |
title_short | Evaluating named entity recognition tools for extracting social networks from novels |
title_sort | evaluating named entity recognition tools for extracting social networks from novels |
topic | Computational Linguistics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924459/ https://www.ncbi.nlm.nih.gov/pubmed/33816842 http://dx.doi.org/10.7717/peerj-cs.189 |
work_keys_str_mv | AT dekkerniels evaluatingnamedentityrecognitiontoolsforextractingsocialnetworksfromnovels AT kuhntobias evaluatingnamedentityrecognitiontoolsforextractingsocialnetworksfromnovels AT vanerpmarieke evaluatingnamedentityrecognitiontoolsforextractingsocialnetworksfromnovels |