
Evaluating named entity recognition tools for extracting social networks from novels

The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and...


Bibliographic Details
Main authors: Dekker, Niels, Kuhn, Tobias, van Erp, Marieke
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Computational Linguistics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924459/
https://www.ncbi.nlm.nih.gov/pubmed/33816842
http://dx.doi.org/10.7717/peerj-cs.189
author Dekker, Niels
Kuhn, Tobias
van Erp, Marieke
collection PubMed
description The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems.
format Online
Article
Text
id pubmed-7924459
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924459 2021-04-02 Evaluating named entity recognition tools for extracting social networks from novels Dekker, Niels Kuhn, Tobias van Erp, Marieke PeerJ Comput Sci Computational Linguistics PeerJ Inc. 2019-04-18 /pmc/articles/PMC7924459/ /pubmed/33816842 http://dx.doi.org/10.7717/peerj-cs.189 Text en © 2019 Dekker et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed.
For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Evaluating named entity recognition tools for extracting social networks from novels
topic Computational Linguistics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924459/
https://www.ncbi.nlm.nih.gov/pubmed/33816842
http://dx.doi.org/10.7717/peerj-cs.189