
Extracting and modeling geographic information from scientific articles

Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-rea...


Bibliographic Details
Main Authors: Acheson, Elise; Purves, Ross S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787447/
https://www.ncbi.nlm.nih.gov/pubmed/33406109
http://dx.doi.org/10.1371/journal.pone.0244918
author Acheson, Elise
Purves, Ross S.
collection PubMed
description Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.
format Online
Article
Text
id pubmed-7787447
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7787447 2021-01-14 Extracting and modeling geographic information from scientific articles. Acheson, Elise; Purves, Ross S. PLoS One. Research Article. Public Library of Science 2021-01-06 /pmc/articles/PMC7787447/ /pubmed/33406109 http://dx.doi.org/10.1371/journal.pone.0244918 Text en © 2021 Acheson, Purves http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Extracting and modeling geographic information from scientific articles
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787447/
https://www.ncbi.nlm.nih.gov/pubmed/33406109
http://dx.doi.org/10.1371/journal.pone.0244918