A Machine Reading System for Assembling Synthetic Paleontological Databases

Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types....

Bibliographic Details
Main authors: Peters, Shanan E., Zhang, Ce, Livny, Miron, Ré, Christopher
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4250071/
https://www.ncbi.nlm.nih.gov/pubmed/25436610
http://dx.doi.org/10.1371/journal.pone.0113523
collection PubMed
description Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.
id pubmed-4250071
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4250071; 2014-12-05; PLoS One; Research Article; Public Library of Science; 2014-12-01; /pmc/articles/PMC4250071/; /pubmed/25436610; http://dx.doi.org/10.1371/journal.pone.0113523; Text; en; © 2014 Peters et al; http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.