
Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing

PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi‐automated protocol to facilitate and expedite the assembly...


Bibliographic Details
Main authors: Endara, Lorena; Cui, Hong; Burleigh, J. Gordon
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5895189/
https://www.ncbi.nlm.nih.gov/pubmed/29732265
http://dx.doi.org/10.1002/aps3.1035
author Endara, Lorena
Cui, Hong
Burleigh, J. Gordon
collection PubMed
description PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi‐automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. METHODS AND RESULTS: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon‐by‐character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. CONCLUSIONS: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.
format Online
Article
Text
id pubmed-5895189
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5895189
2018-05-04
Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing
Endara, Lorena
Cui, Hong
Burleigh, J. Gordon
Appl Plant Sci
Protocol Notes
PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi‐automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. METHODS AND RESULTS: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon‐by‐character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. CONCLUSIONS: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.
John Wiley and Sons Inc.
2018-03-31
/pmc/articles/PMC5895189/
/pubmed/29732265
http://dx.doi.org/10.1002/aps3.1035
Text
en
© 2018 Endara et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing
topic Protocol Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5895189/
https://www.ncbi.nlm.nih.gov/pubmed/29732265
http://dx.doi.org/10.1002/aps3.1035