Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing
PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi‐automated protocol to facilitate and expedite the assembly...
Main authors: | Endara, Lorena; Cui, Hong; Burleigh, J. Gordon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc., 2018 |
Subjects: | Protocol Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5895189/ https://www.ncbi.nlm.nih.gov/pubmed/29732265 http://dx.doi.org/10.1002/aps3.1035 |
_version_ | 1783313609921658880 |
---|---|
author | Endara, Lorena Cui, Hong Burleigh, J. Gordon |
author_facet | Endara, Lorena Cui, Hong Burleigh, J. Gordon |
author_sort | Endara, Lorena |
collection | PubMed |
description | PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi‐automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. METHODS AND RESULTS: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon‐by‐character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. CONCLUSIONS: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses. |
format | Online Article Text |
id | pubmed-5895189 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-58951892018-05-04 Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing Endara, Lorena Cui, Hong Burleigh, J. Gordon Appl Plant Sci Protocol Notes PREMISE OF THE STUDY: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi‐automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. METHODS AND RESULTS: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon‐by‐character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. CONCLUSIONS: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses. John Wiley and Sons Inc. 2018-03-31 /pmc/articles/PMC5895189/ /pubmed/29732265 http://dx.doi.org/10.1002/aps3.1035 Text en © 2018 Endara et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Protocol Notes Endara, Lorena Cui, Hong Burleigh, J. Gordon Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
title | Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
title_full | Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
title_fullStr | Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
title_full_unstemmed | Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
title_short | Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
title_sort | extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing |
topic | Protocol Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5895189/ https://www.ncbi.nlm.nih.gov/pubmed/29732265 http://dx.doi.org/10.1002/aps3.1035 |
work_keys_str_mv | AT endaralorena extractionofphenotypictraitsfromtaxonomicdescriptionsforthetreeoflifeusingnaturallanguageprocessing AT cuihong extractionofphenotypictraitsfromtaxonomicdescriptionsforthetreeoflifeusingnaturallanguageprocessing AT burleighjgordon extractionofphenotypictraitsfromtaxonomicdescriptionsforthetreeoflifeusingnaturallanguageprocessing |