Taxa: An R package implementing data standards and methods for taxonomic data

The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this informat...

Bibliographic Details
Main authors: Foster, Zachary S.L., Chamberlain, Scott, Grünwald, Niklaus J.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5887078/
https://www.ncbi.nlm.nih.gov/pubmed/29707201
http://dx.doi.org/10.12688/f1000research.14013.2
_version_ 1783312224193871872
author Foster, Zachary S.L.
Chamberlain, Scott
Grünwald, Niklaus J.
author_facet Foster, Zachary S.L.
Chamberlain, Scott
Grünwald, Niklaus J.
author_sort Foster, Zachary S.L.
collection PubMed
description The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this information is encoded in many different ways and the hierarchical nature of taxonomic classifications makes it difficult to work with. There are many R packages that use taxonomic data to varying degrees but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution to storing and manipulating taxonomic data in R and any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and classifications) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify the taxonomy or associated data in such a way that both are kept in sync. Taxa is currently being used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers.
format Online
Article
Text
id pubmed-5887078
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-58870782018-04-25 Taxa: An R package implementing data standards and methods for taxonomic data Foster, Zachary S.L. Chamberlain, Scott Grünwald, Niklaus J. F1000Res Software Tool Article The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this information is encoded in many different ways and the hierarchical nature of taxonomic classifications makes it difficult to work with. There are many R packages that use taxonomic data to varying degrees but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution to storing and manipulating taxonomic data in R and any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and classifications) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify the taxonomy or associated data in such a way that both are kept in sync. Taxa is currently being used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers. F1000 Research Limited 2018-09-11 /pmc/articles/PMC5887078/ /pubmed/29707201 http://dx.doi.org/10.12688/f1000research.14013.2 Text en Copyright: © 2018 Foster ZSL et al. 
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software Tool Article
Foster, Zachary S.L.
Chamberlain, Scott
Grünwald, Niklaus J.
Taxa: An R package implementing data standards and methods for taxonomic data
title Taxa: An R package implementing data standards and methods for taxonomic data
title_full Taxa: An R package implementing data standards and methods for taxonomic data
title_fullStr Taxa: An R package implementing data standards and methods for taxonomic data
title_full_unstemmed Taxa: An R package implementing data standards and methods for taxonomic data
title_short Taxa: An R package implementing data standards and methods for taxonomic data
title_sort taxa: an r package implementing data standards and methods for taxonomic data
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5887078/
https://www.ncbi.nlm.nih.gov/pubmed/29707201
http://dx.doi.org/10.12688/f1000research.14013.2
work_keys_str_mv AT fosterzacharysl taxaanrpackageimplementingdatastandardsandmethodsfortaxonomicdata
AT chamberlainscott taxaanrpackageimplementingdatastandardsandmethodsfortaxonomicdata
AT grunwaldniklausj taxaanrpackageimplementingdatastandardsandmethodsfortaxonomicdata