
Embracing Ambiguity in the Taxonomic Classification of Microbiome Sequencing Data

The advent of high throughput sequencing has enabled in-depth characterization of human and environmental microbiomes. Determining the taxonomic origin of microbial sequences is one of the first, and frequently only, analysis performed on microbiome samples. Substantial research has focused on the d...


Bibliographic Details
Main Authors: Shah, Nidhi; Meisel, Jacquelyn S.; Pop, Mihai
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6811648/
https://www.ncbi.nlm.nih.gov/pubmed/31681437
http://dx.doi.org/10.3389/fgene.2019.01022
author Shah, Nidhi
Meisel, Jacquelyn S.
Pop, Mihai
collection PubMed
description The advent of high throughput sequencing has enabled in-depth characterization of human and environmental microbiomes. Determining the taxonomic origin of microbial sequences is one of the first, and frequently only, analysis performed on microbiome samples. Substantial research has focused on the development of methods for taxonomic annotation, often making trade-offs in computational efficiency and classification accuracy. A side-effect of these efforts has been a reexamination of the bacterial taxonomy itself. Taxonomies developed prior to the genomic revolution captured complex relationships between organisms that went beyond uniform taxonomic levels such as species, genus, and family. Driven in part by the need to simplify computational workflows, the bacterial taxonomies used most commonly today have been regularized to fit within a standard seven taxonomic levels. Consequently, modern analyses of microbial communities are relatively coarse-grained. Few methods make classifications below the genus level, impacting our ability to capture biologically relevant signals. Here, we present ATLAS, a novel strategy for taxonomic annotation that uses significant outliers within database search results to group sequences in the database into partitions. These partitions capture the extent of taxonomic ambiguity within the classification of a sample. The ATLAS pipeline can be found on GitHub [https://github.com/shahnidhi/outlier_in_BLAST_hits]. We demonstrate that ATLAS provides similar annotations to phylogenetic placement methods, but with higher computational efficiency. When applied to human microbiome data, ATLAS is able to identify previously characterized taxonomic groupings, such as those in the class Clostridia and the genus Bacillus. Furthermore, the majority of partitions identified by ATLAS are at the subgenus level, replacing higher-level annotations with specific groups of species. 
These more precise partitions improve our detection power in determining differential abundance in microbiome association studies.
format Online
Article
Text
id pubmed-6811648
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6811648 2019-11-01 Embracing Ambiguity in the Taxonomic Classification of Microbiome Sequencing Data Shah, Nidhi Meisel, Jacquelyn S. Pop, Mihai Front Genet Genetics Frontiers Media S.A. 2019-10-17 /pmc/articles/PMC6811648/ /pubmed/31681437 http://dx.doi.org/10.3389/fgene.2019.01022 Text en Copyright © 2019 Shah, Meisel and Pop http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Embracing Ambiguity in the Taxonomic Classification of Microbiome Sequencing Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6811648/
https://www.ncbi.nlm.nih.gov/pubmed/31681437
http://dx.doi.org/10.3389/fgene.2019.01022