
Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes

Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for s...


Bibliographic Details
Main authors: Utro, Filippo; Haiminen, Niina; Siragusa, Enrico; Gardiner, Laura-Jayne; Seabolt, Ed; Krishna, Ritesh; Kaufman, James H.; Parida, Laxmi
Format: Online Article Text
Language: English
Published: Elsevier 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7125348/
https://www.ncbi.nlm.nih.gov/pubmed/32248063
http://dx.doi.org/10.1016/j.isci.2020.100988
_version_ 1783515926233088000
author Utro, Filippo
Haiminen, Niina
Siragusa, Enrico
Gardiner, Laura-Jayne
Seabolt, Ed
Krishna, Ritesh
Kaufman, James H.
Parida, Laxmi
author_sort Utro, Filippo
collection PubMed
description Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from reference indicated only 1.5% error rate for PRROMenade functional classification. On metatranscriptomic data PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.
format Online
Article
Text
id pubmed-7125348
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7125348 2020-04-06 iScience Article Elsevier 2020-03-17 /pmc/articles/PMC7125348/ /pubmed/32248063 http://dx.doi.org/10.1016/j.isci.2020.100988 Text en © 2020 The Author(s) This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes
topic Article