
Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences


Bibliographic Details
Main authors: Eren, A Murat, Morrison, Hilary G, Lescault, Pamela J, Reveillaud, Julie, Vineis, Joseph H, Sogin, Mitchell L
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4817710/
https://www.ncbi.nlm.nih.gov/pubmed/25325381
http://dx.doi.org/10.1038/ismej.2014.195
author Eren, A Murat
Morrison, Hilary G
Lescault, Pamela J
Reveillaud, Julie
Vineis, Joseph H
Sogin, Mitchell L
author_sort Eren, A Murat
collection PubMed
description Molecular microbial ecology investigations often employ large marker gene datasets, for example, ribosomal RNAs, to represent the occurrence of single-cell genomes in microbial communities. Massively parallel DNA sequencing technologies enable extensive surveys of marker gene libraries that sometimes include nearly identical sequences. Computational approaches that rely on pairwise sequence alignments for similarity assessment and de novo clustering with de facto similarity thresholds to partition high-throughput sequencing datasets constrain fine-scale resolution descriptions of microbial communities. Minimum Entropy Decomposition (MED) provides a computationally efficient means to partition marker gene datasets into ‘MED nodes’, which represent homogeneous operational taxonomic units. By employing Shannon entropy, MED uses only the information-rich nucleotide positions across reads and iteratively partitions large datasets while omitting stochastic variation. When applied to analyses of microbiomes from two deep-sea cryptic sponges Hexadella dedritifera and Hexadella cf. dedritifera, MED resolved a key Gammaproteobacteria cluster into multiple MED nodes that are specific to different sponges, and revealed that these closely related sympatric sponge species maintain distinct microbial communities. MED analysis of a previously published human oral microbiome dataset also revealed that taxa separated by less than 1% sequence variation distributed to distinct niches in the oral cavity. The information theory-guided decomposition process behind the MED algorithm enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics and user supervision.
format Online
Article
Text
id pubmed-4817710
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4817710 2016-04-15 Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences Eren, A Murat Morrison, Hilary G Lescault, Pamela J Reveillaud, Julie Vineis, Joseph H Sogin, Mitchell L ISME J Original Article Nature Publishing Group 2015-04 2014-10-17 /pmc/articles/PMC4817710/ /pubmed/25325381 http://dx.doi.org/10.1038/ismej.2014.195 Text en
Copyright © 2015 International Society for Microbial Ecology. This work is licensed under a Creative Commons Attribution 3.0 Unported License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
title Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences
topic Original Article
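
The abstract in this record describes using Shannon entropy to find information-rich nucleotide positions and to partition reads iteratively. The following is a minimal Python sketch of that general idea, not the authors' implementation: it assumes equal-length (aligned) reads, and the names `column_entropy`, `decompose` and the `max_entropy` threshold are hypothetical choices made for illustration. The published MED pipeline includes further criteria that are not described in this record and are not modeled here.

```python
import math
from collections import Counter


def column_entropy(reads, col):
    """Shannon entropy (in bits) of the characters found at one position."""
    counts = Counter(read[col] for read in reads)
    total = len(reads)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def decompose(reads, max_entropy=0.5):
    """Recursively split equal-length reads on their highest-entropy column.

    A group is kept as one node once no column exceeds max_entropy
    (a hypothetical threshold chosen for this sketch).
    """
    if len(reads) < 2:
        return [reads]
    length = len(reads[0])
    entropies = [column_entropy(reads, c) for c in range(length)]
    col = max(range(length), key=entropies.__getitem__)
    if entropies[col] <= max_entropy:
        return [reads]
    # Group reads by the character they carry at the chosen column.
    groups = {}
    for read in reads:
        groups.setdefault(read[col], []).append(read)
    if len(groups) < 2:
        # Nothing to split on; guards against non-terminating recursion.
        return [reads]
    nodes = []
    for base in sorted(groups):
        nodes.extend(decompose(groups[base], max_entropy))
    return nodes


if __name__ == "__main__":
    sample = ["ACGT"] * 5 + ["ACGA"] * 5
    for node in decompose(sample):
        print(len(node), node[0])
```

In this sketch a position where a single read differs from many others has low entropy and does not trigger a split, which loosely mirrors the abstract's point about omitting stochastic variation.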