Quantifying and Cataloguing Unknown Sequences within Human Microbiomes

Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data se...

Bibliographic Details
Main Authors: Modha, Sejal, Robertson, David L., Hughes, Joseph, Orton, Richard J.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052204/
https://www.ncbi.nlm.nih.gov/pubmed/35258340
http://dx.doi.org/10.1128/msystems.01468-21
_version_ 1784696734858870784
author Modha, Sejal
Robertson, David L.
Hughes, Joseph
Orton, Richard J.
author_facet Modha, Sejal
Robertson, David L.
Hughes, Joseph
Orton, Richard J.
author_sort Modha, Sejal
collection PubMed
description Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets are comprised of a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called “dark matter” is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples, and covering 10 different human microbiomes including fecal, oral, lung, skin, and circulatory system microbiomes. We found that while the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among different microbiomes and was as high as 25% for skin and oral microbiomes that have more interactions with the environment. A rate of taxonomic characterization of 1.64% of unknown sequences being characterized per month was calculated from these taxonomically unknown sequences discovered in this study. A cross-study comparison led to the identification of similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing. Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread as they have been found in different microbiomes and studies. 
Hence, our study illustrates how the systematic characterization of unknown sequences can help the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified.
format Online
Article
Text
id pubmed-9052204
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-90522042022-04-30 Quantifying and Cataloguing Unknown Sequences within Human Microbiomes Modha, Sejal Robertson, David L. Hughes, Joseph Orton, Richard J. mSystems Research Article Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets are comprised of a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called “dark matter” is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples, and covering 10 different human microbiomes including fecal, oral, lung, skin, and circulatory system microbiomes. We found that while the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among different microbiomes and was as high as 25% for skin and oral microbiomes that have more interactions with the environment. A rate of taxonomic characterization of 1.64% of unknown sequences being characterized per month was calculated from these taxonomically unknown sequences discovered in this study. A cross-study comparison led to the identification of similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing. 
Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread as they have been found in different microbiomes and studies. Hence, our study illustrates how the systematic characterization of unknown sequences can help the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified. American Society for Microbiology 2022-03-08 /pmc/articles/PMC9052204/ /pubmed/35258340 http://dx.doi.org/10.1128/msystems.01468-21 Text en Copyright © 2022 Modha et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Modha, Sejal
Robertson, David L.
Hughes, Joseph
Orton, Richard J.
Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
title Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
title_full Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
title_fullStr Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
title_full_unstemmed Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
title_short Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
title_sort quantifying and cataloguing unknown sequences within human microbiomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052204/
https://www.ncbi.nlm.nih.gov/pubmed/35258340
http://dx.doi.org/10.1128/msystems.01468-21
work_keys_str_mv AT modhasejal quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes
AT robertsondavidl quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes
AT hughesjoseph quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes
AT ortonrichardj quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes