Quantifying and Cataloguing Unknown Sequences within Human Microbiomes
Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data se...
Main authors: | Modha, Sejal; Robertson, David L.; Hughes, Joseph; Orton, Richard J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052204/ https://www.ncbi.nlm.nih.gov/pubmed/35258340 http://dx.doi.org/10.1128/msystems.01468-21 |
_version_ | 1784696734858870784 |
---|---|
author | Modha, Sejal; Robertson, David L.; Hughes, Joseph; Orton, Richard J. |
author_facet | Modha, Sejal; Robertson, David L.; Hughes, Joseph; Orton, Richard J. |
author_sort | Modha, Sejal |
collection | PubMed |
description | Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets are comprised of a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called “dark matter” is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples, and covering 10 different human microbiomes including fecal, oral, lung, skin, and circulatory system microbiomes. We found that while the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among different microbiomes and was as high as 25% for skin and oral microbiomes that have more interactions with the environment. A rate of taxonomic characterization of 1.64% of unknown sequences being characterized per month was calculated from these taxonomically unknown sequences discovered in this study. A cross-study comparison led to the identification of similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing. Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread as they have been found in different microbiomes and studies. Hence, our study illustrates how the systematic characterization of unknown sequences can help the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified. |
format | Online Article Text |
id | pubmed-9052204 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9052204 2022-04-30 Quantifying and Cataloguing Unknown Sequences within Human Microbiomes Modha, Sejal Robertson, David L. Hughes, Joseph Orton, Richard J. mSystems Research Article Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets are comprised of a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called “dark matter” is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples, and covering 10 different human microbiomes including fecal, oral, lung, skin, and circulatory system microbiomes. We found that while the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among different microbiomes and was as high as 25% for skin and oral microbiomes that have more interactions with the environment. A rate of taxonomic characterization of 1.64% of unknown sequences being characterized per month was calculated from these taxonomically unknown sequences discovered in this study. A cross-study comparison led to the identification of similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing. Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread as they have been found in different microbiomes and studies. Hence, our study illustrates how the systematic characterization of unknown sequences can help the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified. American Society for Microbiology 2022-03-08 /pmc/articles/PMC9052204/ /pubmed/35258340 http://dx.doi.org/10.1128/msystems.01468-21 Text en Copyright © 2022 Modha et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Modha, Sejal Robertson, David L. Hughes, Joseph Orton, Richard J. Quantifying and Cataloguing Unknown Sequences within Human Microbiomes |
title | Quantifying and Cataloguing Unknown Sequences within Human Microbiomes |
title_full | Quantifying and Cataloguing Unknown Sequences within Human Microbiomes |
title_fullStr | Quantifying and Cataloguing Unknown Sequences within Human Microbiomes |
title_full_unstemmed | Quantifying and Cataloguing Unknown Sequences within Human Microbiomes |
title_short | Quantifying and Cataloguing Unknown Sequences within Human Microbiomes |
title_sort | quantifying and cataloguing unknown sequences within human microbiomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052204/ https://www.ncbi.nlm.nih.gov/pubmed/35258340 http://dx.doi.org/10.1128/msystems.01468-21 |
work_keys_str_mv | AT modhasejal quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes AT robertsondavidl quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes AT hughesjoseph quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes AT ortonrichardj quantifyingandcataloguingunknownsequenceswithinhumanmicrobiomes |