Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes
Main authors: | Bazant, Wojtek; Blevins, Ann S.; Crouch, Kathryn; Beiting, Daniel P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10084625/ https://www.ncbi.nlm.nih.gov/pubmed/37032329 http://dx.doi.org/10.1186/s40168-023-01505-1 |
Field | Value |
---|---|
author | Bazant, Wojtek; Blevins, Ann S.; Crouch, Kathryn; Beiting, Daniel P. |
collection | PubMed |
description | BACKGROUND: Eukaryotes such as fungi and protists frequently accompany bacteria and archaea in microbial communities. Unfortunately, their presence is difficult to study with “shotgun” metagenomic sequencing since prokaryotic signals dominate in most environments. Recent methods for eukaryotic detection use eukaryote-specific marker genes, but they do not incorporate strategies to handle the presence of eukaryotes that are not represented in the reference marker gene set, and they are not compatible with web-based tools for downstream analysis. RESULTS: Here, we present CORRAL (for Clustering Of Related Reference ALignments), a tool for the identification of eukaryotes in shotgun metagenomic data based on alignments to eukaryote-specific marker genes and Markov clustering. Using a combination of simulated datasets, mock community standards, and large publicly available human microbiome studies, we demonstrate that our method is not only sensitive and accurate but is also capable of inferring the presence of eukaryotes not included in the marker gene reference, such as novel strains. Finally, we deploy CORRAL on our MicrobiomeDB.org resource, producing an atlas of eukaryotes present in various environments of the human body and linking their presence to study covariates. CONCLUSIONS: CORRAL allows eukaryotic detection to be automated and carried out at scale. Implementation of CORRAL in MicrobiomeDB.org creates a running atlas of microbial eukaryotes in metagenomic studies. Since our approach is independent of the reference used, it may be applicable to other contexts where shotgun metagenomic reads are matched against redundant but non-exhaustive databases, such as the identification of bacterial virulence genes or taxonomic classification of viral reads. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01505-1. |
format | Online Article Text |
id | pubmed-10084625 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | Microbiome (BioMed Central), article type: Software, published 2023-04-10 |
record | pubmed-10084625, 2023-04-11 |
rights | © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes |
topic | Software |
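The description names Markov clustering as the step CORRAL uses to group related reference alignments. The sketch below is a generic, textbook Markov Cluster (MCL) loop in pure Python, offered only as an illustration and not as CORRAL's implementation. The function name `markov_cluster`, its defaults (`inflation=2.0`, `max_iter`, `tol`), the toy graph, and the idea that nodes stand for marker genes linked by shared reads are all hypothetical assumptions.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(edges: List[Tuple[int, int, float]], n_nodes: int,
                   inflation: float = 2.0, max_iter: int = 100,
                   tol: float = 1e-6) -> List[Set[int]]:
    """Generic MCL sketch (hypothetical helper, not CORRAL's code).

    Nodes could stand for marker genes and edge weights for shared reads;
    that mapping is an assumption made only for illustration.
    """
    if n_nodes == 0:
        return []
    # Symmetric weighted adjacency matrix with self-loops, as is usual for MCL.
    m = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        m[i][i] = 1.0
    for i, j, w in edges:
        m[i][j] = w
        m[j][i] = w
    m = _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: flow spreads along paths of length two.
        expanded = _matmul(m, m)
        # Inflation: raising entries to a power strengthens strong flows and
        # weakens weak ones; renormalizing keeps the columns stochastic.
        new_m = _normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(new_m[i][j] - m[i][j])
                    for i in range(n_nodes) for j in range(n_nodes))
        m = new_m
        if delta < tol:
            break
    # Each attractor (row with a non-negligible diagonal) collects the nodes
    # whose flow ends up on it; identical member sets are reported once.
    clusters: List[Set[int]] = []
    seen: Set[frozenset] = set()
    for i in range(n_nodes):
        if m[i][i] > tol:
            members = frozenset(j for j in range(n_nodes) if m[i][j] > tol)
            if members not in seen:
                seen.add(members)
                clusters.append(set(members))
    return clusters


# Toy data: two tightly linked triads joined by one weak edge.
toy_edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
             (2, 3, 0.1)]
print(markov_cluster(toy_edges, 6))
```

On this toy graph the weak 0.1 edge loses its flow during inflation, so the two triads come back as separate clusters. In MCL generally, raising `inflation` tends to produce finer clusters and lowering it tends to merge them.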