
Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes


Bibliographic details
Main authors: Bazant, Wojtek; Blevins, Ann S.; Crouch, Kathryn; Beiting, Daniel P.
Format: Online article, text
Language: English
Published: BioMed Central 2023
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10084625/
https://www.ncbi.nlm.nih.gov/pubmed/37032329
http://dx.doi.org/10.1186/s40168-023-01505-1
Collection: PubMed

Abstract
BACKGROUND: Eukaryotes such as fungi and protists frequently accompany bacteria and archaea in microbial communities. Unfortunately, their presence is difficult to study with “shotgun” metagenomic sequencing since prokaryotic signals dominate in most environments. Recent methods for eukaryotic detection use eukaryote-specific marker genes, but they do not incorporate strategies to handle the presence of eukaryotes that are not represented in the reference marker gene set, and they are not compatible with web-based tools for downstream analysis.

RESULTS: Here, we present CORRAL (for Clustering Of Related Reference ALignments), a tool for the identification of eukaryotes in shotgun metagenomic data based on alignments to eukaryote-specific marker genes and Markov clustering. Using a combination of simulated datasets, mock community standards, and large publicly available human microbiome studies, we demonstrate that our method is not only sensitive and accurate but is also capable of inferring the presence of eukaryotes not included in the marker gene reference, such as novel strains. Finally, we deploy CORRAL on our MicrobiomeDB.org resource, producing an atlas of eukaryotes present in various environments of the human body and linking their presence to study covariates.

CONCLUSIONS: CORRAL allows eukaryotic detection to be automated and carried out at scale. Implementation of CORRAL in MicrobiomeDB.org creates a running atlas of microbial eukaryotes in metagenomic studies. Since our approach is independent of the reference used, it may be applicable to other contexts where shotgun metagenomic reads are matched against redundant but non-exhaustive databases, such as the identification of bacterial virulence genes or taxonomic classification of viral reads.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01505-1.
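
The abstract describes CORRAL only at a high level: reads are aligned to eukaryote-specific marker genes, and the resulting alignments are grouped by Markov clustering. The sketch below illustrates that general technique under stated assumptions and is not CORRAL's implementation. It assumes each read's alignments are already summarized as a set of marker gene IDs, treats marker genes as graph nodes, weights an edge by the number of reads aligned to both markers, and runs a plain Markov clustering (MCL) loop of expansion and inflation. Under this assumption, reads from a strain missing from the reference that align across several related markers would pull those markers into one cluster, so presence could be reported per cluster rather than per marker. The function names (`build_marker_graph`, `markov_cluster`, `cluster_markers`), the self-loop weighting, and the default inflation of 2.0 are hypothetical choices made for illustration.

```python
from itertools import combinations


def build_marker_graph(read_hits):
    """Build a weighted marker-gene graph from read alignments (hypothetical helper).

    read_hits maps a read ID to the set of marker gene IDs it aligned to.
    Returns (markers, matrix): matrix[i][j] counts reads aligned to both
    markers i and j; the diagonal counts all reads aligned to marker i.
    """
    markers = sorted({m for hits in read_hits.values() for m in hits})
    index = {m: i for i, m in enumerate(markers)}
    n = len(markers)
    matrix = [[0.0] * n for _ in range(n)]
    for hits in read_hits.values():
        for m in hits:
            matrix[index[m]][index[m]] += 1.0  # self-loop weight, common in MCL
        for a, b in combinations(sorted(hits), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1.0  # each shared read strengthens the edge
            matrix[j][i] += 1.0
    return markers, matrix


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(matrix, inflation=2.0, max_iter=100, tol=1e-9):
    """Plain MCL: alternate expansion and inflation until the matrix stops changing."""
    m = normalize_columns(matrix)
    n = len(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        inflated = [[x ** inflation for x in row] for row in expanded]  # inflation
        new = normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows with nonzero entries belong to attractors; each row's support is one
    # cluster, and identical supports are reported only once.
    clusters, seen = [], set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


def cluster_markers(read_hits, inflation=2.0):
    """Group marker genes that share reads; returns lists of marker IDs."""
    markers, matrix = build_marker_graph(read_hits)
    if not markers:
        return []
    return [[markers[j] for j in c] for c in markov_cluster(matrix, inflation)]
```

For example, `cluster_markers({"r1": {"A1", "A2"}, "r2": {"A1", "A2"}, "r3": {"B1", "B2"}})` returns `[["A1", "A2"], ["B1", "B2"]]`. Higher inflation values generally split the graph into more, smaller clusters.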

Record information
Record ID: pubmed-10084625
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed

Publication and license
Journal: Microbiome (article type: Software). Published online by BioMed Central on 2023-04-10.
© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.