
mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation

Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured environmentally relevant clades, either through metagenome-assembled genomes or through single-cell genomes. While this expanded diversity can provide novel insights into microbial pop...


Bibliographic Details
Main authors: Buck, Moritz; Mehrshad, Maliheh; Bertilsson, Stefan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Methods Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9376867/
https://www.ncbi.nlm.nih.gov/pubmed/35979445
http://dx.doi.org/10.1093/nargab/lqac060
author Buck, Moritz
Mehrshad, Maliheh
Bertilsson, Stefan
collection PubMed
description Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured environmentally relevant clades, either through metagenome-assembled genomes or through single-cell genomes. While this expanded diversity can provide novel insights into microbial population structure, most tools available for core-genome estimation are sensitive to genome completeness. Consequently, a major portion of the huge phylogenetic diversity uncovered by environmental genomic approaches remains excluded from such analyses. We present mOTUpan, a novel iterative Bayesian method for computing the core genome for sets of genomes of highly diverse completeness range. The likelihood for each gene cluster to belong to core or accessory genome is estimated by computing the probability of its presence/absence pattern in the target genome set. The core-genome prediction is computationally efficient and can be scaled up to thousands of genomes. It has shown comparable estimates to state-of-the-art tools Roary and PPanGGOLiN for high-quality genomes and is capable of using genomes at lower completeness thresholds. mOTUpan wraps a bootstrapping procedure to estimate the quality of a specific core-genome prediction, as the accuracy of each run will depend on the specific completeness distribution and the number of genomes in the dataset under scrutiny. mOTUpan is implemented in the mOTUlizer software package, and available at github.com/moritzbuck/mOTUlizer, under GPL 3.0 license.
format Online
Article
Text
id pubmed-9376867
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9376867 2022-08-16 NAR Genom Bioinform Methods Article Oxford University Press 2022-08-15 /pmc/articles/PMC9376867/ /pubmed/35979445 http://dx.doi.org/10.1093/nargab/lqac060 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation
topic Methods Article
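
The description field in this record outlines the method only in words: each gene cluster is scored by the probability of its presence/absence pattern under a "core" versus an "accessory" hypothesis, and the estimate is iterated. The following is a minimal sketch of that general idea, not the mOTUlizer implementation. Everything specific in it is an assumption made for illustration: the function names, the fixed `accessory_prevalence` parameter, the clipping bounds, the zero decision threshold, and the rule that re-estimates completeness from the current core set.

```python
import math


def _clip(p, low=0.01, high=0.99):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, low), high)


def core_log_likelihood_ratio(column, completeness, accessory_prevalence=0.5):
    """log P(pattern | core) - log P(pattern | accessory) for one gene cluster.

    Assumed model (hypothetical, for illustration only):
      core gene:      present in genome g with probability c_g
      accessory gene: present with probability c_g * accessory_prevalence
    """
    llr = 0.0
    for present, c in zip(column, completeness):
        p_core = c
        p_acc = c * accessory_prevalence
        if present:
            llr += math.log(p_core / p_acc)
        else:
            llr += math.log((1.0 - p_core) / (1.0 - p_acc))
    return llr


def estimate_core(genomes, prior_completeness, accessory_prevalence=0.5, max_iter=20):
    """Iteratively classify gene clusters as core and refine completeness.

    genomes:            dict mapping genome name -> set of gene-cluster ids
    prior_completeness: dict mapping genome name -> completeness in [0, 1]
    """
    names = sorted(genomes)
    clusters = sorted(set().union(*genomes.values()))
    completeness = [_clip(prior_completeness[n]) for n in names]
    core = set()
    for _ in range(max_iter):
        new_core = set()
        for cluster in clusters:
            column = [cluster in genomes[n] for n in names]
            if core_log_likelihood_ratio(column, completeness, accessory_prevalence) > 0:
                new_core.add(cluster)
        if new_core == core:
            break  # the core set stopped changing
        core = new_core
        # Assumed update rule: completeness = fraction of the current core found.
        completeness = [_clip(len(genomes[n] & core) / len(core)) for n in names]
    return core, dict(zip(names, completeness))


# Toy data: A, B, C behave like core genes; X and Y like accessory genes.
toy_genomes = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B", "Y"},
    "g3": {"A", "C", "X"},
    "g4": {"B", "C", "X"},
}
toy_prior = {"g1": 0.9, "g2": 0.7, "g3": 0.7, "g4": 0.7}
core, completeness = estimate_core(toy_genomes, toy_prior)
```

In this toy setup a cluster missing from one incomplete genome can still be called core, because an absence in a genome with low completeness carries little evidence against the core hypothesis. The bootstrapping step mentioned in the description is not sketched here.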