mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation
Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured environmentally relevant clades, either through metagenome-assembled genomes or through single-cell genomes. While this expanded diversity can provide novel insights into microbial pop...
Main Authors: | Buck, Moritz, Mehrshad, Maliheh, Bertilsson, Stefan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Methods Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9376867/ https://www.ncbi.nlm.nih.gov/pubmed/35979445 http://dx.doi.org/10.1093/nargab/lqac060 |
_version_ | 1784768225093877760 |
---|---|
author | Buck, Moritz Mehrshad, Maliheh Bertilsson, Stefan |
author_facet | Buck, Moritz Mehrshad, Maliheh Bertilsson, Stefan |
author_sort | Buck, Moritz |
collection | PubMed |
description | Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured environmentally relevant clades, either through metagenome-assembled genomes or through single-cell genomes. While this expanded diversity can provide novel insights into microbial population structure, most tools available for core-genome estimation are sensitive to genome completeness. Consequently, a major portion of the huge phylogenetic diversity uncovered by environmental genomic approaches remains excluded from such analyses. We present mOTUpan, a novel iterative Bayesian method for computing the core genome for sets of genomes of highly diverse completeness range. The likelihood for each gene cluster to belong to core or accessory genome is estimated by computing the probability of its presence/absence pattern in the target genome set. The core-genome prediction is computationally efficient and can be scaled up to thousands of genomes. It has shown comparable estimates to state-of-the-art tools Roary and PPanGGOLiN for high-quality genomes and is capable of using genomes at lower completeness thresholds. mOTUpan wraps a bootstrapping procedure to estimate the quality of a specific core-genome prediction, as the accuracy of each run will depend on the specific completeness distribution and the number of genomes in the dataset under scrutiny. mOTUpan is implemented in the mOTUlizer software package, and available at github.com/moritzbuck/mOTUlizer, under GPL 3.0 license. |
format | Online Article Text |
id | pubmed-9376867 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-93768672022-08-16 mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation Buck, Moritz Mehrshad, Maliheh Bertilsson, Stefan NAR Genom Bioinform Methods Article Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured environmentally relevant clades, either through metagenome-assembled genomes or through single-cell genomes. While this expanded diversity can provide novel insights into microbial population structure, most tools available for core-genome estimation are sensitive to genome completeness. Consequently, a major portion of the huge phylogenetic diversity uncovered by environmental genomic approaches remains excluded from such analyses. We present mOTUpan, a novel iterative Bayesian method for computing the core genome for sets of genomes of highly diverse completeness range. The likelihood for each gene cluster to belong to core or accessory genome is estimated by computing the probability of its presence/absence pattern in the target genome set. The core-genome prediction is computationally efficient and can be scaled up to thousands of genomes. It has shown comparable estimates to state-of-the-art tools Roary and PPanGGOLiN for high-quality genomes and is capable of using genomes at lower completeness thresholds. mOTUpan wraps a bootstrapping procedure to estimate the quality of a specific core-genome prediction, as the accuracy of each run will depend on the specific completeness distribution and the number of genomes in the dataset under scrutiny. mOTUpan is implemented in the mOTUlizer software package, and available at github.com/moritzbuck/mOTUlizer, under GPL 3.0 license. Oxford University Press 2022-08-15 /pmc/articles/PMC9376867/ /pubmed/35979445 http://dx.doi.org/10.1093/nargab/lqac060 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Article Buck, Moritz Mehrshad, Maliheh Bertilsson, Stefan mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
title | mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
title_full | mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
title_fullStr | mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
title_full_unstemmed | mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
title_short | mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
title_sort | motupan: a robust bayesian approach to leverage metagenome-assembled genomes for core-genome estimation |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9376867/ https://www.ncbi.nlm.nih.gov/pubmed/35979445 http://dx.doi.org/10.1093/nargab/lqac060 |
work_keys_str_mv | AT buckmoritz motupanarobustbayesianapproachtoleveragemetagenomeassembledgenomesforcoregenomeestimation AT mehrshadmaliheh motupanarobustbayesianapproachtoleveragemetagenomeassembledgenomesforcoregenomeestimation AT bertilssonstefan motupanarobustbayesianapproachtoleveragemetagenomeassembledgenomesforcoregenomeestimation |
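
The description field above says that each gene cluster is assigned to the core or accessory genome by computing the probability of its presence/absence pattern, and that the method is iterative. The following is a minimal sketch of that idea only; it is not the mOTUlizer code and does not reproduce its API. Everything specific here is an assumption made for illustration: the function names, the accessory model (presence probability equal to completeness times the cluster's observed frequency), the clamping constant, the decision rule (log-likelihood ratio above zero), and the way completeness is re-estimated from the current core set.

```python
import math


def clamp(p, eps=0.01):
    """Keep a probability strictly inside (0, 1) so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def log_likelihood_ratio(carriers, completeness):
    """log P(pattern | core) - log P(pattern | accessory) for one cluster.

    carriers: set of genome ids in which the cluster was observed.
    completeness: dict mapping genome id -> estimated completeness in [0, 1].
    Assumed model (hypothetical, for illustration only):
      core:      P(present in g) = completeness[g]
      accessory: P(present in g) = completeness[g] * observed frequency
    """
    genomes = list(completeness)
    freq = clamp(len(carriers) / len(genomes))
    llr = 0.0
    for g in genomes:
        c = clamp(completeness[g])
        p_core, p_acc = c, c * freq
        if g in carriers:
            llr += math.log(p_core / p_acc)
        else:
            llr += math.log((1.0 - p_core) / (1.0 - p_acc))
    return llr


def estimate_core(clusters, prior_completeness, max_iter=20):
    """Alternate between calling the core set and re-estimating completeness."""
    completeness = dict(prior_completeness)
    core = set()
    for _ in range(max_iter):
        new_core = {
            name
            for name, carriers in clusters.items()
            if log_likelihood_ratio(carriers, completeness) > 0
        }
        if new_core == core:
            break  # the core set no longer changes
        core = new_core
        if core:
            # Assumed update: completeness = fraction of core clusters found.
            completeness = {
                g: clamp(sum(g in clusters[name] for name in core) / len(core))
                for g in completeness
            }
    return core, completeness


# Toy data: three fairly complete genomes and one fragmentary genome (D).
prior = {"A": 0.9, "B": 0.9, "C": 0.9, "D": 0.5}
clusters = {
    "c1": {"A", "B", "C", "D"},  # present everywhere
    "c2": {"A", "B", "C"},       # missing only from the incomplete genome
    "c3": {"A"},                 # present in a single genome
    "c4": {"A", "B", "D"},       # missing from a fairly complete genome
}
core, completeness = estimate_core(clusters, prior)
```

In this toy run, a cluster absent only from the low-completeness genome is still called core, while a cluster absent from a high-completeness genome is not, which is the behaviour the description attributes to weighting absences by completeness. The bootstrapping step mentioned in the description is not sketched here.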