
Critical assessment of pan-genomic analysis of metagenome-assembled genomes

Pan-genome analyses of metagenome-assembled genomes (MAGs) may suffer from the known issues with MAGs: fragmentation, incompleteness and contamination. Here, we conducted a critical assessment of pan-genomics of MAGs, by comparing pan-genome analysis results of complete bacterial genomes and simulat...


Bibliographic Details
Main authors: Li, Tang, Yin, Yanbin
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677465/
https://www.ncbi.nlm.nih.gov/pubmed/36124775
http://dx.doi.org/10.1093/bib/bbac413
_version_ 1784833816997658624
author Li, Tang
Yin, Yanbin
author_facet Li, Tang
Yin, Yanbin
author_sort Li, Tang
collection PubMed
description Pan-genome analyses of metagenome-assembled genomes (MAGs) may suffer from the known issues with MAGs: fragmentation, incompleteness and contamination. Here, we conducted a critical assessment of pan-genomics of MAGs, by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs. We found that incompleteness led to significant core gene (CG) loss. The CG loss remained when using different pan-genome analysis tools (Roary, BPGA, Anvi’o) and when using a mixture of MAGs and complete genomes. Contamination had little effect on core genome size (except for Roary, due to its gene clustering issue) but had a major influence on accessory genomes. Importantly, the CG loss was partially alleviated by lowering the CG threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The CG loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Our main findings were supported by a study of real MAG-isolate genome data. We conclude that lowering the CG threshold and predicting genes in metagenome mode (as Anvi’o does with Prodigal) are necessary in pan-genome analysis of MAGs. Development of new pan-genome analysis tools specifically for MAGs is needed in future studies.
format Online
Article
Text
id pubmed-9677465
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9677465 2022-11-21 Critical assessment of pan-genomic analysis of metagenome-assembled genomes Li, Tang Yin, Yanbin Brief Bioinform Problem Solving Protocol Pan-genome analyses of metagenome-assembled genomes (MAGs) may suffer from the known issues with MAGs: fragmentation, incompleteness and contamination. Here, we conducted a critical assessment of pan-genomics of MAGs, by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs. We found that incompleteness led to significant core gene (CG) loss. The CG loss remained when using different pan-genome analysis tools (Roary, BPGA, Anvi’o) and when using a mixture of MAGs and complete genomes. Contamination had little effect on core genome size (except for Roary, due to its gene clustering issue) but had a major influence on accessory genomes. Importantly, the CG loss was partially alleviated by lowering the CG threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The CG loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Our main findings were supported by a study of real MAG-isolate genome data. We conclude that lowering the CG threshold and predicting genes in metagenome mode (as Anvi’o does with Prodigal) are necessary in pan-genome analysis of MAGs. Development of new pan-genome analysis tools specifically for MAGs is needed in future studies. Oxford University Press 2022-09-17 /pmc/articles/PMC9677465/ /pubmed/36124775 http://dx.doi.org/10.1093/bib/bbac413 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Problem Solving Protocol
Li, Tang
Yin, Yanbin
Critical assessment of pan-genomic analysis of metagenome-assembled genomes
title Critical assessment of pan-genomic analysis of metagenome-assembled genomes
title_full Critical assessment of pan-genomic analysis of metagenome-assembled genomes
title_fullStr Critical assessment of pan-genomic analysis of metagenome-assembled genomes
title_full_unstemmed Critical assessment of pan-genomic analysis of metagenome-assembled genomes
title_short Critical assessment of pan-genomic analysis of metagenome-assembled genomes
title_sort critical assessment of pan-genomic analysis of metagenome-assembled genomes
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677465/
https://www.ncbi.nlm.nih.gov/pubmed/36124775
http://dx.doi.org/10.1093/bib/bbac413
work_keys_str_mv AT litang criticalassessmentofpangenomicanalysisofmetagenomeassembledgenomes
AT yinyanbin criticalassessmentofpangenomicanalysisofmetagenomeassembledgenomes
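
Note on the core gene (CG) threshold mentioned in the description: the abstract reports that incomplete MAGs cause core gene loss and that lowering the CG threshold partially alleviates it. The following is a minimal sketch of that idea only, not the authors' pipeline and not the API of Roary, BPGA or Anvi’o. It assumes gene families have already been clustered; the function name `core_genes`, the genome labels and the gene names are all hypothetical.

```python
from typing import Dict, Set


def core_genes(presence: Dict[str, Set[str]], threshold: float = 1.0) -> Set[str]:
    """Return gene families found in at least `threshold` fraction of genomes.

    `presence` maps a genome label to the set of gene families detected in it.
    (Hypothetical helper for illustration; real tools cluster sequences first.)
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    n_genomes = len(presence)
    if n_genomes == 0:
        return set()
    # Count in how many genomes each gene family occurs.
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c / n_genomes >= threshold}


# Toy data (invented): three complete genomes plus one incomplete MAG
# that is missing geneB, which every complete genome carries.
presence = {
    "isolate_1": {"geneA", "geneB", "geneC", "geneD"},
    "isolate_2": {"geneA", "geneB", "geneC"},
    "isolate_3": {"geneA", "geneB", "geneC"},
    "simulated_MAG": {"geneA", "geneC"},
}

strict = core_genes(presence, threshold=1.0)
relaxed = core_genes(presence, threshold=0.75)

print(sorted(strict))   # ['geneA', 'geneC']
print(sorted(relaxed))  # ['geneA', 'geneB', 'geneC']
```

With a strict 100% threshold, the single incomplete genome removes geneB from the core set; at 75% it is recovered, while the rare geneD stays in the accessory set. The thresholds here are chosen for the toy example and are not values taken from the article.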