Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses

BACKGROUND: A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Gene clustering is complicated by intraspecific duplications and horizontal gene transfers that are frequent in prokaryotes. In consequence, gene clustering...

Bibliographic Details
Main Authors: Manzano-Morales, Saioa; Liu, Yang; González-Bodí, Sara; Huerta-Cepas, Jaime; Iranzo, Jaime
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614367/
https://www.ncbi.nlm.nih.gov/pubmed/37904249
http://dx.doi.org/10.1186/s13059-023-03089-3
author Manzano-Morales, Saioa
Liu, Yang
González-Bodí, Sara
Huerta-Cepas, Jaime
Iranzo, Jaime
collection PubMed
description BACKGROUND: A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Gene clustering is complicated by intraspecific duplications and horizontal gene transfers that are frequent in prokaryotes. In consequence, gene clustering methods must deal with a trade-off between identifying vertically transmitted representatives of multicopy gene families, which are recognizable by synteny conservation, and retrieving complete sets of species-level orthologs. We studied the implications of adopting homology, orthology, or synteny conservation as formal criteria for gene clustering by performing comparative analyses of 125 prokaryotic pangenomes. RESULTS: Clustering criteria affect pangenome functional characterization, core genome inference, and reconstruction of ancestral gene content to different extents. Species-wise estimates of pangenome and core genome sizes change by the same factor when using different clustering criteria, allowing robust cross-species comparisons regardless of the clustering criterion. However, cross-species comparisons of genome plasticity and functional profiles are substantially affected by inconsistencies among clustering criteria. Such inconsistencies are driven not only by mobile genetic elements, but also by genes involved in defense, secondary metabolism, and other accessory functions. In some pangenome features, the variability attributed to methodological inconsistencies can even exceed the effect sizes of ecological and phylogenetic variables. CONCLUSIONS: Choosing an appropriate criterion for gene clustering is critical to conduct unbiased pangenome analyses. We provide practical guidelines to choose the right method depending on the research goals and the quality of genome assemblies, and a benchmarking dataset to assess the robustness and reproducibility of future comparative studies. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03089-3.
format Online
Article
Text
id pubmed-10614367
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10614367 2023-10-31 Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses Manzano-Morales, Saioa; Liu, Yang; González-Bodí, Sara; Huerta-Cepas, Jaime; Iranzo, Jaime Genome Biol Research BioMed Central 2023-10-30 /pmc/articles/PMC10614367/ /pubmed/37904249 http://dx.doi.org/10.1186/s13059-023-03089-3 Text en © The Author(s) 2023
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses
topic Research