Annotation-free delineation of prokaryotic homology groups

Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show usin...

Bibliographic Details
Main authors: Yin, Yongze; Ogilvie, Huw A.; Nakhleh, Luay
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9212150/
https://www.ncbi.nlm.nih.gov/pubmed/35675326
http://dx.doi.org/10.1371/journal.pcbi.1010216
_version_ 1784730515983564800
author Yin, Yongze
Ogilvie, Huw A.
Nakhleh, Luay
author_facet Yin, Yongze
Ogilvie, Huw A.
Nakhleh, Luay
author_sort Yin, Yongze
collection PubMed
description Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs) where each MHG is a maximal set of maximum-length sequences which are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We zoomed in on the correlation between each individual marker and its overlapping MHGs, and show that few phylogenetic splits supported by the markers are supported by the MHGs while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage by markers cause incorrect inferences as to the overall relationship between bacterial taxa.
format Online
Article
Text
id pubmed-9212150
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-92121502022-06-22 Annotation-free delineation of prokaryotic homology groups Yin, Yongze Ogilvie, Huw A. Nakhleh, Luay PLoS Comput Biol Research Article Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs) where each MHG is a maximal set of maximum-length sequences which are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We zoomed in on the correlation between each individual marker and its overlapping MHGs, and show that few phylogenetic splits supported by the markers are supported by the MHGs while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage by markers cause incorrect inferences as to the overall relationship between bacterial taxa.
Public Library of Science 2022-06-08 /pmc/articles/PMC9212150/ /pubmed/35675326 http://dx.doi.org/10.1371/journal.pcbi.1010216 Text en © 2022 Yin et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Yin, Yongze
Ogilvie, Huw A.
Nakhleh, Luay
Annotation-free delineation of prokaryotic homology groups
title Annotation-free delineation of prokaryotic homology groups
title_full Annotation-free delineation of prokaryotic homology groups
title_fullStr Annotation-free delineation of prokaryotic homology groups
title_full_unstemmed Annotation-free delineation of prokaryotic homology groups
title_short Annotation-free delineation of prokaryotic homology groups
title_sort annotation-free delineation of prokaryotic homology groups
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9212150/
https://www.ncbi.nlm.nih.gov/pubmed/35675326
http://dx.doi.org/10.1371/journal.pcbi.1010216
work_keys_str_mv AT yinyongze annotationfreedelineationofprokaryotichomologygroups
AT ogilviehuwa annotationfreedelineationofprokaryotichomologygroups
AT nakhlehluay annotationfreedelineationofprokaryotichomologygroups