Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions

BACKGROUND: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylo...

Bibliographic Details
Main Authors: Karamycheva, Svetlana; Wolf, Yuri I.; Persi, Erez; Koonin, Eugene V.; Makarova, Kira S.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425974/
https://www.ncbi.nlm.nih.gov/pubmed/36042479
http://dx.doi.org/10.1186/s13062-022-00337-7
author Karamycheva, Svetlana
Wolf, Yuri I.
Persi, Erez
Koonin, Eugene V.
Makarova, Kira S.
collection PubMed
description BACKGROUND: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy-to-estimate, protein-family-level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
RESULTS: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate primarily not from gene duplication but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but, as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus.
CONCLUSIONS: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13062-022-00337-7.
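The abstract names alignment column homogeneity as the basis of the variability measure but does not give a formula. The sketch below is only an illustration of the general idea, not the authors' definition: it assumes homogeneity is the share of non-gap residues in a column that match the most common residue, and it reports variability as one minus the mean homogeneity. The function names and the toy alignment are hypothetical, and the normalization against the overall diversity of the organism group described in the conclusions is not attempted here.

```python
from collections import Counter


def column_homogeneity(column):
    """Share of non-gap residues that equal the most common residue.

    Assumed definition for illustration only; returns None for all-gap columns.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return None
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(residues)


def alignment_variability(aligned_seqs):
    """One minus the mean column homogeneity (hypothetical summary statistic)."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*aligned_seqs) walks the alignment column by column.
    scores = [
        h
        for h in (column_homogeneity(col) for col in zip(*aligned_seqs))
        if h is not None
    ]
    if not scores:
        raise ValueError("alignment has no informative columns")
    return 1.0 - sum(scores) / len(scores)


# Toy alignment of four sequences, made up for this example.
toy = ["MKVLA", "MKVLA", "MRV-A", "MKILG"]
print(round(alignment_variability(toy), 3))
```

A per-family value computed this way could then be compared across families within one group of genomes; any comparison across groups would need the group-level normalization that the record mentions but does not specify.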
format Online
Article
Text
id pubmed-9425974
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9425974 2022-08-31 Biol Direct Research BioMed Central 2022-08-30 /pmc/articles/PMC9425974/ /pubmed/36042479 http://dx.doi.org/10.1186/s13062-022-00337-7 Text en © The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions
topic Research