PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score

MOTIVATION: Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structu...

Bibliographic Details
Main Authors: Bastolla, Ugo; Abia, David; Piette, Oscar
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10628387/
https://www.ncbi.nlm.nih.gov/pubmed/37847775
http://dx.doi.org/10.1093/bioinformatics/btad630
author Bastolla, Ugo
Abia, David
Piette, Oscar
collection PubMed
description MOTIVATION: Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSAs), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships. RESULTS: Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but that none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim, based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali, which constructs protein MSAs either de novo or by modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of the pairwise alignments (PAs) and refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than that of structure aligners although lower than that of sequence aligners, the highest PC_sim score, and the highest similarity with the MSAs produced by other tools and with the Balibase reference MSAs. AVAILABILITY AND IMPLEMENTATION: https://github.com/ugobas/PC_ali.
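The idea of a score "based on the main principal component" of four similarity measures can be illustrated with a minimal sketch. This is not the PC_sim definition used by PC_ali, whose exact weights and normalization are given in the paper and its repository; the function name `first_pc_score`, the normalization of the weights to sum to one, and the toy numbers below are all assumptions made for illustration.

```python
import numpy as np

def first_pc_score(similarities):
    """Combine similarity measures along their first principal component.

    `similarities` is an (n_pairs, n_measures) array; each row holds the
    measures for one protein pair (here: aligned fraction, sequence
    identity, superimposed fraction, contact overlap).
    Returns (scores, weights). Illustrative only, not the published PC_sim.
    """
    x = np.asarray(similarities, dtype=float)
    centered = x - x.mean(axis=0)
    # Covariance between measures (columns), then its eigen-decomposition.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                    # direction of largest variance
    # The sign of an eigenvector is arbitrary: orient it so that higher
    # similarities give a higher combined score.
    if pc1.sum() < 0:
        pc1 = -pc1
    # Assumption: rescale so the weights sum to 1, making the score a
    # weighted average when all weights are positive.
    weights = pc1 / pc1.sum()
    return x @ weights, weights

# Hypothetical values for four protein pairs (not taken from the paper).
toy_pairs = np.array([
    [0.95, 0.80, 0.90, 0.85],
    [0.80, 0.40, 0.70, 0.65],
    [0.60, 0.15, 0.45, 0.40],
    [0.90, 0.60, 0.85, 0.75],
])
scores, weights = first_pc_score(toy_pairs)
print("weights:", weights)
print("scores:", scores)
```

Because the four toy measures rise and fall together, every weight comes out positive and the combined score ranks the pairs in the same order as each individual measure; with real data the measures disagree in places, which is where a combined score becomes informative.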
format Online
Article
Text
id pubmed-10628387
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10628387 2023-11-08 PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score Bastolla, Ugo; Abia, David; Piette, Oscar Bioinformatics Original Paper Oxford University Press 2023-10-17 /pmc/articles/PMC10628387/ /pubmed/37847775 http://dx.doi.org/10.1093/bioinformatics/btad630 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score
topic Original Paper