Bridging the gaps in statistical models of protein alignment
SUMMARY: Sequences of proteins evolve by accumulating substitutions together with insertions and deletions (indels) of amino acids. However, it remains a common practice to disconnect substitutions and indels, and infer approximate models for each of them separately, to quantify sequence relationshi...
Main authors: | Sumanaweera, Dinithi; Allison, Lloyd; Konagurthu, Arun S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | ISCB/Ismb 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235498/ https://www.ncbi.nlm.nih.gov/pubmed/35758809 http://dx.doi.org/10.1093/bioinformatics/btac246 |
_version_ | 1784736324755914752 |
---|---|
author | Sumanaweera, Dinithi Allison, Lloyd Konagurthu, Arun S |
author_facet | Sumanaweera, Dinithi Allison, Lloyd Konagurthu, Arun S |
author_sort | Sumanaweera, Dinithi |
collection | PubMed |
description | SUMMARY: Sequences of proteins evolve by accumulating substitutions together with insertions and deletions (indels) of amino acids. However, it remains a common practice to disconnect substitutions and indels, and infer approximate models for each of them separately, to quantify sequence relationships. Although this approach brings with it computational convenience (which remains its primary motivation), there is a dearth of attempts to unify and model them systematically and together. To overcome this gap, this article demonstrates how a complete statistical model quantifying the evolution of pairs of aligned proteins can be constructed using a time-parameterized substitution matrix and a time-parameterized alignment state machine. Methods to derive all parameters of such a model from any benchmark collection of aligned protein sequences are described here. This has not only allowed us to generate a unified statistical model for each of the nine widely used substitution matrices (PAM, JTT, BLOSUM, JO, WAG, VTML, LG, MIQS and PFASUM), but also resulted in a new unified model, MMLSUM. Our underlying methodology measures the Shannon information content using each model to explain losslessly any given collection of alignments, which has allowed us to quantify the performance of all the above models on six comprehensive alignment benchmarks. Our results show that MMLSUM results in a new and clear overall best performance, followed by PFASUM, VTML, BLOSUM and MIQS, respectively, amongst the top five. We further analyze the statistical properties of the MMLSUM model and contrast it with others. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9235498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9235498 2022-06-29 Bridging the gaps in statistical models of protein alignment Sumanaweera, Dinithi Allison, Lloyd Konagurthu, Arun S Bioinformatics ISCB/Ismb 2022
Oxford University Press 2022-06-27 /pmc/articles/PMC9235498/ /pubmed/35758809 http://dx.doi.org/10.1093/bioinformatics/btac246 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | ISCB/Ismb 2022 Sumanaweera, Dinithi Allison, Lloyd Konagurthu, Arun S Bridging the gaps in statistical models of protein alignment |
title | Bridging the gaps in statistical models of protein alignment |
title_sort | bridging the gaps in statistical models of protein alignment |
topic | ISCB/Ismb 2022 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235498/ https://www.ncbi.nlm.nih.gov/pubmed/35758809 http://dx.doi.org/10.1093/bioinformatics/btac246 |
work_keys_str_mv | AT sumanaweeradinithi bridgingthegapsinstatisticalmodelsofproteinalignment AT allisonlloyd bridgingthegapsinstatisticalmodelsofproteinalignment AT konagurthuaruns bridgingthegapsinstatisticalmodelsofproteinalignment |