Challenges in predicting stabilizing variations: An exploration

An open challenge of computational and experimental biology is understanding the impact of non-synonymous DNA variations on protein function and, subsequently, human health. The effects of these variants on protein stability can be measured as the difference in the free energy of unfolding (ΔΔG) bet...

Bibliographic Details
Main Authors:	Benevenuta, Silvia; Birolo, Giovanni; Sanavia, Tiziana; Capriotti, Emidio; Fariselli, Piero
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Molecular Biosciences
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849384/ https://www.ncbi.nlm.nih.gov/pubmed/36685278 http://dx.doi.org/10.3389/fmolb.2022.1075570

author	Benevenuta, Silvia; Birolo, Giovanni; Sanavia, Tiziana; Capriotti, Emidio; Fariselli, Piero
author_sort	Benevenuta, Silvia
collection	PubMed
description	An open challenge in computational and experimental biology is understanding the impact of non-synonymous DNA variations on protein function and, subsequently, human health. The effect of these variants on protein stability can be measured as the difference in the free energy of unfolding (ΔΔG) between the mutated structure of the protein and its wild-type form. Over the years, bioinformaticians have developed a wide variety of tools and approaches to predict ΔΔG. Although the performance of these tools is highly variable, overall they are less accurate at predicting stabilizing variations than destabilizing ones. Here, we analyze the possible reasons for this difference by focusing on the relationship between experimentally measured ΔΔG and seven protein properties on three widely used datasets (S2648, VariBench, Ssym) and a recently introduced one (S669). These properties include protein structural information, different physical properties and statistical potentials. We found that two widely used input features, i.e., hydrophobicity and the BLOSUM62 substitution matrix, perform close to random choice when trying to separate stabilizing variants from either neutral or destabilizing ones. We then speculate that, since destabilizing variations are the most abundant class in the available datasets, the overall performance of the methods is higher when including features that improve the prediction for the destabilizing variants at the expense of the stabilizing ones. These findings highlight the need to design predictive methods that can also exploit input features highly correlated with the stabilizing variants. New tools should also be tested on a dataset that is not artificially balanced, reporting the performance on all three classes (i.e., stabilizing, neutral and destabilizing variants) and not only the overall results.
format	Online Article Text
id	pubmed-9849384
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9849384 2023-01-20 Challenges in predicting stabilizing variations: An exploration Benevenuta, Silvia; Birolo, Giovanni; Sanavia, Tiziana; Capriotti, Emidio; Fariselli, Piero Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2023-01-05 /pmc/articles/PMC9849384/ /pubmed/36685278 http://dx.doi.org/10.3389/fmolb.2022.1075570 Text en Copyright © 2023 Benevenuta, Birolo, Sanavia, Capriotti and Fariselli. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Challenges in predicting stabilizing variations: An exploration
topic	Molecular Biosciences
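
The abstract's closing recommendation, reporting performance separately for stabilizing, neutral and destabilizing variants, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only and is not taken from the article: the sign convention (ΔΔG read as mutant minus wild-type unfolding free energy, so positive means stabilizing), the ±0.5 kcal/mol neutral band, the function names, and the made-up ΔΔG values. It shows how a predictor that always answers "destabilizing" can reach a respectable overall accuracy on a skewed dataset while never recovering a stabilizing variant, and how a rank-based AUC of 0.5 corresponds to a feature that separates two groups no better than random choice.

```python
from collections import Counter

# Assumed conventions (not stated in the record): ddG > 0 is stabilizing,
# and |ddG| <= 0.5 kcal/mol is treated as neutral.
NEUTRAL_BAND = 0.5


def ddg_class(ddg, band=NEUTRAL_BAND):
    """Map a ddG value (kcal/mol) to one of three stability classes."""
    if ddg > band:
        return "stabilizing"
    if ddg < -band:
        return "destabilizing"
    return "neutral"


def overall_accuracy(y_true, y_pred):
    """Fraction of variants whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall computed separately for each class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win, so an uninformative feature scores 0.5.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up ddG values, deliberately skewed toward destabilizing variants.
measured = [-2.1, -1.4, -0.9, -3.0, -1.2, -0.7, -1.8, 0.1, -0.3, 1.1]
true_labels = [ddg_class(x) for x in measured]

# A trivial predictor that always answers with the majority class.
pred_labels = ["destabilizing"] * len(measured)

accuracy = overall_accuracy(true_labels, pred_labels)
recall = per_class_recall(true_labels, pred_labels)

print("overall accuracy:", accuracy)
for label, value in sorted(recall.items()):
    print(label, "recall:", value)
```

On these toy values the overall accuracy hides a recall of zero for the stabilizing and neutral classes, which is why a single aggregate score can be misleading on datasets dominated by destabilizing variants.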
