The role of data imbalance bias in the prediction of protein stability change upon mutation

There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias c...

Bibliographic Details
Main author: Fang, Jianwen
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/
https://www.ncbi.nlm.nih.gov/pubmed/36996153
http://dx.doi.org/10.1371/journal.pone.0283727
_version_ 1785017514956161024
author Fang, Jianwen
author_facet Fang, Jianwen
author_sort Fang, Jianwen
collection PubMed
description There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias caused by data imbalance, as there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to illustrate that the bias may not be the primary reason for poor performance. A balanced dataset with seemingly good conventional n-fold CV results should not be used as proof that a model for predicting protein stability change upon mutation is robust. Thus, some of the existing algorithms need to be re-examined before any practical application. Also, more emphasis should be put on obtaining more and higher-quality data and features in future research.
format Online
Article
Text
id pubmed-10062539
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-100625392023-03-31 The role of data imbalance bias in the prediction of protein stability change upon mutation Fang, Jianwen PLoS One Research Article There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias caused by data imbalance, as there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to illustrate that the bias may not be the primary reason for poor performance. A balanced dataset with seemingly good conventional n-fold CV results should not be used as proof that a model for predicting protein stability change upon mutation is robust. Thus, some of the existing algorithms need to be re-examined before any practical application. Also, more emphasis should be put on obtaining more and higher-quality data and features in future research. Public Library of Science 2023-03-30 /pmc/articles/PMC10062539/ /pubmed/36996153 http://dx.doi.org/10.1371/journal.pone.0283727 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
spellingShingle Research Article
Fang, Jianwen
The role of data imbalance bias in the prediction of protein stability change upon mutation
title The role of data imbalance bias in the prediction of protein stability change upon mutation
title_full The role of data imbalance bias in the prediction of protein stability change upon mutation
title_fullStr The role of data imbalance bias in the prediction of protein stability change upon mutation
title_full_unstemmed The role of data imbalance bias in the prediction of protein stability change upon mutation
title_short The role of data imbalance bias in the prediction of protein stability change upon mutation
title_sort role of data imbalance bias in the prediction of protein stability change upon mutation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/
https://www.ncbi.nlm.nih.gov/pubmed/36996153
http://dx.doi.org/10.1371/journal.pone.0283727
work_keys_str_mv AT fangjianwen theroleofdataimbalancebiasinthepredictionofproteinstabilitychangeuponmutation
AT fangjianwen roleofdataimbalancebiasinthepredictionofproteinstabilitychangeuponmutation