The role of data imbalance bias in the prediction of protein stability change upon mutation
There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias c...
Main author: | Fang, Jianwen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/ https://www.ncbi.nlm.nih.gov/pubmed/36996153 http://dx.doi.org/10.1371/journal.pone.0283727 |
_version_ | 1785017514956161024 |
---|---|
author | Fang, Jianwen |
collection | PubMed |
description | There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias caused by data imbalance, as there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to illustrate that the bias may not be the primary reason for poor performance. A balanced dataset with some seemingly good conventional n-fold CV results should not be used as proof that a model for predicting protein stability change upon mutation is robust. Thus, some of the existing algorithms need to be re-examined before any practical application. Also, more emphasis should be put on obtaining data and features of higher quality and in greater quantity in future research. |
format | Online Article Text |
id | pubmed-10062539 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10062539 2023-03-31 The role of data imbalance bias in the prediction of protein stability change upon mutation Fang, Jianwen PLoS One Research Article There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias caused by data imbalance, as there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to illustrate that the bias may not be the primary reason for poor performance. A balanced dataset with some seemingly good conventional n-fold CV results should not be used as proof that a model for predicting protein stability change upon mutation is robust. Thus, some of the existing algorithms need to be re-examined before any practical application. Also, more emphasis should be put on obtaining data and features of higher quality and in greater quantity in future research. Public Library of Science 2023-03-30 /pmc/articles/PMC10062539/ /pubmed/36996153 http://dx.doi.org/10.1371/journal.pone.0283727 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
title | The role of data imbalance bias in the prediction of protein stability change upon mutation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/ https://www.ncbi.nlm.nih.gov/pubmed/36996153 http://dx.doi.org/10.1371/journal.pone.0283727 |
work_keys_str_mv | AT fangjianwen theroleofdataimbalancebiasinthepredictionofproteinstabilitychangeuponmutation AT fangjianwen roleofdataimbalancebiasinthepredictionofproteinstabilitychangeuponmutation |
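
The description above contrasts conventional n-fold CV with a leave-one-protein-out approach. The record gives no implementation details, so the following is only a minimal sketch of the general idea of such a split, not the article's method: every mutation of one protein is held out for testing while the model would be trained on all other proteins. The function name `leave_one_protein_out`, the record keys (`protein`, `mutation`, `ddg`), and the toy values are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def leave_one_protein_out(records):
    """Yield (held_out_protein, train, test) splits, one per protein.

    Hypothetical helper: each record is a dict with a "protein" key.
    All records of the held-out protein go to the test set, so no
    protein contributes to both training and testing in the same split.
    """
    by_protein = defaultdict(list)
    for rec in records:
        by_protein[rec["protein"]].append(rec)

    for protein in sorted(by_protein):
        test = by_protein[protein]
        train = [
            rec
            for other, recs in by_protein.items()
            if other != protein
            for rec in recs
        ]
        yield protein, train, test


# Invented toy data; the identifiers and ddg values are not from the article.
records = [
    {"protein": "P1", "mutation": "A10G", "ddg": -1.2},
    {"protein": "P1", "mutation": "L25V", "ddg": -0.4},
    {"protein": "P2", "mutation": "K7E", "ddg": 0.8},
    {"protein": "P3", "mutation": "D40N", "ddg": -2.1},
]

splits = list(leave_one_protein_out(records))
for protein, train, test in splits:
    print(protein, "train:", len(train), "test:", len(test))
```

In a random n-fold split, mutations of the same protein can land in both the training and the test folds. A grouped split like this one rules that out by construction, which is why it is generally the stricter test of whether a model generalizes to unseen proteins.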