
Machine Learning: How Much Does It Tell about Protein Folding Rates?

Bibliographic Details
Main authors: Corrales, Marc; Cuscó, Pol; Usmanova, Dinara R.; Chen, Heng-Chang; Bogatyreva, Natalya S.; Filion, Guillaume J.; Ivankov, Dmitry N.
Format: Online article, text
Language: English
Published: Public Library of Science, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4659572/
https://www.ncbi.nlm.nih.gov/pubmed/26606303
http://dx.doi.org/10.1371/journal.pone.0143166
collection PubMed
description The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Owing to the growing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of the protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found a much lower predictive power than reported in the original publications. Overly optimistic predictive powers arise from violations of the basic principles of machine learning. We highlight common misconceptions in the studies claiming excessive predictive power and propose to use learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of the logarithm of folding rates based on protein amino acid composition.
id pubmed-4659572
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, published 2015-11-25
License: © 2015 Corrales et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article
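The abstract proposes learning curves as a safeguard against overfitting. The Python sketch below illustrates the idea under loudly stated assumptions: it uses synthetic data (20 hypothetical Gaussian features standing in for amino acid composition, a linear target, unit-variance noise), not the authors' dataset, and plain ordinary least squares, not any of the published models. All function names and parameters (`learning_curve`, `n_features`, `noise_sd`, and so on) are illustrative. For each training-set size it fits the model and reports the mean squared error on the training data and on a fixed held-out set.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(xs, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def mse(w, xs, ys):
    """Mean squared error of the linear predictor w on (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(ys)


def make_data(rng, n, true_w, noise_sd):
    """Hypothetical synthetic data, NOT the published folding-rate dataset."""
    xs = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0.0, noise_sd) for row in xs]
    return xs, ys


def learning_curve(sizes, n_features=20, noise_sd=1.0, n_test=1000, seed=0):
    """Return (n, train_mse, test_mse) for each training-set size in `sizes`."""
    rng = random.Random(seed)
    true_w = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    test_x, test_y = make_data(rng, n_test, true_w, noise_sd)
    train_x, train_y = make_data(rng, max(sizes), true_w, noise_sd)
    curve = []
    for n in sizes:
        w = fit_ols(train_x[:n], train_y[:n])  # nested subsets of one training pool
        curve.append((n, mse(w, train_x[:n], train_y[:n]), mse(w, test_x, test_y)))
    return curve


if __name__ == "__main__":
    for n, train_err, test_err in learning_curve([25, 50, 100, 200, 400]):
        print(f"n={n:4d}  train MSE={train_err:.2f}  test MSE={test_err:.2f}")
```

Running it prints one line per training-set size. With few examples per feature, the training error sits far below the held-out error, and the two converge only as the training set grows. A predictive power estimated on the training data alone, at a size where the curves have not yet met, would therefore be overly optimistic, which is the kind of mistake a learning-curve check is meant to expose.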