PON-Sol2: Prediction of Effects of Variants on Protein Solubility

Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made again soluble by...

Bibliographic Details
Main Authors: Yang, Yang; Zeng, Lianjie; Vihinen, Mauno
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8348231/
https://www.ncbi.nlm.nih.gov/pubmed/34360790
http://dx.doi.org/10.3390/ijms22158027
_version_ 1783735290448314368
author Yang, Yang
Zeng, Lianjie
Vihinen, Mauno
author_facet Yang, Yang
Zeng, Lianjie
Vihinen, Mauno
author_sort Yang, Yang
collection PubMed
description Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property and, when reduced, can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on protein solubility. The method is a machine learning tool utilizing a gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.
format Online
Article
Text
id pubmed-8348231
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-83482312021-08-08 PON-Sol2: Prediction of Effects of Variants on Protein Solubility Yang, Yang Zeng, Lianjie Vihinen, Mauno Int J Mol Sci Article Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made again soluble by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on the protein solubility. The method is a machine learning tool utilizing gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects. MDPI 2021-07-27 /pmc/articles/PMC8348231/ /pubmed/34360790 http://dx.doi.org/10.3390/ijms22158027 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Yang, Yang
Zeng, Lianjie
Vihinen, Mauno
PON-Sol2: Prediction of Effects of Variants on Protein Solubility
title PON-Sol2: Prediction of Effects of Variants on Protein Solubility
title_full PON-Sol2: Prediction of Effects of Variants on Protein Solubility
title_fullStr PON-Sol2: Prediction of Effects of Variants on Protein Solubility
title_full_unstemmed PON-Sol2: Prediction of Effects of Variants on Protein Solubility
title_short PON-Sol2: Prediction of Effects of Variants on Protein Solubility
title_sort pon-sol2: prediction of effects of variants on protein solubility
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8348231/
https://www.ncbi.nlm.nih.gov/pubmed/34360790
http://dx.doi.org/10.3390/ijms22158027
work_keys_str_mv AT yangyang ponsol2predictionofeffectsofvariantsonproteinsolubility
AT zenglianjie ponsol2predictionofeffectsofvariantsonproteinsolubility
AT vihinenmauno ponsol2predictionofeffectsofvariantsonproteinsolubility