
Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information



Bibliographic Details
Main authors: Milchevskiy, Yury V., Milchevskaya, Vladislava Y., Nikitin, Alexei M., Kravatsky, Yury V.
Format: Online, Article, Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10648199/
https://www.ncbi.nlm.nih.gov/pubmed/37958639
http://dx.doi.org/10.3390/ijms242115656
Collection: PubMed
Description: Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure—protein blocks (PBs)—can be treated in a similar way to secondary structure classes. Here, we present a new approach for predicting local conformation in terms of PB classes solely from amino acid sequences. We apply the RMSD metric to ensure unambiguous future 3D protein structure recovery. The selection of statistically assessed features is a key component of the proposed method. We suggest that ML input features should be created from the statistically significant predictors that are derived from the amino acids’ physicochemical properties and the resolved structures’ statistics. The statistical significance of the suggested features was assessed using a stepwise regression analysis that permitted the evaluation of the contribution and statistical significance of each predictor. We used the set of 380 statistically significant predictors as a learning model for the regression neural network that was trained using the PISCES30 dataset. When using the same dataset and metrics for benchmarking, our method outperformed all other methods reported in the literature for the CB513 nonredundant dataset (for the PBs, Q16 = 81.01%, and for the DSSP, Q3 = 85.99% and Q8 = 79.35%).
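The Q3, Q8 and Q16 figures in the abstract are per-residue accuracies over 3 and 8 DSSP secondary-structure classes and 16 PB classes. The minimal Python sketch below shows how such scores are commonly computed. It assumes the usual definition (percentage of residues whose predicted class matches the observed class, pooled over all chains); the DSSP 8-to-3 mapping, the function names `reduce_to_q3` and `q_accuracy`, and the example strings are illustrative assumptions, not the authors' code or data.

```python
# Illustrative sketch only: names, mapping and toy strings are assumptions,
# not taken from the article.

# One common DSSP 8-state to 3-state reduction (other variants exist).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helices -> H
    "E": "E", "B": "E",            # strand and isolated bridge -> E
    "T": "C", "S": "C", "-": "C", " ": "C", "C": "C",  # everything else -> coil
}


def reduce_to_q3(states8: str) -> str:
    """Map a DSSP 8-state string onto the 3-state alphabet H/E/C."""
    return "".join(DSSP8_TO_3[s] for s in states8)


def q_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Pooled per-residue accuracy in percent.

    `pairs` holds (predicted, observed) class strings, one pair per chain.
    The same function yields Q3, Q8 or Q16 depending on the alphabet used
    (e.g. PB letters a-p for Q16).
    """
    matches = total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed strings differ in length")
        matches += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * matches / total


if __name__ == "__main__":
    observed8 = "HHHHGGTTEEEB--"   # toy example, not real data
    predicted8 = "HHHHHHTSEEEE--"
    q8 = q_accuracy([(predicted8, observed8)])
    q3 = q_accuracy([(reduce_to_q3(predicted8), reduce_to_q3(observed8))])
    print(f"Q8 = {q8:.2f}%, Q3 = {q3:.2f}%")  # prints "Q8 = 71.43%, Q3 = 100.00%"
```

In the toy example, the confusions within the helix (G vs H), strand (B vs E) and coil (T vs S) groups count as errors in Q8 but disappear in Q3, which is why Q3 scores are typically higher than Q8 scores for the same prediction.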
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Int J Mol Sci (MDPI), published 2023-10-27. © 2023 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).