Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information
Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure—protein blocks (PBs)—can be treated in a similar way to secondary...
Main Authors: | Milchevskiy, Yury V., Milchevskaya, Vladislava Y., Nikitin, Alexei M., Kravatsky, Yury V. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10648199/ https://www.ncbi.nlm.nih.gov/pubmed/37958639 http://dx.doi.org/10.3390/ijms242115656 |
_version_ | 1785135284339343360 |
---|---|
author | Milchevskiy, Yury V. Milchevskaya, Vladislava Y. Nikitin, Alexei M. Kravatsky, Yury V. |
author_facet | Milchevskiy, Yury V. Milchevskaya, Vladislava Y. Nikitin, Alexei M. Kravatsky, Yury V. |
author_sort | Milchevskiy, Yury V. |
collection | PubMed |
description | Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure—protein blocks (PBs)—can be treated in a similar way to secondary structure classes. Here, we present a new approach for predicting local conformation in terms of PB classes solely from amino acid sequences. We apply the RMSD metric to ensure unambiguous future 3D protein structure recovery. The selection of statistically assessed features is a key component of the proposed method. We suggest that ML input features should be created from the statistically significant predictors that are derived from the amino acids’ physicochemical properties and the resolved structures’ statistics. The statistical significance of the suggested features was assessed using a stepwise regression analysis that permitted the evaluation of the contribution and statistical significance of each predictor. We used the set of 380 statistically significant predictors as a learning model for the regression neural network that was trained using the PISCES30 dataset. When using the same dataset and metrics for benchmarking, our method outperformed all other methods reported in the literature for the CB513 nonredundant dataset (for the PBs, Q16 = 81.01%, and for the DSSP, Q3 = 85.99% and Q8 = 79.35%). |
format | Online Article Text |
id | pubmed-10648199 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10648199 2023-10-27 Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information Milchevskiy, Yury V. Milchevskaya, Vladislava Y. Nikitin, Alexei M. Kravatsky, Yury V. Int J Mol Sci Article Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure—protein blocks (PBs)—can be treated in a similar way to secondary structure classes. Here, we present a new approach for predicting local conformation in terms of PB classes solely from amino acid sequences. We apply the RMSD metric to ensure unambiguous future 3D protein structure recovery. The selection of statistically assessed features is a key component of the proposed method. We suggest that ML input features should be created from the statistically significant predictors that are derived from the amino acids’ physicochemical properties and the resolved structures’ statistics. The statistical significance of the suggested features was assessed using a stepwise regression analysis that permitted the evaluation of the contribution and statistical significance of each predictor. We used the set of 380 statistically significant predictors as a learning model for the regression neural network that was trained using the PISCES30 dataset. When using the same dataset and metrics for benchmarking, our method outperformed all other methods reported in the literature for the CB513 nonredundant dataset (for the PBs, Q16 = 81.01%, and for the DSSP, Q3 = 85.99% and Q8 = 79.35%). MDPI 2023-10-27 /pmc/articles/PMC10648199/ /pubmed/37958639 http://dx.doi.org/10.3390/ijms242115656 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Milchevskiy, Yury V. Milchevskaya, Vladislava Y. Nikitin, Alexei M. Kravatsky, Yury V. Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information |
title | Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information |
title_full | Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information |
title_fullStr | Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information |
title_full_unstemmed | Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information |
title_short | Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information |
title_sort | effective local and secondary protein structure prediction by combining a neural network-based approach with extensive feature design and selection without reliance on evolutionary information |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10648199/ https://www.ncbi.nlm.nih.gov/pubmed/37958639 http://dx.doi.org/10.3390/ijms242115656 |
work_keys_str_mv | AT milchevskiyyuryv effectivelocalandsecondaryproteinstructurepredictionbycombininganeuralnetworkbasedapproachwithextensivefeaturedesignandselectionwithoutrelianceonevolutionaryinformation AT milchevskayavladislavay effectivelocalandsecondaryproteinstructurepredictionbycombininganeuralnetworkbasedapproachwithextensivefeaturedesignandselectionwithoutrelianceonevolutionaryinformation AT nikitinalexeim effectivelocalandsecondaryproteinstructurepredictionbycombininganeuralnetworkbasedapproachwithextensivefeaturedesignandselectionwithoutrelianceonevolutionaryinformation AT kravatskyyuryv effectivelocalandsecondaryproteinstructurepredictionbycombininganeuralnetworkbasedapproachwithextensivefeaturedesignandselectionwithoutrelianceonevolutionaryinformation |