Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features
Water molecules play an important role in many biological processes in terms of stabilizing protein structures, assisting protein folding, and improving binding affinity. It is well known that, due to the impacts of various environmental factors, it is difficult to identify the conserved water molec...
Main authors: | Xiao, Wei; Ren, Juhui; Hao, Jutao; Wang, Haoyu; Li, Yuhao; Lin, Liangzhao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550495/ https://www.ncbi.nlm.nih.gov/pubmed/36226242 http://dx.doi.org/10.1155/2022/5104464 |
_version_ | 1784805899684020224 |
---|---|
author | Xiao, Wei Ren, Juhui Hao, Jutao Wang, Haoyu Li, Yuhao Lin, Liangzhao |
author_facet | Xiao, Wei Ren, Juhui Hao, Jutao Wang, Haoyu Li, Yuhao Lin, Liangzhao |
author_sort | Xiao, Wei |
collection | PubMed |
description | Water molecules play an important role in many biological processes in terms of stabilizing protein structures, assisting protein folding, and improving binding affinity. It is well known that, due to the impacts of various environmental factors, it is difficult to identify the conserved water molecules (CWMs) from free water molecules (FWMs) directly as CWMs are normally deeply embedded in proteins and form strong hydrogen bonds with surrounding polar groups. To circumvent this difficulty, in this work, the abundance of spatial structure information and physicochemical properties of water molecules in proteins inspires us to adopt machine learning methods for identifying the CWMs. Therefore, in this study, a machine learning framework to identify the CWMs in the binding sites of the proteins was presented. First, by analyzing water molecules' physicochemical properties and spatial structure information, six features (i.e., atom density, hydrophilicity, hydrophobicity, solvent-accessible surface area, temperature B-factors, and mobility) were extracted. Those features were further analyzed and combined to reach a higher CWM identification rate. As a result, an optimal feature combination was determined. Based on this optimal combination, seven different machine learning models (including support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), discriminant analysis (DA), naïve Bayes (NB), and ensemble learning (EL)) were evaluated for their abilities in identifying two categories of water molecules, i.e., CWMs and FWMs. It showed that the EL model was the desired prediction model due to its comprehensive advantages. Furthermore, the presented methodology was validated through a case study of crystal 3skh and extensively compared with Dowser++. The prediction performance showed that the optimal feature combination and the desired EL model in our method could achieve satisfactory prediction accuracy in identifying CWMs from FWMs in the proteins' binding sites. |
format | Online Article Text |
id | pubmed-9550495 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9550495 2022-10-11 Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features Xiao, Wei Ren, Juhui Hao, Jutao Wang, Haoyu Li, Yuhao Lin, Liangzhao Comput Math Methods Med Research Article Water molecules play an important role in many biological processes in terms of stabilizing protein structures, assisting protein folding, and improving binding affinity. It is well known that, due to the impacts of various environmental factors, it is difficult to identify the conserved water molecules (CWMs) from free water molecules (FWMs) directly as CWMs are normally deeply embedded in proteins and form strong hydrogen bonds with surrounding polar groups. To circumvent this difficulty, in this work, the abundance of spatial structure information and physicochemical properties of water molecules in proteins inspires us to adopt machine learning methods for identifying the CWMs. Therefore, in this study, a machine learning framework to identify the CWMs in the binding sites of the proteins was presented. First, by analyzing water molecules' physicochemical properties and spatial structure information, six features (i.e., atom density, hydrophilicity, hydrophobicity, solvent-accessible surface area, temperature B-factors, and mobility) were extracted. Those features were further analyzed and combined to reach a higher CWM identification rate. As a result, an optimal feature combination was determined. Based on this optimal combination, seven different machine learning models (including support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), discriminant analysis (DA), naïve Bayes (NB), and ensemble learning (EL)) were evaluated for their abilities in identifying two categories of water molecules, i.e., CWMs and FWMs. It showed that the EL model was the desired prediction model due to its comprehensive advantages. Furthermore, the presented methodology was validated through a case study of crystal 3skh and extensively compared with Dowser++. The prediction performance showed that the optimal feature combination and the desired EL model in our method could achieve satisfactory prediction accuracy in identifying CWMs from FWMs in the proteins' binding sites. Hindawi 2022-10-03 /pmc/articles/PMC9550495/ /pubmed/36226242 http://dx.doi.org/10.1155/2022/5104464 Text en Copyright © 2022 Wei Xiao et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Xiao, Wei Ren, Juhui Hao, Jutao Wang, Haoyu Li, Yuhao Lin, Liangzhao Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features |
title | Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features |
title_full | Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features |
title_fullStr | Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features |
title_full_unstemmed | Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features |
title_short | Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features |
title_sort | predicting conserved water molecules in binding sites of proteins using machine learning methods and combining features |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550495/ https://www.ncbi.nlm.nih.gov/pubmed/36226242 http://dx.doi.org/10.1155/2022/5104464 |