Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features

Water molecules play an important role in many biological processes in terms of stabilizing protein structures, assisting protein folding, and improving binding affinity. It is well known that, due to the impacts of various environmental factors, it is difficult to identify the conserved water molec...

Bibliographic Details
Main authors: Xiao, Wei; Ren, Juhui; Hao, Jutao; Wang, Haoyu; Li, Yuhao; Lin, Liangzhao
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550495/
https://www.ncbi.nlm.nih.gov/pubmed/36226242
http://dx.doi.org/10.1155/2022/5104464
author Xiao, Wei
Ren, Juhui
Hao, Jutao
Wang, Haoyu
Li, Yuhao
Lin, Liangzhao
collection PubMed
description Water molecules play an important role in many biological processes: they stabilize protein structures, assist protein folding, and improve binding affinity. Because of various environmental factors, it is difficult to distinguish conserved water molecules (CWMs) from free water molecules (FWMs) directly, as CWMs are normally deeply embedded in proteins and form strong hydrogen bonds with surrounding polar groups. To circumvent this difficulty, this study exploits the abundant spatial structure information and physicochemical properties of water molecules in proteins and presents a machine learning framework for identifying CWMs in protein binding sites. First, by analyzing the water molecules' physicochemical properties and spatial structure information, six features (atom density, hydrophilicity, hydrophobicity, solvent-accessible surface area, temperature B-factors, and mobility) were extracted. These features were then analyzed and combined to reach a higher CWM identification rate, and an optimal feature combination was determined. Based on this optimal combination, seven machine learning models (support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), discriminant analysis (DA), naïve Bayes (NB), and ensemble learning (EL)) were evaluated for their ability to classify water molecules into two categories, CWMs and FWMs. The EL model was selected as the preferred prediction model because of its comprehensive advantages. Furthermore, the methodology was validated through a case study of crystal 3skh and extensively compared with Dowser++. The prediction performance showed that the optimal feature combination and the EL model could achieve satisfactory accuracy in identifying CWMs from FWMs in the proteins' binding sites.
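The workflow summarized in the description (six per-water features, a search for the best feature combination, then classification into CWMs and FWMs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are entirely synthetic, the function names are hypothetical, the exhaustive search over feature subsets is an assumption (the abstract does not say how the optimal combination was found), and a plain K-nearest-neighbor vote stands in for the seven models that the study compares.

```python
import itertools
import random

# The six feature names come from the abstract; everything else is illustrative.
FEATURES = ["atom_density", "hydrophilicity", "hydrophobicity",
            "sasa", "b_factor", "mobility"]


def make_synthetic_waters(n, seed=0):
    """Build toy data: one feature row per water, label 1 = CWM, 0 = FWM.

    ASSUMPTION: the class shifts below are invented for the demo and are
    not measurements from the paper.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        if label == 1:
            row[0] += 2.0  # toy signal on atom_density
            row[4] -= 2.0  # toy signal on b_factor
            row[5] -= 2.0  # toy signal on mobility
        X.append(row)
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if votes * 2 > k else 0


def cv_accuracy(X, y, cols, folds=5, k=5):
    """Cross-validated accuracy using only the feature columns in `cols`."""
    Xs = [[row[c] for c in cols] for row in X]
    n = len(Xs)
    correct = 0
    for f in range(folds):
        test_idx = range(f, n, folds)  # every folds-th sample is held out
        held_out = set(test_idx)
        train_X = [Xs[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        correct += sum(
            knn_predict(train_X, train_y, Xs[i], k) == y[i] for i in test_idx
        )
    return correct / n


def best_feature_combination(X, y):
    """Try every non-empty feature subset; return (accuracy, column indices)."""
    best = (0.0, ())
    for r in range(1, len(FEATURES) + 1):
        for cols in itertools.combinations(range(len(FEATURES)), r):
            acc = cv_accuracy(X, y, cols)
            if acc > best[0]:
                best = (acc, cols)
    return best


if __name__ == "__main__":
    X, y = make_synthetic_waters(120, seed=0)
    acc, cols = best_feature_combination(X, y)
    print("best combination:", [FEATURES[c] for c in cols])
    print("cross-validated accuracy:", round(acc, 3))
```

With only six features there are 63 non-empty subsets, so an exhaustive search is cheap; comparing several model families, as the study does, would amount to swapping `knn_predict` for other classifiers inside the same cross-validation loop.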
format Online
Article
Text
id pubmed-9550495
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9550495 2022-10-11 Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features Xiao, Wei Ren, Juhui Hao, Jutao Wang, Haoyu Li, Yuhao Lin, Liangzhao Comput Math Methods Med Research Article Hindawi 2022-10-03 /pmc/articles/PMC9550495/ /pubmed/36226242 http://dx.doi.org/10.1155/2022/5104464 Text en Copyright © 2022 Wei Xiao et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550495/
https://www.ncbi.nlm.nih.gov/pubmed/36226242
http://dx.doi.org/10.1155/2022/5104464