
Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides

The comparative molecular similarity indices analysis (CoMSIA) method is a widely used 3D-quantitative structure–activity relationship (QSAR) approach in the field of medicinal chemistry and drug design. However, relying solely on the Partial Least Squares algorithm to build models using numerous CoM...


Bibliographic Details
Main authors: Nha Tran, Thi Thanh; Thuan Tran, Thi Dieu; Thuy Bui, Thi Thu
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2023
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654693/
https://www.ncbi.nlm.nih.gov/pubmed/38020021
http://dx.doi.org/10.1039/d3ra06690h
author Nha Tran, Thi Thanh
Thuan Tran, Thi Dieu
Thuy Bui, Thi Thu
author_sort Nha Tran, Thi Thanh
collection PubMed
description Comparative molecular similarity indices analysis (CoMSIA) is a widely used 3D quantitative structure–activity relationship (3D-QSAR) approach in medicinal chemistry and drug design. However, relying solely on the Partial Least Squares (PLS) algorithm to build models from numerous CoMSIA indices has, in some cases, produced statistically underperforming models. This issue also affected the 3D-CoMSIA models constructed for the ferric thiocyanate (FTC) dataset from linoleic antioxidant measurements. In this study, a novel modeling routine was developed that incorporates various machine learning (ML) techniques to explore different options for feature selection, model fitting, and tuning, with the goal of arriving at 3D-CoMSIA models with high predictivity for FTC activity. Recursive Feature Elimination (RFE) and SelectFromModel were applied for feature selection, which significantly improved model fitting and predictivity (R², R²_CV, and R²_test) for 24 estimators. However, these selection methods did not fully resolve overfitting and, in some instances, even exacerbated it. Hyperparameter tuning, in turn, yielded different levels of generalization across the four tree-based models. GB-RFE coupled with GBR (hyperparameters: learning_rate = 0.01, max_depth = 2, n_estimators = 500, subsample = 0.5) was the only combination that effectively mitigated overfitting, and it outperformed the best linear model, PLS (R²_CV of 0.690, R²_test of 0.759, and R² of 0.872, versus R²_CV of 0.653, R²_test of 0.575, and R² of 0.755 for PLS). This combination was therefore used to screen potential antioxidants among a range of Tryptophyllin L tripeptide fragments, leading to the synthesis and testing of three peptides: F-P-5Htp, F-P-W, and P-5Htp-L. These peptides exhibited promising activity, with FTC values of 4.2 ± 0.12, 4.4 ± 0.11, and 1.72 ± 0.15, respectively.
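The feature-selection and fitting step summarized in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code. It assumes that "GB-RFE" means recursive feature elimination driven by a gradient boosting estimator and that "GBR" is scikit-learn's GradientBoostingRegressor, whose parameter names match the hyperparameters quoted above. The data are synthetic stand-ins for a CoMSIA descriptor matrix, and the number of retained features, the elimination step, the split sizes, and the random seeds are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a CoMSIA descriptor matrix (NOT the FTC dataset).
X, y = make_regression(n_samples=120, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Assumed reading of "GB-RFE": RFE ranked by gradient boosting importances.
# n_features_to_select and step are hypothetical, not taken from the paper.
selector = RFE(
    estimator=GradientBoostingRegressor(n_estimators=50, max_depth=2,
                                        random_state=0),
    n_features_to_select=5,
    step=2,
)

# Final regressor with the hyperparameters quoted in the abstract.
gbr = GradientBoostingRegressor(learning_rate=0.01, max_depth=2,
                                n_estimators=500, subsample=0.5,
                                random_state=0)

# Selection lives inside the pipeline so each CV fold reselects features
# from its own training part, which avoids leaking held-out data.
model = Pipeline([("rfe", selector), ("gbr", gbr)])

r2_cv = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
model.fit(X_train, y_train)
r2_train = model.score(X_train, y_train)   # R^2 on the training set
r2_test = model.score(X_test, y_test)      # R^2 on the held-out test set
n_selected = int(model.named_steps["rfe"].support_.sum())

print(f"features kept: {n_selected}")
print(f"R2 = {r2_train:.3f}, R2_CV = {r2_cv:.3f}, R2_test = {r2_test:.3f}")
```

A large gap between R² and R²_CV or R²_test is the overfitting signal the abstract refers to; the low learning rate, shallow trees, and subsample = 0.5 are standard regularizing settings for gradient boosting.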
format Online
Article
Text
id pubmed-10654693
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-10654693 2023-11-17 RSC Adv Chemistry The Royal Society of Chemistry 2023-11-17 /pmc/articles/PMC10654693/ /pubmed/38020021 http://dx.doi.org/10.1039/d3ra06690h Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/
title Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654693/
https://www.ncbi.nlm.nih.gov/pubmed/38020021
http://dx.doi.org/10.1039/d3ra06690h