Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides
The comparative molecular similarity indices analysis (CoMSIA) method is a widely used 3D-quantitative structure–activity relationship (QSAR) approach in the field of medicinal chemistry and drug design. However, relying solely on the Partial Least Squares algorithm to build models using numerous CoM...
Main authors: | Nha Tran, Thi Thanh; Thuan Tran, Thi Dieu; Thuy Bui, Thi Thu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society of Chemistry 2023 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654693/ https://www.ncbi.nlm.nih.gov/pubmed/38020021 http://dx.doi.org/10.1039/d3ra06690h |
_version_ | 1785147870388682752 |
---|---|
author | Nha Tran, Thi Thanh; Thuan Tran, Thi Dieu; Thuy Bui, Thi Thu |
author_facet | Nha Tran, Thi Thanh; Thuan Tran, Thi Dieu; Thuy Bui, Thi Thu |
author_sort | Nha Tran, Thi Thanh |
collection | PubMed |
description | The comparative molecular similarity indices analysis (CoMSIA) method is a widely used 3D-quantitative structure–activity relationship (QSAR) approach in the field of medicinal chemistry and drug design. However, relying solely on the Partial Least Squares (PLS) algorithm to build models using numerous CoMSIA indices has, in some cases, led to statistically underperforming models. This issue has also affected 3D-CoMSIA models constructed for the ferric thiocyanate (FTC) dataset from linoleic antioxidant measurements. In this study, a novel modeling routine has been developed incorporating various machine learning (ML) techniques to explore different options for feature selection, model fitting, and tuning algorithms, with the ultimate goal of arriving at optimal 3D-CoMSIA models with high predictivity for the FTC activity. Recursive Feature Selection and SelectFromModel techniques were applied for feature selection, resulting in a significant improvement in model fitting and predictivity ($R^2$, $R^2_{\mathrm{CV}}$, and $R^2_{\mathrm{test}}$) of 24 estimators. However, these selection methods did not fully address the problem of overfitting and, in some instances, even exacerbated it. Hyperparameter tuning, in turn, produced dissimilar levels of generalization across the four tree-based models. GB-RFE coupled with GBR (hyperparameters: learning_rate = 0.01, max_depth = 2, n_estimators = 500, subsample = 0.5) was the only combination that effectively mitigated overfitting and demonstrated superior performance ($R^2_{\mathrm{CV}}$ of 0.690, $R^2_{\mathrm{test}}$ of 0.759, and $R^2$ of 0.872) compared to the best linear model, PLS (with $R^2_{\mathrm{CV}}$ of 0.653, $R^2_{\mathrm{test}}$ of 0.575, and $R^2$ of 0.755). Therefore, it was subsequently utilized to screen potential antioxidants among a range of Tryptophyllin L tripeptide fragments, leading to the synthesis and testing of three peptides: F-P-5Htp, F-P-W, and P-5Htp-L. These peptides exhibited promising activity levels, with FTC values of 4.2 ± 0.12, 4.4 ± 0.11, and 1.72 ± 0.15, respectively. |
format | Online Article Text |
id | pubmed-10654693 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | The Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-10654693 2023-11-17 Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides Nha Tran, Thi Thanh; Thuan Tran, Thi Dieu; Thuy Bui, Thi Thu RSC Adv Chemistry The Royal Society of Chemistry 2023-11-17 /pmc/articles/PMC10654693/ /pubmed/38020021 http://dx.doi.org/10.1039/d3ra06690h Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/ |
spellingShingle | Chemistry Nha Tran, Thi Thanh; Thuan Tran, Thi Dieu; Thuy Bui, Thi Thu Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides |
title | Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides |
title_full | Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides |
title_fullStr | Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides |
title_full_unstemmed | Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides |
title_short | Integration of machine learning in 3D-QSAR CoMSIA models for the identification of lipid antioxidant peptides |
title_sort | integration of machine learning in 3d-qsar comsia models for the identification of lipid antioxidant peptides |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654693/ https://www.ncbi.nlm.nih.gov/pubmed/38020021 http://dx.doi.org/10.1039/d3ra06690h |
work_keys_str_mv | AT nhatranthithanh integrationofmachinelearningin3dqsarcomsiamodelsfortheidentificationoflipidantioxidantpeptides AT thuantranthidieu integrationofmachinelearningin3dqsarcomsiamodelsfortheidentificationoflipidantioxidantpeptides AT thuybuithithu integrationofmachinelearningin3dqsarcomsiamodelsfortheidentificationoflipidantioxidantpeptides |