QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest

Bibliographic Details
Main Authors: Singh, Harinder, Singh, Sandeep, Singla, Deepak, Agarwal, Subhash M, Raghava, Gajendra P S
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372225/
https://www.ncbi.nlm.nih.gov/pubmed/25880749
http://dx.doi.org/10.1186/s13062-015-0046-9
Description
Summary: BACKGROUND: Epidermal Growth Factor Receptor (EGFR) is a well-characterized cancer drug target. Several QSAR models have previously been developed for predicting the inhibitory activity of molecules against EGFR. However, these models apply only to a limited set of molecules from a particular class, such as quinazoline derivatives. In this study, we attempted to develop prediction models on a large set of molecules (~3500 molecules) that includes diverse scaffolds such as quinazoline, pyrimidine, quinoline and indole. RESULTS: We trained, tested and validated our classification models on a dataset called EGFR10, which contains 508 inhibitors (IC50 < 10 nM) and 2997 non-inhibitors. Our Random forest based model, using 881 PubChem fingerprints, achieved a maximum MCC of 0.49 with an accuracy of 83.7% on a validation set. A frequency-based feature selection technique was used to identify the best fingerprints. PubChem fingerprints FP380 (C(~O)(~O)), FP579 (O=C-C-C-C), FP388 (C(:C)(:N)(:N)) and FP816 (ClC1CC(Br)CCC1) were observed to be more frequent in inhibitors than in non-inhibitors. In addition, we created the datasets EGFR100, containing inhibitors with IC50 < 100 nM, and EGFR1000, containing inhibitors with IC50 < 1000 nM. Models trained, tested and validated on EGFR100 and EGFR1000 achieved maximum MCCs of 0.58 and 0.71, respectively. Models were also developed for predicting quinazoline- and pyrimidine-based EGFR inhibitors. CONCLUSIONS: Models have been developed on a large set of molecules of various classes for discriminating EGFR inhibitors from non-inhibitors. These highly accurate prediction models can be used to design and discover novel EGFR inhibitors. To serve the scientific community, a web server and standalone version, EGFRpred, has also been developed (http://crdd.osdd.net/oscadd/egfrpred/).
REVIEWERS: This article was reviewed by Dr Murphy, Prof Wang and Dr Eisenhaber. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-015-0046-9) contains supplementary material, which is available to authorized users.
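The summary reports model performance as MCC (Matthews correlation coefficient) alongside accuracy. Because EGFR10 is imbalanced (508 inhibitors vs 2997 non-inhibitors), accuracy alone can look strong even for a trivial model. The minimal Python sketch below computes both metrics from confusion-matrix counts using the standard MCC formula. The function name `mcc_and_accuracy` is hypothetical and is not part of EGFRpred. The usage lines apply it to a hypothetical classifier that labels every EGFR10 molecule a non-inhibitor.

```python
import math


def mcc_and_accuracy(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (MCC, accuracy) for a binary confusion matrix.

    tp/fn count inhibitors predicted correctly/incorrectly;
    tn/fp count non-inhibitors predicted correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Denominator is the product of the four marginal sums of the matrix
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is defined as 0 when any marginal sum is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, accuracy


if __name__ == "__main__":
    # Hypothetical trivial model: all 3505 EGFR10 molecules predicted as non-inhibitors
    mcc, acc = mcc_and_accuracy(tp=0, tn=2997, fp=0, fn=508)
    print(f"MCC={mcc:.2f}, accuracy={acc:.3f}")  # prints MCC=0.00, accuracy=0.855
```

Such a classifier reaches about 85.5% accuracy yet has an MCC of 0, which is why MCC is the more informative figure for a dataset this imbalanced.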
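The summary does not spell out how the frequency-based feature selection works. Purely as an illustrative assumption, one simple version scores each fingerprint bit by the difference between the fraction of inhibitors and the fraction of non-inhibitors in which the bit is set, then keeps the top-ranked bits. The sketch below uses plain Python lists of 0/1 values. `rank_bits_by_frequency` is a hypothetical helper, and the returned indices are column positions in the input, not necessarily the FP numbering used in the paper.

```python
def rank_bits_by_frequency(
    fingerprints: list[list[int]], labels: list[int], top_k: int = 10
) -> list[tuple[int, float]]:
    """Rank bits by (frequency in inhibitors) - (frequency in non-inhibitors).

    fingerprints: one equal-length 0/1 list per molecule.
    labels: 1 for inhibitor, 0 for non-inhibitor; both classes must be present.
    Returns up to top_k (bit_index, score) pairs, highest score first.
    """
    inhib = [fp for fp, lab in zip(fingerprints, labels) if lab == 1]
    non = [fp for fp, lab in zip(fingerprints, labels) if lab == 0]
    n_bits = len(fingerprints[0])

    scores = []
    for bit in range(n_bits):
        freq_inhib = sum(fp[bit] for fp in inhib) / len(inhib)
        freq_non = sum(fp[bit] for fp in non) / len(non)
        scores.append((bit, freq_inhib - freq_non))

    # Python's sort is stable, so ties keep the lower bit index first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Toy data: 4 molecules x 3 bits; bit 0 appears only in inhibitors
    X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
    y = [1, 1, 0, 0]
    print(rank_bits_by_frequency(X, y, top_k=2))  # prints [(0, 1.0), (2, 0.0)]
```

With real data, each row would be an 881-bit PubChem fingerprint, and the top-ranked bits would form the reduced feature set passed to the classifier.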