
QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest

BACKGROUND: Epidermal Growth Factor Receptor (EGFR) is a well-characterized cancer drug target. In the past, several QSAR models have been developed for predicting inhibition activity of molecules against EGFR. These models are useful to a limited set of molecules for a particular class like quinazo...

Bibliographic Details
Main Authors: Singh, Harinder, Singh, Sandeep, Singla, Deepak, Agarwal, Subhash M, Raghava, Gajendra P S
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372225/
https://www.ncbi.nlm.nih.gov/pubmed/25880749
http://dx.doi.org/10.1186/s13062-015-0046-9
author Singh, Harinder
Singh, Sandeep
Singla, Deepak
Agarwal, Subhash M
Raghava, Gajendra P S
collection PubMed
description BACKGROUND: Epidermal Growth Factor Receptor (EGFR) is a well-characterized cancer drug target. In the past, several QSAR models have been developed for predicting the inhibition activity of molecules against EGFR. These models are applicable only to a limited set of molecules of a particular class, such as quinazoline derivatives. In this study, an attempt has been made to develop prediction models on a large set of molecules (~3500 molecules) that includes diverse scaffolds such as quinazoline, pyrimidine, quinoline and indole. RESULTS: We trained, tested and validated our classification models on a dataset called EGFR10 that contains 508 inhibitors (having inhibition activity IC50 less than 10 nM) and 2997 non-inhibitors. Our Random forest based model achieved a maximum MCC of 0.49 with an accuracy of 83.7% on a validation set using 881 PubChem fingerprints. In this study, a frequency-based feature selection technique was used to identify the best fingerprints. It was observed that PubChem fingerprints FP380 (C(~O)(~O)), FP579 (O=C-C-C-C), FP388 (C(:C)(:N)(:N)) and FP816 (ClC1CC(Br)CCC1) are more frequent in the inhibitors than in the non-inhibitors. In addition, we created further datasets, namely EGFR100, containing inhibitors having IC50 < 100 nM, and EGFR1000, containing inhibitors having IC50 < 1000 nM. We trained, tested and validated our models on the EGFR100 and EGFR1000 datasets and achieved maximum MCC values of 0.58 and 0.71, respectively. In addition, models were developed for predicting quinazoline and pyrimidine based EGFR inhibitors. CONCLUSIONS: In summary, models have been developed on a large set of molecules of various classes for discriminating EGFR inhibitors and non-inhibitors. These highly accurate prediction models can be used to design and discover novel EGFR inhibitors. To provide a service to the scientific community, a web server/standalone tool, EGFRpred, has also been developed (http://crdd.osdd.net/oscadd/egfrpred/).
REVIEWERS: This article was reviewed by Dr Murphy, Prof Wang and Dr Eisenhaber. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-015-0046-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4372225
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4372225 2015-03-25 QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest Singh, Harinder Singh, Sandeep Singla, Deepak Agarwal, Subhash M Raghava, Gajendra P S Biol Direct Research BioMed Central 2015-03-25 /pmc/articles/PMC4372225/ /pubmed/25880749 http://dx.doi.org/10.1186/s13062-015-0046-9 Text en © Singh et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372225/
https://www.ncbi.nlm.nih.gov/pubmed/25880749
http://dx.doi.org/10.1186/s13062-015-0046-9
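
The abstract in this record reports three ideas that a short example may make more concrete: labelling molecules as inhibitors by an IC50 cut-off (10 nM for EGFR10), ranking fingerprint bits by how much more often they occur in inhibitors than in non-inhibitors, and scoring predictions with the Matthews correlation coefficient (MCC). The Python sketch below is illustrative only and is not the authors' implementation. All function names and the toy data are hypothetical, treating every molecule at or above the cut-off as a non-inhibitor is an assumption, and the frequency difference used here is just one plausible reading of "frequency-based feature selection".

```python
from math import sqrt


def label_by_ic50(ic50_nm, threshold_nm=10.0):
    """Return 1 (inhibitor) if IC50 in nM is below the cut-off, else 0.

    Assumption: everything at or above the cut-off counts as a non-inhibitor.
    """
    return 1 if ic50_nm < threshold_nm else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def bit_frequency_difference(fingerprints, labels):
    """Per bit: fraction of inhibitors with the bit set minus the
    fraction of non-inhibitors with the bit set."""
    pos = [row for row, y in zip(fingerprints, labels) if y == 1]
    neg = [row for row, y in zip(fingerprints, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one molecule in each class")
    n_bits = len(fingerprints[0])
    return [
        sum(row[i] for row in pos) / len(pos)
        - sum(row[i] for row in neg) / len(neg)
        for i in range(n_bits)
    ]


def top_bits(fingerprints, labels, k):
    """Indices of the k bits with the largest absolute frequency difference."""
    diffs = bit_frequency_difference(fingerprints, labels)
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: four molecules with 3-bit fingerprints (real PubChem
    # fingerprints have 881 bits) and made-up IC50 values in nM.
    toy_fps = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
    toy_ic50 = [2.0, 8.0, 50.0, 400.0]
    toy_labels = [label_by_ic50(v) for v in toy_ic50]
    print("labels:", toy_labels)
    print("top bits:", top_bits(toy_fps, toy_labels, k=2))
    print("MCC:", mcc(toy_labels, [1, 0, 0, 0]))
```

MCC is a useful companion to accuracy here because the EGFR10 dataset is imbalanced (508 inhibitors against 2997 non-inhibitors), so a model can reach high accuracy while still separating the two classes only moderately well.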