QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest
BACKGROUND: Epidermal Growth Factor Receptor (EGFR) is a well-characterized cancer drug target. In the past, several QSAR models have been developed for predicting inhibition activity of molecules against EGFR. These models are useful to a limited set of molecules for a particular class like quinazo...
Main authors: | Singh, Harinder; Singh, Sandeep; Singla, Deepak; Agarwal, Subhash M; Raghava, Gajendra P S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372225/ https://www.ncbi.nlm.nih.gov/pubmed/25880749 http://dx.doi.org/10.1186/s13062-015-0046-9 |
_version_ | 1782363143251427328 |
---|---|
author | Singh, Harinder; Singh, Sandeep; Singla, Deepak; Agarwal, Subhash M; Raghava, Gajendra P S |
author_facet | Singh, Harinder; Singh, Sandeep; Singla, Deepak; Agarwal, Subhash M; Raghava, Gajendra P S |
author_sort | Singh, Harinder |
collection | PubMed |
description | BACKGROUND: Epidermal Growth Factor Receptor (EGFR) is a well-characterized cancer drug target. In the past, several QSAR models have been developed for predicting the inhibitory activity of molecules against EGFR. These models are applicable only to a limited set of molecules from a particular class, such as quinazoline derivatives. In this study, an attempt has been made to develop prediction models on a large set of molecules (~3500 molecules) that includes diverse scaffolds such as quinazoline, pyrimidine, quinoline and indole. RESULTS: We trained, tested and validated our classification models on a dataset called EGFR10 that contains 508 inhibitors (having inhibition activity IC50 less than 10 nM) and 2997 non-inhibitors. Our Random forest-based model achieved a maximum MCC of 0.49 with an accuracy of 83.7% on a validation set using 881 PubChem fingerprints. A frequency-based feature selection technique was used to identify the best fingerprints. It was observed that PubChem fingerprints FP380 (C(~O)(~O)), FP579 (O=C-C-C-C), FP388 (C(:C)(:N)(:N)) and FP816 (ClC1CC(Br)CCC1) are more frequent in the inhibitors than in the non-inhibitors. In addition, we created two further datasets: EGFR100, containing inhibitors with IC50 < 100 nM, and EGFR1000, containing inhibitors with IC50 < 1000 nM. We trained, tested and validated our models on the EGFR100 and EGFR1000 datasets and achieved maximum MCC values of 0.58 and 0.71, respectively. In addition, models were developed for predicting quinazoline- and pyrimidine-based EGFR inhibitors. CONCLUSIONS: In summary, models have been developed on a large set of molecules of various classes for discriminating EGFR inhibitors and non-inhibitors. These highly accurate prediction models can be used to design and discover novel EGFR inhibitors. To provide a service to the scientific community, a web server/standalone tool, EGFRpred, has also been developed (http://crdd.osdd.net/oscadd/egfrpred/). REVIEWERS: This article was reviewed by Dr Murphy, Prof Wang and Dr Eisenhaber. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-015-0046-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4372225 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4372225 2015-03-25 QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest Singh, Harinder; Singh, Sandeep; Singla, Deepak; Agarwal, Subhash M; Raghava, Gajendra P S Biol Direct Research BioMed Central 2015-03-25 /pmc/articles/PMC4372225/ /pubmed/25880749 http://dx.doi.org/10.1186/s13062-015-0046-9 Text en © Singh et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Singh, Harinder Singh, Sandeep Singla, Deepak Agarwal, Subhash M Raghava, Gajendra P S QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest |
title | QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest |
title_full | QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest |
title_fullStr | QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest |
title_full_unstemmed | QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest |
title_short | QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest |
title_sort | qsar based model for discriminating egfr inhibitors and non-inhibitors using random forest |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4372225/ https://www.ncbi.nlm.nih.gov/pubmed/25880749 http://dx.doi.org/10.1186/s13062-015-0046-9 |
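
The description in this record reports both accuracy and the Matthews correlation coefficient (MCC) for a dataset with far fewer inhibitors than non-inhibitors. The following is a minimal Python sketch, not the authors' code, of how the two metrics are computed from binary labels. The function names and the toy label lists are hypothetical and chosen only for illustration; they are not data from the article.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = inhibitor."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy, made-up labels: 2 inhibitors and 4 non-inhibitors.
    y_true = [1, 1, 0, 0, 0, 0]
    # A classifier that always predicts "non-inhibitor".
    y_always_negative = [0, 0, 0, 0, 0, 0]
    counts = confusion_counts(y_true, y_always_negative)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

On imbalanced data, a classifier that always predicts the majority class can still score a high accuracy while its MCC stays at zero, which is one common reason for reporting MCC alongside accuracy.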