Comprehensive ensemble in QSAR prediction for drug discovery
BACKGROUND: Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between structural properties of chemical compounds and biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based mac...
Main Authors: | Kwon, Sunyoung, Bae, Ho, Jo, Jeonghee, Yoon, Sungroh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6815455/ https://www.ncbi.nlm.nih.gov/pubmed/31655545 http://dx.doi.org/10.1186/s12859-019-3135-4 |
_version_ | 1783463184860971008 |
---|---|
author | Kwon, Sunyoung Bae, Ho Jo, Jeonghee Yoon, Sungroh |
author_facet | Kwon, Sunyoung Bae, Ho Jo, Jeonghee Yoon, Sungroh |
author_sort | Kwon, Sunyoung |
collection | PubMed |
description | BACKGROUND: Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between structural properties of chemical compounds and biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, the most prevalent approach, random forest, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject. RESULTS: The proposed ensemble method consistently outperformed thirteen individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at http://data.snu.ac.kr/QSAR/. CONCLUSIONS: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual model did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning. |
format | Online Article Text |
id | pubmed-6815455 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-68154552019-10-31 Comprehensive ensemble in QSAR prediction for drug discovery Kwon, Sunyoung Bae, Ho Jo, Jeonghee Yoon, Sungroh BMC Bioinformatics Methodology Article BACKGROUND: Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between structural properties of chemical compounds and biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, the most prevalent approach, random forest, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject. RESULTS: The proposed ensemble method consistently outperformed thirteen individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at http://data.snu.ac.kr/QSAR/. CONCLUSIONS: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual model did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning. 
BioMed Central 2019-10-26 /pmc/articles/PMC6815455/ /pubmed/31655545 http://dx.doi.org/10.1186/s12859-019-3135-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Kwon, Sunyoung Bae, Ho Jo, Jeonghee Yoon, Sungroh Comprehensive ensemble in QSAR prediction for drug discovery |
title | Comprehensive ensemble in QSAR prediction for drug discovery |
title_full | Comprehensive ensemble in QSAR prediction for drug discovery |
title_fullStr | Comprehensive ensemble in QSAR prediction for drug discovery |
title_full_unstemmed | Comprehensive ensemble in QSAR prediction for drug discovery |
title_short | Comprehensive ensemble in QSAR prediction for drug discovery |
title_sort | comprehensive ensemble in qsar prediction for drug discovery |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6815455/ https://www.ncbi.nlm.nih.gov/pubmed/31655545 http://dx.doi.org/10.1186/s12859-019-3135-4 |
work_keys_str_mv | AT kwonsunyoung comprehensiveensembleinqsarpredictionfordrugdiscovery AT baeho comprehensiveensembleinqsarpredictionfordrugdiscovery AT jojeonghee comprehensiveensembleinqsarpredictionfordrugdiscovery AT yoonsungroh comprehensiveensembleinqsarpredictionfordrugdiscovery |
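
The second-level meta-learning mentioned in the description can be illustrated with a generic stacking sketch. This is not the authors' implementation: the logistic-regression learners, the feature-column "views" that stand in for diversified first-level models, the synthetic data, and every function name below are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def select(row, cols):
    return [row[c] for c in cols]


def stack(X, y, views, k=5):
    """Train first-level models per view, then a meta-learner on their
    out-of-fold predictions (so the meta-learner never sees a prediction
    made by a model that was trained on that same example)."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    meta_X = [[0.0] * len(views) for _ in range(n)]
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        for m, cols in enumerate(views):
            model = fit_logistic([select(X[i], cols) for i in train_idx],
                                 [y[i] for i in train_idx])
            for i in held_out:
                meta_X[i][m] = predict_proba(model, select(X[i], cols))
    meta_model = fit_logistic(meta_X, y)
    # Refit each first-level model on all data for use at prediction time.
    base_models = [fit_logistic([select(r, cols) for r in X], y) for cols in views]
    return base_models, meta_model


def stacked_predict(base_models, meta_model, views, x):
    level1 = [predict_proba(m, select(x, cols)) for m, cols in zip(base_models, views)]
    return predict_proba(meta_model, level1)


# Synthetic stand-in data (hypothetical): the label depends on features 0 and 2.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if row[0] + row[2] > 0 else 0 for row in X]
views = [[0, 1], [2, 3], [0, 1, 2, 3]]  # each first-level model sees one view

base_models, meta_model = stack(X, y, views)
preds = [stacked_predict(base_models, meta_model, views, row) > 0.5 for row in X]
accuracy = sum(int(p) == t for p, t in zip(preds, y)) / len(y)  # training-set accuracy
print(f"stacked training accuracy: {accuracy:.2f}")
```

The learned meta-model weights (`meta_model[0]`) give one simple way to inspect how much each first-level model contributes to the combined prediction, which is the kind of interpretation the description alludes to.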