Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data
Currently, there is a high level of interest in deep learning and multitask learning in many scientific fields including the life sciences and chemistry. Herein, we investigate the performance of multitask deep neural networks (MT-DNNs) compared to random forest (RF) classification...
Main authors: | Rodríguez-Pérez, Raquel; Bajorath, Jürgen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2018 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175492/ https://www.ncbi.nlm.nih.gov/pubmed/30320286 http://dx.doi.org/10.1021/acsomega.8b01682 |
_version_ | 1783361525884387328 |
---|---|
author | Rodríguez-Pérez, Raquel; Bajorath, Jürgen |
author_sort | Rodríguez-Pérez, Raquel |
collection | PubMed |
description | Currently, there is a high level of interest in deep learning and multitask learning in many scientific fields including the life sciences and chemistry. Herein, we investigate the performance of multitask deep neural networks (MT-DNNs) compared to random forest (RF) classification, a standard method in machine learning, in predicting compound profiling experiments. Predictions were carried out on a large profiling matrix extracted from biological screening data. For model building, submatrices with varying data density of 5–100% were generated to investigate the influence of data sparseness on prediction performance. MT-DNN models were directly compared to RF models, and control calculations were also carried out using single-task DNNs (ST-DNNs). On the basis of compound recall, the performance of ST-DNN was consistently lower than that of the other methods. Compared to RF, MT-DNN models only yielded better prediction performance for individual assays in the profiling matrix when training data were very sparse. However, when the matrix density increased to at least 25–45%, per-assay RF models met or partly exceeded the prediction performance of MT-DNN models. When the average performances of RF and MT-DNN over the grid of all targets were compared, MT-DNN was slightly superior to RF, which was a likely consequence of multitask learning. Overall, there was no consistent advantage of MT-DNN over standard RF classification in predicting the results of compound profiling assays under varying conditions. In the presence of very sparse training data, prediction performance was limited. Under these challenging conditions, MT-DNN was the preferred approach. When more training data became available and prediction performance increased, RF performance was not inferior to MT-DNN. |
format | Online Article Text |
id | pubmed-6175492 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-6175492 2018-10-11 ACS Omega American Chemical Society 2018-09-27 /pmc/articles/PMC6175492/ /pubmed/30320286 http://dx.doi.org/10.1021/acsomega.8b01682 Text en Copyright © 2018 American Chemical Society This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes. |
title | Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175492/ https://www.ncbi.nlm.nih.gov/pubmed/30320286 http://dx.doi.org/10.1021/acsomega.8b01682 |