
Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data

Currently, there is a high level of interest in deep learning and multitask learning in many scientific fields including the life sciences and chemistry. Herein, we investigate the performance of multitask deep neural networks (MT-DNNs) compared to random forest (RF) classification...


Bibliographic Details
Main authors: Rodríguez-Pérez, Raquel; Bajorath, Jürgen
Format: Online Article Text
Language: English
Published: American Chemical Society 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175492/
https://www.ncbi.nlm.nih.gov/pubmed/30320286
http://dx.doi.org/10.1021/acsomega.8b01682
_version_ 1783361525884387328
author Rodríguez-Pérez, Raquel
Bajorath, Jürgen
author_facet Rodríguez-Pérez, Raquel
Bajorath, Jürgen
author_sort Rodríguez-Pérez, Raquel
collection PubMed
description Currently, there is a high level of interest in deep learning and multitask learning in many scientific fields including the life sciences and chemistry. Herein, we investigate the performance of multitask deep neural networks (MT-DNNs) compared to random forest (RF) classification, a standard method in machine learning, in predicting compound profiling experiments. Predictions were carried out on a large profiling matrix extracted from biological screening data. For model building, submatrices with varying data density of 5–100% were generated to investigate the influence of data sparseness on prediction performance. MT-DNN models were directly compared to RF models, and control calculations were also carried out using single-task DNNs (ST-DNNs). On the basis of compound recall, the performance of ST-DNN was consistently lower than that of the other methods. Compared to RF, MT-DNN models only yielded better prediction performance for individual assays in the profiling matrix when training data were very sparse. However, when the matrix density increased to at least 25–45%, per-assay RF models met or partly exceeded the prediction performance of MT-DNN models. When the average performances of RF and MT-DNN over the grid of all targets were compared, MT-DNN was slightly superior to RF, which was a likely consequence of multitask learning. Overall, there was no consistent advantage of MT-DNN over standard RF classification in predicting the results of compound profiling assays under varying conditions. In the presence of very sparse training data, prediction performance was limited. Under these challenging conditions, MT-DNN was the preferred approach. When more training data became available and prediction performance increased, RF performance was not inferior to MT-DNN.
format Online
Article
Text
id pubmed-6175492
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-6175492 2018-10-11 Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data Rodríguez-Pérez, Raquel Bajorath, Jürgen ACS Omega American Chemical Society 2018-09-27 /pmc/articles/PMC6175492/ /pubmed/30320286 http://dx.doi.org/10.1021/acsomega.8b01682 Text en Copyright © 2018 American Chemical Society This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
spellingShingle Rodríguez-Pérez, Raquel
Bajorath, Jürgen
Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data
title Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data
title_sort prediction of compound profiling matrices, part ii: relative performance of multitask deep learning and random forest classification on the basis of varying amounts of training data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175492/
https://www.ncbi.nlm.nih.gov/pubmed/30320286
http://dx.doi.org/10.1021/acsomega.8b01682