Heterogeneity Aware Random Forest for Drug Sensitivity Prediction
Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similarly to a predictive model...
Main authors: | Rahman, Raziur; Matlock, Kevin; Ghosh, Souparno; Pal, Ranadip |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5595802/ https://www.ncbi.nlm.nih.gov/pubmed/28900181 http://dx.doi.org/10.1038/s41598-017-11665-4 |
_version_ | 1783263419934179328 |
---|---|
author | Rahman, Raziur Matlock, Kevin Ghosh, Souparno Pal, Ranadip |
author_facet | Rahman, Raziur Matlock, Kevin Ghosh, Souparno Pal, Ranadip |
author_sort | Rahman, Raziur |
collection | PubMed |
description | Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similarly to a predictive model that takes into consideration the heterogeneity of the samples in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when cancer type is known outperform predictions when that information is withheld, even when the sample sizes for the former are considerably lower than the combined sample size. To incorporate the heterogeneity idea in the commonly used ensemble-based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF), which assigns weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate-free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of cancer types are different. |
format | Online Article Text |
id | pubmed-5595802 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-5595802 2017-09-14 Heterogeneity Aware Random Forest for Drug Sensitivity Prediction Rahman, Raziur Matlock, Kevin Ghosh, Souparno Pal, Ranadip Sci Rep Article Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similarly to a predictive model that takes into consideration the heterogeneity of the samples in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when cancer type is known outperform predictions when that information is withheld, even when the sample sizes for the former are considerably lower than the combined sample size. To incorporate the heterogeneity idea in the commonly used ensemble-based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF), which assigns weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate-free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of cancer types are different. Nature Publishing Group UK 2017-09-12 /pmc/articles/PMC5595802/ /pubmed/28900181 http://dx.doi.org/10.1038/s41598-017-11665-4 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Rahman, Raziur Matlock, Kevin Ghosh, Souparno Pal, Ranadip Heterogeneity Aware Random Forest for Drug Sensitivity Prediction |
title | Heterogeneity Aware Random Forest for Drug Sensitivity Prediction |
title_full | Heterogeneity Aware Random Forest for Drug Sensitivity Prediction |
title_fullStr | Heterogeneity Aware Random Forest for Drug Sensitivity Prediction |
title_full_unstemmed | Heterogeneity Aware Random Forest for Drug Sensitivity Prediction |
title_short | Heterogeneity Aware Random Forest for Drug Sensitivity Prediction |
title_sort | heterogeneity aware random forest for drug sensitivity prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5595802/ https://www.ncbi.nlm.nih.gov/pubmed/28900181 http://dx.doi.org/10.1038/s41598-017-11665-4 |
work_keys_str_mv | AT rahmanraziur heterogeneityawarerandomforestfordrugsensitivityprediction AT matlockkevin heterogeneityawarerandomforestfordrugsensitivityprediction AT ghoshsouparno heterogeneityawarerandomforestfordrugsensitivityprediction AT palranadip heterogeneityawarerandomforestfordrugsensitivityprediction |
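
The description field in this record says that HARF assigns weights to trees based on the category of the sample and allocates a latent class from the distribution of leaf nodes. The following is a minimal sketch of that general idea only, not the authors' published algorithm. The function names, the `in_weight`/`out_weight` parameters, the assumption that each tree carries a single category label, and the majority-count allocation rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def allocate_category(leaf_category_counts):
    """Guess a sample's category from the leaves it reaches (assumed rule).

    leaf_category_counts holds one Counter per tree. Each Counter records how
    many training samples of each category sit in the leaf that the new
    sample reached in that tree. The category with the largest total wins.
    """
    totals = Counter()
    for counts in leaf_category_counts:
        totals.update(counts)  # adds the per-tree counts to the running totals
    return totals.most_common(1)[0][0]


def weighted_forest_prediction(tree_predictions, tree_categories,
                               sample_category, in_weight=1.0, out_weight=0.0):
    """Weighted average of per-tree predictions (hypothetical weighting).

    Trees labelled with the sample's category receive in_weight and all
    other trees receive out_weight. If every weight is zero, the function
    falls back to the plain, unweighted forest average.
    """
    weights = [in_weight if category == sample_category else out_weight
               for category in tree_categories]
    total = sum(weights)
    if total == 0:
        return sum(tree_predictions) / len(tree_predictions)
    weighted_sum = sum(w * p for w, p in zip(weights, tree_predictions))
    return weighted_sum / total


# Toy inputs, invented for illustration only.
tree_predictions = [0.25, 0.75, 0.5, 1.0]
tree_categories = ["lung", "lung", "breast", "breast"]
leaf_counts = [Counter(lung=3, breast=1), Counter(lung=2),
               Counter(breast=2, lung=1), Counter(lung=4)]

category = allocate_category(leaf_counts)
prediction = weighted_forest_prediction(tree_predictions, tree_categories,
                                        category)
print(category, prediction)
```

With the default weights, only trees matching the allocated category contribute. Setting `out_weight` to a small positive value would instead blend in the remaining trees, which is one possible design choice when a category has few trees.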