
Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data

Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those by Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug...


Bibliographic Details
Main Authors:	Nguyen, Linh; Dang, Cuong C; Ballester, Pedro J.
Format:	Online Article Text
Language:	English
Published:	F1000Research 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310525/ https://www.ncbi.nlm.nih.gov/pubmed/28299173 http://dx.doi.org/10.12688/f1000research.10529.2

_version_	1782507888684564480
author	Nguyen, Linh Dang, Cuong C Ballester, Pedro J.
author_facet	Nguyen, Linh Dang, Cuong C Ballester, Pedro J.
author_sort	Nguyen, Linh
collection	PubMed
description	Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those by Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than standard k-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: Thanks to this unbiased validation, we now know that this type of models can predict in vitro tumour response to some of these drugs. These models can thus be further investigated on in vivo tumour models. R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at http://ballester.marseille.inserm.fr/gdsc.transcriptomicDatav2.tar.gz.
format	Online Article Text
id	pubmed-5310525
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	F1000Research
record_format	MEDLINE/PubMed
spelling	pubmed-5310525 2017-03-14 Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data Nguyen, Linh Dang, Cuong C Ballester, Pedro J. F1000Res Research Article Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those by Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than standard k-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: Thanks to this unbiased validation, we now know that this type of models can predict in vitro tumour response to some of these drugs. These models can thus be further investigated on in vivo tumour models. R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at http://ballester.marseille.inserm.fr/gdsc.transcriptomicDatav2.tar.gz. F1000Research 2017-03-14 /pmc/articles/PMC5310525/ /pubmed/28299173 http://dx.doi.org/10.12688/f1000research.10529.2 Text en Copyright: © 2017 Nguyen L et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Nguyen, Linh Dang, Cuong C Ballester, Pedro J. Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
title	Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
title_full	Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
title_fullStr	Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
title_full_unstemmed	Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
title_short	Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
title_sort	systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310525/ https://www.ncbi.nlm.nih.gov/pubmed/28299173 http://dx.doi.org/10.12688/f1000research.10529.2
work_keys_str_mv	AT nguyenlinh systematicassessmentofmultigenepredictorsofpancancercelllinesensitivitytodrugsexploitinggeneexpressiondata AT dangcuongc systematicassessmentofmultigenepredictorsofpancancercelllinesensitivitytodrugsexploitinggeneexpressiondata AT ballesterpedroj systematicassessmentofmultigenepredictorsofpancancercelllinesensitivitytodrugsexploitinggeneexpressiondata
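
Note on the abstract: the precision/recall trade-off it reports (single-gene markers tending toward higher precision but poor recall, compared with the multi-gene RF classifiers) can be illustrated with a minimal Python sketch. The labels below are invented toy values for ten hypothetical cell lines, not GDSC data or results from the article, and the names `precision_recall`, `marker_pred`, and `model_pred` are assumptions made for this illustration only.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, where 1 = sensitive to the drug."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is sensitive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy ground truth for ten cell lines (invented for illustration): five are sensitive.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical single-gene marker: flags one cell line, and that call is correct.
marker_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical multi-gene classifier: flags five cell lines, one of them wrongly.
model_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

marker_precision, marker_recall = precision_recall(y_true, marker_pred)
model_precision, model_recall = precision_recall(y_true, model_pred)

print(f"marker: precision={marker_precision:.2f} recall={marker_recall:.2f}")
print(f"model:  precision={model_precision:.2f} recall={model_recall:.2f}")
```

In this toy case the marker never raises a false alarm but misses four of the five sensitive cell lines, whereas the classifier accepts one false positive in exchange for detecting four of the five. This mirrors only the qualitative pattern stated in the abstract, not its reported figures.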

