Learning curves for drug response prediction in cancer cell lines

Bibliographic Details
Main authors: Partin, Alexander; Brettin, Thomas; Evrard, Yvonne A.; Zhu, Yitan; Yoo, Hyunseung; Xia, Fangfang; Jiang, Songhao; Clyde, Austin; Shukla, Maulik; Fonstein, Michael; Doroshow, James H.; Stevens, Rick L.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130157/
https://www.ncbi.nlm.nih.gov/pubmed/34001007
http://dx.doi.org/10.1186/s12859-021-04163-y
Collection: PubMed

Abstract:
BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.

METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models.

RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics.

CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
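
The METHODS summary states that the learning curves are fitted to a power law, but this record gives neither the exact form nor the fitting procedure. The sketch below assumes the three-parameter form often used for learning curves, ε(m) = α·m^β + γ, where m is the training-set size, ε a positive error metric, and γ an error floor; it is an illustration, not the authors' code. As a simple, swapped-in fitting technique, it grid-searches γ and, for each candidate, fits α and β by ordinary least squares in log-log space, keeping the candidate with the lowest squared error on the original scale. The function names and the synthetic data are hypothetical.

```python
import math


def power_law(m, alpha, beta, gamma):
    """Assumed three-parameter learning-curve form: error(m) = alpha * m**beta + gamma."""
    return alpha * m ** beta + gamma


def _fit_alpha_beta(sizes, errors, gamma):
    """For a fixed gamma, fit log(error - gamma) = log(alpha) + beta * log(m) by OLS."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e - gamma) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(y_mean - beta * x_mean)
    return alpha, beta


def fit_learning_curve(sizes, errors, n_grid=2000):
    """Hypothetical helper: return (alpha, beta, gamma) with the lowest squared error.

    Assumes at least two distinct sizes and strictly positive errors. gamma is
    searched on a uniform grid in [0, min(errors)) so that error - gamma stays
    positive and its logarithm is defined.
    """
    upper = min(errors)
    best = None
    for i in range(n_grid):
        gamma = upper * i / n_grid
        alpha, beta = _fit_alpha_beta(sizes, errors, gamma)
        # Score each candidate on the original (not log) error scale.
        sse = sum((power_law(m, alpha, beta, gamma) - e) ** 2
                  for m, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, alpha, beta, gamma)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic, noise-free curve generated from known parameters (not data from the paper).
    sizes = [1000, 2000, 4000, 8000, 16000, 32000, 64000]
    errors = [power_law(m, 2.0, -0.5, 0.1) for m in sizes]
    alpha, beta, gamma = fit_learning_curve(sizes, errors)
    print(f"alpha={alpha:.3f} beta={beta:.3f} gamma={gamma:.4f}")
    # Extrapolate the fitted curve to twice the largest training size.
    print(f"predicted error at m=128000: {power_law(128000, alpha, beta, gamma):.4f}")
```

Extrapolating the fitted curve beyond the largest observed training size, as in the last line of the demo, is the kind of forward-looking estimate the CONCLUSIONS describe. With real, noisy scores the fit may need to be restricted to the range of sizes where the curve actually follows a power law, and the extrapolation is only as reliable as that choice.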

Record ID: pubmed-8130157
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed

Journal: BMC Bioinformatics, Research Article, published 2021-05-17
Copyright: © The Author(s) 2021
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.