The ability to classify patients based on gene-expression data varies by algorithm and performance metric

Bibliographic Details
Main Authors:	Piccolo, Stephen R.; Mecham, Avery; Golightly, Nathan P.; Johnson, Jérémie L.; Miller, Dustin B.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8942277/ https://www.ncbi.nlm.nih.gov/pubmed/35275931 http://dx.doi.org/10.1371/journal.pcbi.1009926

author	Piccolo, Stephen R.; Mecham, Avery; Golightly, Nathan P.; Johnson, Jérémie L.; Miller, Dustin B.
collection	PubMed
description	By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist—and most support diverse hyperparameters—so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
format	Online Article Text
id	pubmed-8942277
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
journal	PLoS Comput Biol
published	2022-03-11
rights	© 2022 Piccolo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	The ability to classify patients based on gene-expression data varies by algorithm and performance metric
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8942277/ https://www.ncbi.nlm.nih.gov/pubmed/35275931 http://dx.doi.org/10.1371/journal.pcbi.1009926
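
The abstract mentions tuning hyperparameters and selecting features inside nested cross validation, and reports that univariate feature selection tended to work well. The following is a minimal, standard-library-only Python (3.8+) sketch of that general idea, not the authors' pipeline. Everything in it is an illustrative assumption: the toy data, the function names, the fold counts, the "number of top-ranked genes" hyperparameter, the mean-difference filter used as the univariate selector, and the nearest-centroid classifier (a stand-in chosen for brevity, not one of the study's algorithms). The inner loop picks the gene count using only the outer-training rows; the outer loop then scores that choice on samples never seen during selection.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes (hypothetical layout)

def make_toy_data(n: int = 40, n_genes: int = 20, informative: int = 3,
                  seed: int = 1) -> Tuple[Matrix, List[int]]:
    """Hypothetical toy data: class-1 samples get +2.0 on the first `informative` genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in range(informative):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def stratified_folds(y: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Shuffle each class separately and deal its indices round-robin into k folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def rank_genes(X: Matrix, y: Sequence[int], rows: Sequence[int]) -> List[int]:
    """Univariate filter: |mean(class 0) - mean(class 1)| / overall std of the gene,
    computed only on `rows`. Returns gene indices, best first."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        spread = statistics.pstdev(a + b) or 1e-9  # guard against a constant gene
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def accuracy(X: Matrix, y: Sequence[int], train: Sequence[int],
             test: Sequence[int], k: int) -> float:
    """Select the top-k genes on `train`, fit nearest centroids, score on `test`."""
    genes = rank_genes(X, y, train)[:k]
    centroids: Dict[int, List[float]] = {}
    for label in sorted({y[i] for i in train}):
        members = [i for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(X[i][j] for i in members) for j in genes]

    def predict(row: List[float]) -> int:
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda lab: sum(
            (row[j] - c) ** 2 for j, c in zip(genes, centroids[lab])))

    return sum(predict(X[i]) == y[i] for i in test) / len(test)

def nested_cv(X: Matrix, y: Sequence[int], candidate_ks: Sequence[int],
              outer_k: int = 5, inner_k: int = 3, seed: int = 0) -> List[float]:
    """Return one accuracy per outer fold; k is tuned only on each outer-training set."""
    outer_scores = []
    for o, test in enumerate(stratified_folds(y, outer_k, seed)):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        # Inner folds index positions within `train`; map them back via train[p].
        inner = [[train[p] for p in fold]
                 for fold in stratified_folds([y[i] for i in train], inner_k, seed + o + 1)]

        def inner_score(k: int) -> float:
            scores = []
            for val in inner:
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                scores.append(accuracy(X, y, fit, val, k))
            return statistics.fmean(scores)

        # max() keeps the first tied candidate, so listing small k first favors sparsity.
        best_k = max(candidate_ks, key=inner_score)
        outer_scores.append(accuracy(X, y, train, test, best_k))
    return outer_scores

if __name__ == "__main__":
    X, y = make_toy_data()
    scores = nested_cv(X, y, candidate_ks=[1, 3, 10, 20])
    print("per-fold accuracy:", [round(s, 2) for s in scores])
    print("mean accuracy:", round(statistics.fmean(scores), 2))
```

Because gene ranking and the choice of k happen only on outer-training rows, the outer-fold accuracies are not inflated by information leaking from the held-out samples, which is the reason for nesting the two loops rather than tuning and evaluating on the same folds.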

Similar Items