A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification

MOTIVATION: Gene expression-based multiclass prediction, such as tumor subtyping, is a non-trivial bioinformatic problem. Most classifier methods operate by comparing expression levels relative to other samples. Methods that base predictions on the expression pattern within a sample have been propos...

Bibliographic Details
Main authors:	Eriksson, Pontus; Marzouka, Nour-al-dain; Sjödahl, Gottfrid; Bernardo, Carina; Liedberg, Fredrik; Höglund, Mattias
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8796360/ https://www.ncbi.nlm.nih.gov/pubmed/34788787 http://dx.doi.org/10.1093/bioinformatics/btab763

_version_	1784641288330543104
author	Eriksson, Pontus; Marzouka, Nour-al-dain; Sjödahl, Gottfrid; Bernardo, Carina; Liedberg, Fredrik; Höglund, Mattias
author_sort	Eriksson, Pontus
collection	PubMed
description	MOTIVATION: Gene expression-based multiclass prediction, such as tumor subtyping, is a non-trivial bioinformatic problem. Most classifier methods operate by comparing expression levels relative to other samples. Methods that base predictions on the expression pattern within a sample have been proposed as an alternative. As these methods are invariant to the cohort composition and can be applied to a sample in isolation, they can collectively be termed single sample predictors (SSP). Such predictors could potentially be used for preprocessing-free classification of new samples and be built to function across different expression platforms where proper batch and dataset normalization is challenging. Here, we evaluate the behavior of several multiclass SSPs based on binary gene-pair rules (k-Top Scoring Pairs, Absolute Intrinsic Molecular Subtyping and a new Random Forest approach) and compare them to centroids built with centered or raw expression values, with the criteria that an optimal predictor should have high accuracy, overcome differences in tumor purity, be robust across expression platforms and provide an informative prediction output score. RESULTS: We found that gene-pair-based SSPs showed excellent performance on many expression-based classification tasks. The three methods differed in prediction score output, handling of tied scores and behavior in low purity samples. The k-Top Scoring Pairs and Random Forest approach both achieved high classification accuracy while providing an informative prediction score. Although gene-pair-based SSPs have been touted as being cross-platform compatible (through training on mixed platform data), out-of-the-box compatibility with a new dataset remains a potential issue that warrants cohort-to-cohort verification. 
AVAILABILITY AND IMPLEMENTATION: Our R package ‘multiclassPairs’ (https://cran.r-project.org/package=multiclassPairs) (https://doi.org/10.1093/bioinformatics/btab088) is freely available and enables easy training, prediction, and visualization using the gene-pair rule-based Random Forest SSP method and provides additional multiclass functionalities to the switchBox k-Top-Scoring Pairs package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-8796360
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8796360 2022-01-31 A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification Eriksson, Pontus; Marzouka, Nour-al-dain; Sjödahl, Gottfrid; Bernardo, Carina; Liedberg, Fredrik; Höglund, Mattias Bioinformatics Original Papers Oxford University Press 2021-11-12 /pmc/articles/PMC8796360/ /pubmed/34788787 http://dx.doi.org/10.1093/bioinformatics/btab763 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8796360/ https://www.ncbi.nlm.nih.gov/pubmed/34788787 http://dx.doi.org/10.1093/bioinformatics/btab763
