Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier

Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. As a result, the reported accuracy may be either too pessimistic or too optimistic, and the estimated model performance cannot be reproduced on indep...

Bibliographic Details
Main authors:	Kircher, Magdalena; Säurich, Josefin; Selle, Michael; Jung, Klaus
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9956321/ https://www.ncbi.nlm.nih.gov/pubmed/36833313 http://dx.doi.org/10.3390/genes14020387

author	Kircher, Magdalena; Säurich, Josefin; Selle, Michael; Jung, Klaus
collection	PubMed
description	Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. As a result, the reported accuracy may be either too pessimistic or too optimistic, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether the classifier qualifies for clinical use. We estimate classifier performance in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample, and we evaluate classifiers before and after outlier removal by means of cross-validation. We found that removing outliers changed the classification performance notably and, for the most part, improved the classification results. Because there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier both with and without outliers in the training and test data. This provides a more nuanced picture of a classifier’s performance and prevents the reporting of models that later turn out not to be applicable for clinical diagnosis.
format	Online Article Text
id	pubmed-9956321
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9956321 2023-02-25 Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier. Kircher, Magdalena; Säurich, Josefin; Selle, Michael; Jung, Klaus. Genes (Basel). Article. MDPI 2023-02-01. /pmc/articles/PMC9956321/ /pubmed/36833313 http://dx.doi.org/10.3390/genes14020387 Text en. © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier
topic	Article

Similar Items