Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery

Bibliographic Details
Main authors:	Netzer, Michael; Baumgartner, Christian; Baumgarten, Daniel
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645616/ https://www.ncbi.nlm.nih.gov/pubmed/36350811 http://dx.doi.org/10.1371/journal.pone.0276607

collection	PubMed
description	High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. A variety of classification models is currently available, and a common approach is to compare their performance and select the best one for a given classification problem. Since the association between the features of a dataset and the performance of a particular classification method is still not fully understood, the main contribution of this work is a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose a three-step computational workflow that comprises an analysis of the dataset characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that predictability strongly depends on the discriminatory ability of the features (e.g., sets of genes) in two-class or multi-class datasets. If a dataset has a certain degree of discriminatory ability, this method enables prediction of the classification performance before a learning model is applied. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method, and suggest the optimal classification method for a given dataset based on its discriminatory ability.
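The three-step workflow summarized in the abstract (characterize the dataset, measure classification accuracy, predict the classification error) can be illustrated with a minimal, hypothetical sketch. It does not reproduce the authors' method: the synthetic Gaussian data generator, the use of the largest per-feature Fisher discriminant ratio as a stand-in for "discriminatory ability", the nearest-centroid classifier with k-fold cross-validation, and the one-variable least-squares meta-model are all assumptions chosen for brevity, and every function name below is illustrative.

```python
import random
import statistics

# Hypothetical sketch of a three-step "predict the prediction" workflow.
# Assumptions (not from the paper): synthetic Gaussian data, the largest
# per-feature Fisher discriminant ratio as the "discriminatory ability",
# a nearest-centroid classifier, and a 1-D least-squares meta-model.


def make_dataset(n_per_class, n_features, shift, rng):
    """Two-class Gaussian data; only feature 0 differs, by `shift` in its mean."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


def fisher_ratio(X, y):
    """Step 1: dataset characteristic, max over features of (m0 - m1)^2 / (v0 + v1)."""
    best = 0.0
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        num = (statistics.fmean(a) - statistics.fmean(b)) ** 2
        best = max(best, num / (statistics.variance(a) + statistics.variance(b)))
    return best


def cv_error(X, y, k, rng):
    """Step 2: k-fold cross-validated error rate of a nearest-centroid classifier."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # One centroid (per-feature mean) per class, from the training folds only.
        cents = {}
        for lab in (0, 1):
            rows = [X[i] for i in train if y[i] == lab]
            cents[lab] = [statistics.fmean(col) for col in zip(*rows)]
        for i in test:
            dist = {lab: sum((a - b) ** 2 for a, b in zip(X[i], c)) for lab, c in cents.items()}
            wrong += min(dist, key=dist.get) != y[i]
    return wrong / len(X)


def fit_line(xs, ys):
    """Step 3: ordinary least squares fit ys ~ slope * xs + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    rng = random.Random(0)
    feats, errs = [], []
    for s in range(30):
        shift = 3.0 * s / 29  # from no separation up to a strongly discriminative gene
        X, y = make_dataset(40, 5, shift, rng)
        feats.append(fisher_ratio(X, y))
        errs.append(cv_error(X, y, 5, rng))
    slope, intercept = fit_line(feats, errs)
    # Predict the error of an unseen dataset before any classifier is trained on it.
    X_new, y_new = make_dataset(40, 5, 2.0, rng)
    f_new = fisher_ratio(X_new, y_new)
    print(f"Fisher ratio {f_new:.2f}: predicted error {slope * f_new + intercept:.2f}, "
          f"observed CV error {cv_error(X_new, y_new, 5, rng):.2f}")
```

The straight-line meta-model is only a placeholder: error cannot fall below zero or rise above chance, so a practical variant would clip the prediction or use a nonlinear regressor, and would include several dataset characteristics and classifiers rather than one of each.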
id	pubmed-9645616
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9645616 2022-11-15 Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery. Netzer, Michael; Baumgartner, Christian; Baumgarten, Daniel. PLoS One. Research Article.
Public Library of Science, 2022-11-09. /pmc/articles/PMC9645616/ /pubmed/36350811 http://dx.doi.org/10.1371/journal.pone.0276607 Text en. © 2022 Netzer et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
