Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery

Bibliographic Details
Main authors: Netzer, Michael; Baumgartner, Christian; Baumgarten, Daniel
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645616/
https://www.ncbi.nlm.nih.gov/pubmed/36350811
http://dx.doi.org/10.1371/journal.pone.0276607
Collection: PubMed
Description: High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. Currently, there are a variety of classification models, and a common approach is to compare their performance and select the best one for a given classification problem. Since the association between the features of a dataset and the performance of a particular classification method is still not fully understood, the main contribution of this work is to provide a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose a three-step computational workflow that includes an analysis of the dataset characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that the predictability strongly depends on the discriminatory ability of the features, e.g., sets of genes, in two-class or multi-class datasets. If a dataset has a certain discriminatory ability, this method enables prediction of the classification performance before applying a learning model. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method, and suggest the optimal classification method for a given dataset based on its discriminatory ability.
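The three-step idea summarized above (characterize the dataset, measure classification accuracy, then predict the classification error from the characteristics) can be illustrated with a minimal Python sketch. Every concrete choice in it is an assumption for illustration, not the authors' method: the Fisher score as the measure of discriminatory ability, a nearest-centroid classifier evaluated by leave-one-out, a straight-line least-squares fit as the error predictor, and the synthetic two-class data. All function names are hypothetical.

```python
import random
import statistics


def fisher_score(x0, x1):
    """Assumed discriminatory-ability measure for one feature:
    (mean0 - mean1)^2 / (var0 + var1)."""
    return (statistics.mean(x0) - statistics.mean(x1)) ** 2 / (
        statistics.variance(x0) + statistics.variance(x1))


def dataset_characteristic(X, y):
    """Step 1 (hypothetical): maximum Fisher score over all features."""
    scores = []
    for j in range(len(X[0])):
        x0 = [row[j] for row, label in zip(X, y) if label == 0]
        x1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(fisher_score(x0, x1))
    return max(scores)


def nearest_centroid_loo_error(X, y):
    """Step 2 (hypothetical): leave-one-out error of a nearest-centroid classifier."""
    errors = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i.
        centroids = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [statistics.mean(col) for col in zip(*rows)]
        # Squared Euclidean distance from sample i to each centroid.
        dist = {label: sum((a - b) ** 2 for a, b in zip(X[i], c))
                for label, c in centroids.items()}
        errors += min(dist, key=dist.get) != y[i]
    return errors / len(X)


def make_synthetic(shift, rng, n_per_class=20, n_features=5):
    """Hypothetical synthetic data: Gaussian noise, class 1 shifted on feature 0."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
            X.append(row)
            y.append(label)
    return X, y


def fit_line(xs, ys):
    """Step 3 (hypothetical): least-squares fit error ~ intercept + slope * characteristic."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs)
    return my - slope * mx, slope


rng = random.Random(0)
chars, errs = [], []
for shift in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    X, y = make_synthetic(shift, rng)
    chars.append(dataset_characteristic(X, y))
    errs.append(nearest_centroid_loo_error(X, y))

intercept, slope = fit_line(chars, errs)

# Predict the error of a new dataset from its characteristic alone, before training.
X_new, y_new = make_synthetic(2.0, rng)
raw = intercept + slope * dataset_characteristic(X_new, y_new)
predicted_error = min(1.0, max(0.0, raw))  # clip to a valid error rate
print(f"predicted error before training: {predicted_error:.3f}")
print(f"observed leave-one-out error:    {nearest_centroid_loo_error(X_new, y_new):.3f}")
```

Because a straight line cannot capture how the error saturates near zero for well-separated classes, the prediction is clipped to [0, 1]; a practical meta-model would likely combine several dataset characteristics with a more flexible regressor.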
Record ID: pubmed-9645616
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2022-11-09
License: © 2022 Netzer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.