Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery
High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophi...
Main authors: | Netzer, Michael; Baumgartner, Christian; Baumgarten, Daniel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645616/ https://www.ncbi.nlm.nih.gov/pubmed/36350811 http://dx.doi.org/10.1371/journal.pone.0276607 |
_version_ | 1784827000349786112 |
---|---|
author | Netzer, Michael Baumgartner, Christian Baumgarten, Daniel |
author_facet | Netzer, Michael Baumgartner, Christian Baumgarten, Daniel |
author_sort | Netzer, Michael |
collection | PubMed |
description | High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. Currently, there are a variety of classification models, and a common approach is to compare their performance and select the best one for a given classification problem. Since the association between the features of a dataset and the performance of a particular classification method is still not fully understood, the main contribution of this work is to provide a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose here a three-step computational workflow that includes an analysis of the dataset characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that the predictability strongly depends on the discriminatory ability of the features, e.g., sets of genes, in two-class or multi-class datasets. If a dataset has a certain discriminatory ability, this method enables prediction of the classification performance before applying a learning model. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method, and suggest the optimal classification method for a given dataset based on its discriminatory ability. |
format | Online Article Text |
id | pubmed-9645616 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9645616 2022-11-15 Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery Netzer, Michael Baumgartner, Christian Baumgarten, Daniel PLoS One Research Article High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. Currently, there are a variety of classification models, and a common approach is to compare their performance and select the best one for a given classification problem. Since the association between the features of a dataset and the performance of a particular classification method is still not fully understood, the main contribution of this work is to provide a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose here a three-step computational workflow that includes an analysis of the dataset characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that the predictability strongly depends on the discriminatory ability of the features, e.g., sets of genes, in two-class or multi-class datasets. If a dataset has a certain discriminatory ability, this method enables prediction of the classification performance before applying a learning model. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method, and suggest the optimal classification method for a given dataset based on its discriminatory ability. Public Library of Science 2022-11-09 /pmc/articles/PMC9645616/ /pubmed/36350811 http://dx.doi.org/10.1371/journal.pone.0276607 Text en © 2022 Netzer et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Netzer, Michael Baumgartner, Christian Baumgarten, Daniel Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
title | Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
title_full | Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
title_fullStr | Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
title_full_unstemmed | Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
title_short | Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
title_sort | predicting prediction: a systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645616/ https://www.ncbi.nlm.nih.gov/pubmed/36350811 http://dx.doi.org/10.1371/journal.pone.0276607 |
work_keys_str_mv | AT netzermichael predictingpredictionasystematicworkflowtoanalyzefactorsaffectingtheclassificationperformanceingenomicbiomarkerdiscovery AT baumgartnerchristian predictingpredictionasystematicworkflowtoanalyzefactorsaffectingtheclassificationperformanceingenomicbiomarkerdiscovery AT baumgartendaniel predictingpredictionasystematicworkflowtoanalyzefactorsaffectingtheclassificationperformanceingenomicbiomarkerdiscovery |
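
The three-step workflow named in the description (characterize the dataset, measure classification error, then predict that error from the characteristics) can be illustrated with a small sketch. This is not the authors' implementation: the meta-feature (a mean standardized mean difference), the nearest-centroid classifier, the straight-line meta-model, and every function name below are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def make_dataset(n_per_class, n_features, shift, rng):
    """Synthetic two-class data: class 1 is class 0 shifted by `shift` per feature."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def discriminatory_ability(X, y):
    """Step 1 (assumed meta-feature): mean absolute standardized mean difference."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        pooled_sd = ((statistics.pvariance(a) + statistics.pvariance(b)) / 2) ** 0.5
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_sd)
    return statistics.fmean(scores)


def centroid_error(X, y, rng):
    """Step 2: holdout error of a nearest-centroid classifier (stand-in model)."""
    train, test = [], []
    for label in (0, 1):  # stratified 50/50 split so both classes are in train
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def dist2(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    wrong = 0
    for i in test:
        predicted = min(centroids, key=lambda c: dist2(X[i], centroids[c]))
        wrong += predicted != y[i]
    return wrong / len(test)


def fit_line(xs, ys):
    """Step 3 (assumed meta-model): least-squares line, error ~ intercept + slope * score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sxx
    return slope, my - slope * mx


rng = random.Random(0)
shifts = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]  # hypothetical class separations
scores, errors = [], []
for shift in shifts:
    X, y = make_dataset(40, 5, shift, rng)
    scores.append(discriminatory_ability(X, y))
    errors.append(centroid_error(X, y, rng))

slope, intercept = fit_line(scores, errors)
print(f"predicted error = {intercept:.3f} + ({slope:.3f}) * discriminatory ability")
```

Under these assumptions the fitted slope is negative: the more the classes are separated, the lower the measured error, so the error of a new dataset could be estimated from its score before any classifier is trained.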