Building gene expression profile classifiers with a simple and efficient rejection option in R
Main authors: | Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3278843/ https://www.ncbi.nlm.nih.gov/pubmed/22373214 http://dx.doi.org/10.1186/1471-2105-12-S13-S3 |
Field | Value |
---|---|
author | Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez |
collection | PubMed |
description | BACKGROUND: The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. RESULTS: This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the end user to maximize the rejection accuracy with minimal manual intervention. CONCLUSIONS: This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration into data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available. |
format | Online Article Text |
id | pubmed-3278843 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics |
published | 2011-11-30 |
rights | Copyright ©2011 Benso et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Building gene expression profile classifiers with a simple and efficient rejection option in R |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3278843/ https://www.ncbi.nlm.nih.gov/pubmed/22373214 http://dx.doi.org/10.1186/1471-2105-12-S13-S3 |
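The description above mentions empirical decision rules for rejecting samples and an evolutionary strategy for tuning their parameters, but this record does not state the rules themselves. As a minimal sketch under stated assumptions, the Python code below uses a generic stand-in: a single-threshold rule on a classifier's class probabilities (reject a sample when its top class probability falls below the threshold), tuned by a simple (1+1) evolution strategy. The names `REJECT`, `classify_with_rejection`, `rejection_accuracy` and `tune_threshold`, the accuracy definition, and the toy probabilities are all hypothetical; the paper works with classifiers in R, and its actual rules and tuning procedure may differ.

```python
import random
from typing import List, Sequence, Tuple

REJECT = -1  # hypothetical label for samples that fit none of the trained classes


def classify_with_rejection(probs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Assign each sample its most probable class, or REJECT when even the
    top class probability falls below `threshold` (generic stand-in rule)."""
    labels = []
    for row in probs:
        best = max(range(len(row)), key=lambda k: row[k])
        labels.append(best if row[best] >= threshold else REJECT)
    return labels


def rejection_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Assumed metric: fraction of samples handled correctly, i.e. known-class
    samples get the right label and unknown-class samples (truth == REJECT)
    are rejected."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def tune_threshold(probs: Sequence[Sequence[float]], truth: Sequence[int],
                   start: float = 0.5, generations: int = 200,
                   sigma: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """(1+1) evolution strategy on the single threshold: mutate the parent
    with Gaussian noise, clip to [0, 1], and keep the child only if it
    strictly improves rejection accuracy."""
    rng = random.Random(seed)
    parent = start
    parent_fit = rejection_accuracy(classify_with_rejection(probs, parent), truth)
    for _ in range(generations):
        child = min(1.0, max(0.0, parent + rng.gauss(0.0, sigma)))
        child_fit = rejection_accuracy(classify_with_rejection(probs, child), truth)
        if child_fit > parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


# Toy data (made up): rows are samples, columns are posterior probabilities
# for 3 known classes; samples 2 and 3 come from an unseen class.
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.85, 0.05],
    [0.60, 0.30, 0.10],
    [0.25, 0.55, 0.20],
    [0.10, 0.10, 0.80],
]
toy_truth = [0, 1, REJECT, REJECT, 2]

best_t, best_fit = tune_threshold(toy_probs, toy_truth)
print(f"threshold={best_t:.3f}, rejection accuracy={best_fit:.2f}")
```

Accepting a child only on strict improvement keeps the search from drifting across flat regions of the fitness landscape. In a real setup the threshold would be tuned on held-out samples, including some from classes withheld during training, rather than on a hand-written toy matrix.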