A feature selection method for classification within functional genomics experiments based on the proportional overlapping score
BACKGROUND: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected d...
Main authors: | Mahmoud, Osama; Harrison, Andrew; Perperoglou, Aris; Gul, Asma; Khan, Zardad; Metodiev, Metodi V; Lausen, Berthold |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4141116/ https://www.ncbi.nlm.nih.gov/pubmed/25113817 http://dx.doi.org/10.1186/1471-2105-15-274 |
_version_ | 1782331593466052608 |
---|---|
author | Mahmoud, Osama; Harrison, Andrew; Perperoglou, Aris; Gul, Asma; Khan, Zardad; Metodiev, Metodi V; Lausen, Berthold |
author_sort | Mahmoud, Osama |
collection | PubMed |
description | BACKGROUND: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature’s relevance to a classification task. RESULTS: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k-Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance. CONCLUSIONS: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-274) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4141116 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4141116 2014-08-23 A feature selection method for classification within functional genomics experiments based on the proportional overlapping score Mahmoud, Osama; Harrison, Andrew; Perperoglou, Aris; Gul, Asma; Khan, Zardad; Metodiev, Metodi V; Lausen, Berthold BMC Bioinformatics Methodology Article BioMed Central 2014-08-11 /pmc/articles/PMC4141116/ /pubmed/25113817 http://dx.doi.org/10.1186/1471-2105-15-274 Text en © Mahmoud et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A feature selection method for classification within functional genomics experiments based on the proportional overlapping score |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4141116/ https://www.ncbi.nlm.nih.gov/pubmed/25113817 http://dx.doi.org/10.1186/1471-2105-15-274 |
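
The description field in this record outlines the idea behind POS: compare a gene's expression ranges across classes, limit the influence of outliers, and count the proportion of samples that fall in the overlapping region. The record does not give the published formulas, so the following is only a minimal two-class sketch of that general idea, not the authors' definition of POS. The outlier rule (Tukey's fences), the helper names `core_interval` and `overlap_score`, and the exact form of the score and mask are all assumptions made for illustration.

```python
from statistics import quantiles


def core_interval(values):
    """Range of the values lying inside Tukey's fences.

    Assumption: Tukey's 1.5 * IQR rule stands in for whatever robust
    interval the paper actually uses.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in values if low_fence <= v <= high_fence]
    return min(kept), max(kept)


def overlap_score(class_a, class_b):
    """Illustrative overlap score for one gene and two classes.

    Returns (score, mask). The score is the fraction of all samples whose
    expression falls inside the intersection of the two core intervals
    (lower means the gene separates the classes better). The mask marks
    with True the samples lying outside that intersection.
    """
    a_low, a_high = core_interval(class_a)
    b_low, b_high = core_interval(class_b)
    ov_low, ov_high = max(a_low, b_low), min(a_high, b_high)
    values = list(class_a) + list(class_b)
    if ov_low > ov_high:
        # The core intervals do not intersect: no sample is ambiguous.
        return 0.0, [True] * len(values)
    mask = [not (ov_low <= v <= ov_high) for v in values]
    score = mask.count(False) / len(values)
    return score, mask


# Hypothetical expression values for one gene in two classes.
score, mask = overlap_score([1, 2, 3, 4, 5], [4, 5, 6, 7, 8])
print(score, mask)
```

In this sketch a single extreme value in one class does not widen that class's core interval, so it cannot create an artificial overlap with the other class. Ranking genes by such a score and combining their masks into a selected subset, as the description mentions, is beyond what this record specifies and is left out.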