A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Bibliographic Details
Main authors: Mahmoud, Osama; Harrison, Andrew; Perperoglou, Aris; Gul, Asma; Khan, Zardad; Metodiev, Metodi V; Lausen, Berthold
Format: Online, Article, Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4141116/
https://www.ncbi.nlm.nih.gov/pubmed/25113817
http://dx.doi.org/10.1186/1471-2105-15-274
collection PubMed
description BACKGROUND: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be enhanced by performing the classification using only selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. This method yields a novel measure of a feature's relevance to a classification task, called the proportional overlapping score (POS). RESULTS: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. Classification error rates computed using the Random Forest, k-Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance. CONCLUSIONS: A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. For each gene, it robustly defines a mask that minimizes the effect of expression outliers. The constructed masks, together with a novel gene score, are used to produce the selected subset of genes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-274) contains supplementary material, which is available to authorized users.
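The abstract describes POS only at a high level. As a rough illustration of the general idea of scoring a gene by how much its expression values overlap between two classes, here is a minimal Python/NumPy sketch. It is not the published POS formula: the IQR-based "core interval", the `k=1.5` factor, the overlap mask, and all function names (`core_interval`, `overlap_mask`, `overlap_score`, `rank_genes`) are hypothetical assumptions made for illustration. The article itself should be consulted for the actual definitions of the mask and the score.

```python
import numpy as np

def core_interval(values, k=1.5):
    """Hypothetical robust class interval: [Q1 - k*IQR, Q3 + k*IQR],
    clipped to the observed range, so extreme outliers fall outside it."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo = max(q1 - k * iqr, values.min())
    hi = min(q3 + k * iqr, values.max())
    return lo, hi

def overlap_mask(x, y, k=1.5):
    """Boolean mask marking samples that lie inside the overlap of the two
    class core intervals (assumes binary labels 0 and 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    lo0, hi0 = core_interval(x[y == 0], k)
    lo1, hi1 = core_interval(x[y == 1], k)
    lo, hi = max(lo0, lo1), min(hi0, hi1)
    if lo > hi:  # disjoint intervals: no overlap region at all
        return np.zeros(x.shape, dtype=bool)
    return (x >= lo) & (x <= hi)

def overlap_score(x, y, k=1.5):
    """Proportion of samples in the overlap region; lower = more discriminative."""
    return float(overlap_mask(x, y, k).mean())

def rank_genes(X, y, k=1.5):
    """Rank genes (columns of X) from most to least discriminative."""
    scores = np.array([overlap_score(X[:, j], y, k) for j in range(X.shape[1])])
    return np.argsort(scores, kind="stable"), scores

# Toy data: gene 0 separates the classes cleanly, gene 1 does not.
X = np.array([[1.0, 5.0], [1.2, 5.5], [0.9, 4.8], [1.1, 5.2],
              [3.0, 5.1], [3.2, 4.9], [2.9, 5.3], [3.1, 5.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
order, scores = rank_genes(X, y)
print(order.tolist(), scores.tolist())  # [0, 1] [0.0, 0.75]
```

Selecting the top-ranked columns of `X` then gives a reduced gene subset that could be passed to any classifier; clipping each interval to the observed range is simply one convenient choice that keeps the sketch's overlap region within the data.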
id pubmed-4141116
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics, Methodology Article, BioMed Central 2014-08-11
rights © Mahmoud et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article