Cargando…

Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data

BACKGROUND: A feature selection method in microarray gene expression data should be independent of platform, disease and dataset size. Our hypothesis is that among the statistically significant ranked genes in a gene list, there should be clusters of genes that share similar biological functions rel...

Descripción completa

Detalles Bibliográficos
Autores principales: Sakellariou, Argiris, Sanoudou, Despina, Spyrou, George
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542193/
https://www.ncbi.nlm.nih.gov/pubmed/23075381
http://dx.doi.org/10.1186/1471-2105-13-270
_version_ 1782255471598501888
author Sakellariou, Argiris
Sanoudou, Despina
Spyrou, George
author_facet Sakellariou, Argiris
Sanoudou, Despina
Spyrou, George
author_sort Sakellariou, Argiris
collection PubMed
description BACKGROUND: A feature selection method in microarray gene expression data should be independent of platform, disease and dataset size. Our hypothesis is that among the statistically significant ranked genes in a gene list, there should be clusters of genes that share similar biological functions related to the investigated disease. Thus, instead of keeping N top ranked genes, it would be more appropriate to define and keep a number of gene cluster exemplars. RESULTS: We propose a hybrid FS method (mAP-KL), which combines multiple hypothesis testing and affinity propagation (AP)-clustering algorithm along with the Krzanowski & Lai cluster quality index, to select a small yet informative subset of genes. We applied mAP-KL on real microarray data, as well as on simulated data, and compared its performance against 13 other feature selection approaches. Across a variety of diseases and number of samples, mAP-KL presents competitive classification results, particularly in neuromuscular diseases, where its overall AUC score was 0.91. Furthermore, mAP-KL generates concise yet biologically relevant and informative N-gene expression signatures, which can serve as a valuable tool for diagnostic and prognostic purposes, as well as a source of potential disease biomarkers in a broad range of diseases. CONCLUSIONS: mAP-KL is a data-driven and classifier-independent hybrid feature selection method, which applies to any disease classification problem based on microarray data, regardless of the available samples. Combining multiple hypothesis testing and AP leads to subsets of genes, which classify unknown samples from both, small and large patient cohorts with high accuracy.
format Online
Article
Text
id pubmed-3542193
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35421932013-01-11 Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data Sakellariou, Argiris Sanoudou, Despina Spyrou, George BMC Bioinformatics Methodology Article BACKGROUND: A feature selection method in microarray gene expression data should be independent of platform, disease and dataset size. Our hypothesis is that among the statistically significant ranked genes in a gene list, there should be clusters of genes that share similar biological functions related to the investigated disease. Thus, instead of keeping N top ranked genes, it would be more appropriate to define and keep a number of gene cluster exemplars. RESULTS: We propose a hybrid FS method (mAP-KL), which combines multiple hypothesis testing and affinity propagation (AP)-clustering algorithm along with the Krzanowski & Lai cluster quality index, to select a small yet informative subset of genes. We applied mAP-KL on real microarray data, as well as on simulated data, and compared its performance against 13 other feature selection approaches. Across a variety of diseases and number of samples, mAP-KL presents competitive classification results, particularly in neuromuscular diseases, where its overall AUC score was 0.91. Furthermore, mAP-KL generates concise yet biologically relevant and informative N-gene expression signatures, which can serve as a valuable tool for diagnostic and prognostic purposes, as well as a source of potential disease biomarkers in a broad range of diseases. CONCLUSIONS: mAP-KL is a data-driven and classifier-independent hybrid feature selection method, which applies to any disease classification problem based on microarray data, regardless of the available samples. Combining multiple hypothesis testing and AP leads to subsets of genes, which classify unknown samples from both, small and large patient cohorts with high accuracy. BioMed Central 2012-10-17 /pmc/articles/PMC3542193/ /pubmed/23075381 http://dx.doi.org/10.1186/1471-2105-13-270 Text en Copyright ©2012 Sakellariou et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Sakellariou, Argiris
Sanoudou, Despina
Spyrou, George
Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
title Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
title_full Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
title_fullStr Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
title_full_unstemmed Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
title_short Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
title_sort combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542193/
https://www.ncbi.nlm.nih.gov/pubmed/23075381
http://dx.doi.org/10.1186/1471-2105-13-270
work_keys_str_mv AT sakellariouargiris combiningmultiplehypothesistestingandaffinitypropagationclusteringleadstoaccuraterobustandsamplesizeindependentclassificationongeneexpressiondata
AT sanoudoudespina combiningmultiplehypothesistestingandaffinitypropagationclusteringleadstoaccuraterobustandsamplesizeindependentclassificationongeneexpressiondata
AT spyrougeorge combiningmultiplehypothesistestingandaffinitypropagationclusteringleadstoaccuraterobustandsamplesizeindependentclassificationongeneexpressiondata