Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery

Bibliographic Details
Main Authors: Han, Henry; Li, Xiao-Li
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044315/
https://www.ncbi.nlm.nih.gov/pubmed/21342590
http://dx.doi.org/10.1186/1471-2105-12-S1-S7
collection PubMed
description BACKGROUND: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies with their small number of samples and large or even huge number of variables (genes). This naturally calls for effective feature selection in microarray data classification.
METHODS: We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. This method overcomes the weak points of widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces.
RESULTS: We demonstrate the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical sensitivities and specificities but also show stronger performance stability than their peers. Software that implements the major algorithm, together with the data sets on which this paper focuses, is freely available at https://sites.google.com/site/heyaumapbc2011/.
CONCLUSIONS: This work suggests a new direction for moving microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The redundant global feature suppression and effective local feature extraction enabled by multi-resolution data analysis also have a positive impact on large-scale ‘omics’ data mining.
format Text
id pubmed-3044315
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3044315 2011-02-25 Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery. Han, Henry; Li, Xiao-Li. BMC Bioinformatics. Research. BioMed Central 2011-02-15 /pmc/articles/PMC3044315/ /pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7 Text en Copyright ©2011 Han and Li; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research