Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
BACKGROUND: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional...
Main authors: | Han, Henry; Li, Xiao-Li |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044315/ https://www.ncbi.nlm.nih.gov/pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7 |
_version_ | 1782198718850662400 |
---|---|
author | Han, Henry Li, Xiao-Li |
author_facet | Han, Henry Li, Xiao-Li |
author_sort | Han, Henry |
collection | PubMed |
description | BACKGROUND: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies with their small number of samples and large or even huge number of variables (genes). This naturally calls for effective feature selection in microarray data classification. METHODS: We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of multi-resolution independent component analysis in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces. RESULTS: We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical level sensitivities and specificities, but also show strong performance stability compared with their peers. Software that implements the major algorithm and the data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. CONCLUSIONS: This work suggests a new direction for accelerating the move of microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The multi-resolution data analysis, which suppresses redundant global features and extracts effective local features, also has a positive impact on large-scale ‘omics’ data mining. |
format | Text |
id | pubmed-3044315 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3044315 2011-02-25 Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery Han, Henry Li, Xiao-Li BMC Bioinformatics Research BACKGROUND: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies with their small number of samples and large or even huge number of variables (genes). This naturally calls for effective feature selection in microarray data classification. METHODS: We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of multi-resolution independent component analysis in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces. RESULTS: We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical level sensitivities and specificities, but also show strong performance stability compared with their peers. Software that implements the major algorithm and the data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. CONCLUSIONS: This work suggests a new direction for accelerating the move of microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The multi-resolution data analysis, which suppresses redundant global features and extracts effective local features, also has a positive impact on large-scale ‘omics’ data mining. BioMed Central 2011-02-15 /pmc/articles/PMC3044315/ /pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7 Text en Copyright ©2011 Han and Li; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Han, Henry Li, Xiao-Li Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
title | Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
title_full | Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
title_fullStr | Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
title_full_unstemmed | Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
title_short | Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
title_sort | multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044315/ https://www.ncbi.nlm.nih.gov/pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7 |
work_keys_str_mv | AT hanhenry multiresolutionindependentcomponentanalysisforhighperformancetumorclassificationandbiomarkerdiscovery AT lixiaoli multiresolutionindependentcomponentanalysisforhighperformancetumorclassificationandbiomarkerdiscovery |