Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery

BACKGROUND: Although high-throughput microarray based molecular diagnostic technologies show a great promise in cancer diagnosis, it is still far from a clinical application due to its low and instable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional...

Bibliographic Details
Main Authors:	Han, Henry; Li, Xiao-Li
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044315/ https://www.ncbi.nlm.nih.gov/pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7

_version_	1782198718850662400
author	Han, Henry Li, Xiao-Li
author_facet	Han, Henry Li, Xiao-Li
author_sort	Han, Henry
collection	PubMed
description	BACKGROUND: Although high-throughput microarray based molecular diagnostic technologies show a great promise in cancer diagnosis, it is still far from a clinical application due to its low and instable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies for its small number of samples and large or even huge number of variables (genes). This naturally calls for the use of an effective feature selection in microarray data classification. METHODS: We propose a novel feature selection method: multi-resolution independent component analysis (MICA) for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of the multi-resolution independent component analysis in meaningful biomarker discovery, we present a multi-resolution independent component analysis based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classifications in low-dimensional spaces. RESULTS: We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validations. Our classification algorithms, especially, MICA-SVM, not only accomplish clinical or near-clinical level sensitivities and specificities, but also show strong performance stability over its peers in classification. Software that implements the major algorithm and data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. 
CONCLUSIONS: This work suggests a new direction to accelerate microarray technologies into a clinical routine through building a high-performance classifier to attain clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The multi-resolution data analysis based redundant global feature suppressing and effective local feature extraction also have a positive impact on large scale ‘omics’ data mining.
format	Text
id	pubmed-3044315
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-30443152011-02-25 Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery Han, Henry Li, Xiao-Li BMC Bioinformatics Research BACKGROUND: Although high-throughput microarray based molecular diagnostic technologies show a great promise in cancer diagnosis, it is still far from a clinical application due to its low and instable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies for its small number of samples and large or even huge number of variables (genes). This naturally calls for the use of an effective feature selection in microarray data classification. METHODS: We propose a novel feature selection method: multi-resolution independent component analysis (MICA) for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of the multi-resolution independent component analysis in meaningful biomarker discovery, we present a multi-resolution independent component analysis based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classifications in low-dimensional spaces. RESULTS: We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validations. 
Our classification algorithms, especially, MICA-SVM, not only accomplish clinical or near-clinical level sensitivities and specificities, but also show strong performance stability over its peers in classification. Software that implements the major algorithm and data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. CONCLUSIONS: This work suggests a new direction to accelerate microarray technologies into a clinical routine through building a high-performance classifier to attain clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The multi-resolution data analysis based redundant global feature suppressing and effective local feature extraction also have a positive impact on large scale ‘omics’ data mining. BioMed Central 2011-02-15 /pmc/articles/PMC3044315/ /pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7 Text en Copyright ©2011 Han and Li; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Han, Henry Li, Xiao-Li Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
title	Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
title_full	Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
title_fullStr	Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
title_full_unstemmed	Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
title_short	Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
title_sort	multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044315/ https://www.ncbi.nlm.nih.gov/pubmed/21342590 http://dx.doi.org/10.1186/1471-2105-12-S1-S7
work_keys_str_mv	AT hanhenry multiresolutionindependentcomponentanalysisforhighperformancetumorclassificationandbiomarkerdiscovery AT lixiaoli multiresolutionindependentcomponentanalysisforhighperformancetumorclassificationandbiomarkerdiscovery
