A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standar...

Bibliographic Details
Main authors: Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4370236/
https://www.ncbi.nlm.nih.gov/pubmed/25838836
http://dx.doi.org/10.1155/2015/370640
_version_ 1782362848994787328
author Pamukçu, Esra
Bozdogan, Hamparsum
Çalık, Sinan
collection PubMed
description Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a novel approach using the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop the celebrated Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate, to perform the PPCA, reduce the dimension, and carry out supervised classification of cancer groups in high dimensions.
format Online
Article
Text
id pubmed-4370236
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4370236 2015-04-02 A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification Pamukçu, Esra Bozdogan, Hamparsum Çalık, Sinan Comput Math Methods Med Research Article Hindawi Publishing Corporation 2015 2015-02-19 /pmc/articles/PMC4370236/ /pubmed/25838836 http://dx.doi.org/10.1155/2015/370640 Text en Copyright © 2015 Esra Pamukçu et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4370236/
https://www.ncbi.nlm.nih.gov/pubmed/25838836
http://dx.doi.org/10.1155/2015/370640
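
The description in this record notes that the sample covariance matrix becomes singular when there are fewer samples than genes, and that information criteria are used to choose how many probabilistic PCs to retain. The following is a minimal Python sketch of that general idea, not the authors' method. It assumes the standard maximized PPCA log-likelihood of Tipping and Bishop, the usual AIC and CAIC penalties, and a generic linear shrinkage toward a scaled identity as a stand-in for the paper's maximum entropy and hybridized smoothed estimators. ICOMP is omitted because its complexity term depends on details not given in this record. All function names and the toy eigenvalues are hypothetical.

```python
import math

def shrink_eigenvalues(eigvals, rho):
    """Eigenvalues of (1 - rho) * S + rho * mean(eig) * I.
    Generic linear shrinkage (an assumption, not the paper's estimator).
    The eigenvectors are unchanged, so only the eigenvalues move."""
    mean = sum(eigvals) / len(eigvals)
    return [(1.0 - rho) * lam + rho * mean for lam in eigvals]

def ppca_loglik(eigvals, k, n):
    """Maximized PPCA log-likelihood for k retained components,
    given the p covariance eigenvalues and the sample size n."""
    p = len(eigvals)
    lam = sorted(eigvals, reverse=True)
    sigma2 = sum(lam[k:]) / (p - k)  # ML noise variance: mean of discarded eigenvalues
    if sigma2 <= 0 or min(lam[:k]) <= 0:
        raise ValueError("singular covariance: log-likelihood is undefined")
    return -0.5 * n * (p * math.log(2 * math.pi)
                       + sum(math.log(v) for v in lam[:k])
                       + (p - k) * math.log(sigma2)
                       + p)

def n_params(p, k):
    """Free parameters: mean (p), loadings up to rotation, noise variance (1)."""
    return p + p * k - k * (k - 1) // 2 + 1

def select_k(eigvals, n, k_max):
    """Score k = 1..k_max and return the minimizer of each criterion."""
    p = len(eigvals)
    scores = {}
    for k in range(1, k_max + 1):
        ll = ppca_loglik(eigvals, k, n)
        m = n_params(p, k)
        scores[k] = {"AIC": -2 * ll + 2 * m,
                     "CAIC": -2 * ll + m * (math.log(n) + 1)}
    best = {c: min(scores, key=lambda k: scores[k][c]) for c in ("AIC", "CAIC")}
    return best, scores

# Toy spectrum for p = 6 features and n = 4 samples (hypothetical values):
# the rank is at most n - 1 = 3, so the trailing eigenvalues are exactly zero.
raw = [5.0, 3.0, 0.5, 0.0, 0.0, 0.0]
shrunk = shrink_eigenvalues(raw, rho=0.1)  # all eigenvalues now positive
best, scores = select_k(shrunk, n=4, k_max=3)
print(best)
```

With the raw spectrum, retaining all three nonzero components leaves a zero noise variance and the log-likelihood is undefined; the shrunk spectrum keeps the same trace but is strictly positive, so every candidate k can be scored.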