
Missing value imputation for microarray gene expression data using histone acetylation information

BACKGROUND: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for dat...


Bibliographic Details
Main Authors: Xiang, Qian, Dai, Xianhua, Deng, Yangyang, He, Caisheng, Wang, Jiang, Feng, Jihua, Dai, Zhiming
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432074/
https://www.ncbi.nlm.nih.gov/pubmed/18510747
http://dx.doi.org/10.1186/1471-2105-9-252
author Xiang, Qian
Dai, Xianhua
Deng, Yangyang
He, Caisheng
Wang, Jiang
Feng, Jihua
Dai, Zhiming
author_facet Xiang, Qian
Dai, Xianhua
Deng, Yangyang
He, Caisheng
Wang, Jiang
Feng, Jihua
Dai, Zhiming
author_sort Xiang, Qian
collection PubMed
description BACKGROUND: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. RESULTS: The paper explores the feasibility of performing missing value imputation with the help of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute method) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for the final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). In addition, the genes imputed by the HAIimpute methods are more strongly correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, one of the existing related methods, which uses functional similarity as the external information. CONCLUSION: We demonstrated that the use of histone acetylation information can greatly improve imputation performance, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge accumulates on gene regulatory mechanisms beyond histone acetylation, the performance of our approach can be further improved and verified.
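The baseline named in the description can be illustrated with a minimal sketch of plain KNN imputation and an NRMSE score. This is not the HAIimpute method: it contains no histone acetylation step, and the function names `knn_impute` and `nrmse`, the inverse-distance weighting, and the choice to normalize the error by the standard deviation of the true values are all assumptions made for illustration (the record does not specify them).

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaNs in a genes x samples matrix from the k nearest complete genes.

    Illustrative baseline only; the weighting scheme is an assumption.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = [j for j in range(X.shape[0]) if not np.isnan(X[j]).any()]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        missing = np.isnan(X[i])
        observed = ~missing
        if not observed.any() or not complete:
            continue  # nothing to compare against; leave the row unfilled
        # Root mean squared distance over the columns observed in gene i.
        dists = sorted(
            (np.sqrt(np.mean((X[i, observed] - X[j, observed]) ** 2)), j)
            for j in complete
        )
        nearest = dists[:k]
        weights = np.array([1.0 / (d + 1e-12) for d, _ in nearest])
        rows = np.array([j for _, j in nearest])
        # Inverse-distance weighted average of the neighbors' values.
        filled[i, missing] = weights @ X[rows][:, missing] / weights.sum()
    return filled


def nrmse(truth, imputed, mask):
    """RMSE over the hidden entries, divided by the std of their true values."""
    diff = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(20, 6))
    mask = np.zeros_like(truth, dtype=bool)
    mask[:5, 0] = True  # hide one value in each of the first five genes
    observed = truth.copy()
    observed[mask] = np.nan
    print(nrmse(truth, knn_impute(observed, k=3), mask))
```

A lower NRMSE means the imputed values are closer to the hidden true values; a score near 1 is roughly what filling every hidden entry with their mean would achieve under this normalization.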
format Text
id pubmed-2432074
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2432074 2008-06-20 Missing value imputation for microarray gene expression data using histone acetylation information Xiang, Qian Dai, Xianhua Deng, Yangyang He, Caisheng Wang, Jiang Feng, Jihua Dai, Zhiming BMC Bioinformatics Research Article BACKGROUND: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. RESULTS: The paper explores the feasibility of performing missing value imputation with the help of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute method) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for the final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). In addition, the genes imputed by the HAIimpute methods are more strongly correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, one of the existing related methods, which uses functional similarity as the external information. CONCLUSION: We demonstrated that the use of histone acetylation information can greatly improve imputation performance, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance.
Moreover, as more knowledge accumulates on gene regulatory mechanisms beyond histone acetylation, the performance of our approach can be further improved and verified. BioMed Central 2008-05-29 /pmc/articles/PMC2432074/ /pubmed/18510747 http://dx.doi.org/10.1186/1471-2105-9-252 Text en Copyright © 2008 Xiang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Xiang, Qian
Dai, Xianhua
Deng, Yangyang
He, Caisheng
Wang, Jiang
Feng, Jihua
Dai, Zhiming
Missing value imputation for microarray gene expression data using histone acetylation information
title Missing value imputation for microarray gene expression data using histone acetylation information
title_full Missing value imputation for microarray gene expression data using histone acetylation information
title_fullStr Missing value imputation for microarray gene expression data using histone acetylation information
title_full_unstemmed Missing value imputation for microarray gene expression data using histone acetylation information
title_short Missing value imputation for microarray gene expression data using histone acetylation information
title_sort missing value imputation for microarray gene expression data using histone acetylation information
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432074/
https://www.ncbi.nlm.nih.gov/pubmed/18510747
http://dx.doi.org/10.1186/1471-2105-9-252
work_keys_str_mv AT xiangqian missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation
AT daixianhua missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation
AT dengyangyang missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation
AT hecaisheng missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation
AT wangjiang missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation
AT fengjihua missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation
AT daizhiming missingvalueimputationformicroarraygeneexpressiondatausinghistoneacetylationinformation