A hybrid imputation approach for microarray missing value estimation

BACKGROUND: Missing data are inevitable in gene expression microarray experiments because of instrument failure or human error, and they degrade the performance of downstream analysis. Most existing approaches suffer from this prevalent problem. Imputation is one of the...

Bibliographic Details
Main Authors: Li, Huihui; Zhao, Changbo; Shao, Fengfeng; Li, Guo-Zheng; Wang, Xiao
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547405/
https://www.ncbi.nlm.nih.gov/pubmed/26330180
http://dx.doi.org/10.1186/1471-2164-16-S9-S1
author Li, Huihui
Zhao, Changbo
Shao, Fengfeng
Li, Guo-Zheng
Wang, Xiao
collection PubMed
description BACKGROUND: Missing data are inevitable in gene expression microarray experiments because of instrument failure or human error, and they degrade the performance of downstream analysis. Most existing approaches suffer from this prevalent problem. Imputation is one of the most frequently used methods for handling missing data. Although much progress has been made in estimating missing values, improving imputation accuracy for data with a large missing rate remains challenging. METHODS: Inspired by the idea of collaborative training, we propose a novel hybrid imputation method called Recursive Mutual Imputation (RMI). Specifically, RMI exploits global correlation information and local structure in the data, captured by two popular methods, Bayesian Principal Component Analysis (BPCA) and Local Least Squares (LLS), respectively. The mutual strategy is implemented by sharing the estimated data sequences at each recursive step. We also order the imputation sequence by the number of missing entries in the target gene. Finally, a weight-based integration method is used in the final assembling step. RESULTS: We compare RMI with three state-of-the-art algorithms (BPCA, LLS, and Iterated Local Least Squares imputation (ItrLLS)) on four publicly available microarray datasets. Experimental results show that RMI significantly outperforms the compared methods in terms of Normalized Root Mean Square Error (NRMSE), especially for datasets with large missing rates and fewer complete genes. CONCLUSIONS: The proposed hybrid imputation approach incorporates both global and local information of microarray genes and achieves lower NRMSE values than either single approach alone. This study also highlights the need for imputation methods to consider the order in which missing entries are imputed.
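The abstract names two ingredients that are easy to illustrate: the NRMSE evaluation metric and an imputation order based on the number of missing entries per gene. The sketch below is a minimal illustration and not the authors' implementation. The abstract does not state the NRMSE formula, so the code assumes a definition that is common in the microarray imputation literature (root mean squared error over the artificially removed entries, divided by the standard deviation of the true values). The ordering direction (fewest missing entries first) is also an assumption, and the function names `nrmse` and `imputation_order` are hypothetical.

```python
import numpy as np


def nrmse(true_values, imputed_values):
    """Normalized root mean square error over the masked entries.

    Assumed definition: sqrt(mean((imputed - true)^2) / var(true)),
    using the population variance (NumPy's default, ddof=0).
    """
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    mse = np.mean((imputed_values - true_values) ** 2)
    return float(np.sqrt(mse / np.var(true_values)))


def imputation_order(expr):
    """Row indices of genes that have missing entries, fewest missing first.

    `expr` is a genes-by-samples matrix with NaN marking missing values.
    Sorting ascending is an assumption; the abstract only says the order
    depends on the number of missing entries in the target gene.
    """
    missing_counts = np.isnan(expr).sum(axis=1)
    incomplete = np.flatnonzero(missing_counts > 0)
    # A stable sort keeps the original row order among ties.
    return incomplete[np.argsort(missing_counts[incomplete], kind="stable")]


if __name__ == "__main__":
    expr = np.array([
        [1.0, np.nan, np.nan],  # two missing entries
        [1.0, 2.0, 3.0],        # complete gene, never a target
        [np.nan, 2.0, 3.0],     # one missing entry
    ])
    print(imputation_order(expr).tolist())
    print(nrmse([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]))
```

Under this assumed definition, filling every removed entry with the mean of the true values gives an NRMSE of 1, so values below 1 indicate an imputation that does better than that baseline.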
format Online
Article
Text
id pubmed-4547405
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4547405 2015-09-10 A hybrid imputation approach for microarray missing value estimation Li, Huihui Zhao, Changbo Shao, Fengfeng Li, Guo-Zheng Wang, Xiao BMC Genomics Research BioMed Central 2015-08-17 /pmc/articles/PMC4547405/ /pubmed/26330180 http://dx.doi.org/10.1186/1471-2164-16-S9-S1 Text en
Copyright © 2015 Li et al. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A hybrid imputation approach for microarray missing value estimation
topic Research