
Microarray missing data imputation based on a set theoretic framework and biological knowledge

Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods, a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance to deal with missing values, they also have their l...


Bibliographic Details
Main Authors: Gan, Xiangchao, Liew, Alan Wee-Chung, Yan, Hong
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1409680/
https://www.ncbi.nlm.nih.gov/pubmed/16549873
http://dx.doi.org/10.1093/nar/gkl047
_version_ 1782127048696791040
author Gan, Xiangchao
Liew, Alan Wee-Chung
Yan, Hong
author_facet Gan, Xiangchao
Liew, Alan Wee-Chung
Yan, Hong
author_sort Gan, Xiangchao
collection PubMed
description Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods, a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data is dominated by global structure. In addition, these algorithms do not take into account any biological constraint in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
format Text
id pubmed-1409680
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-14096802006-04-05 Microarray missing data imputation based on a set theoretic framework and biological knowledge Gan, Xiangchao Liew, Alan Wee-Chung Yan, Hong Nucleic Acids Res Article Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods, a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data is dominated by global structure. In addition, these algorithms do not take into account any biological constraint in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods. 
Oxford University Press 2006 2006-03-20 /pmc/articles/PMC1409680/ /pubmed/16549873 http://dx.doi.org/10.1093/nar/gkl047 Text en © The Author 2006. Published by Oxford University Press. All rights reserved
spellingShingle Article
Gan, Xiangchao
Liew, Alan Wee-Chung
Yan, Hong
Microarray missing data imputation based on a set theoretic framework and biological knowledge
title Microarray missing data imputation based on a set theoretic framework and biological knowledge
title_full Microarray missing data imputation based on a set theoretic framework and biological knowledge
title_fullStr Microarray missing data imputation based on a set theoretic framework and biological knowledge
title_full_unstemmed Microarray missing data imputation based on a set theoretic framework and biological knowledge
title_short Microarray missing data imputation based on a set theoretic framework and biological knowledge
title_sort microarray missing data imputation based on a set theoretic framework and biological knowledge
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1409680/
https://www.ncbi.nlm.nih.gov/pubmed/16549873
http://dx.doi.org/10.1093/nar/gkl047
work_keys_str_mv AT ganxiangchao microarraymissingdataimputationbasedonasettheoreticframeworkandbiologicalknowledge
AT liewalanweechung microarraymissingdataimputationbasedonasettheoreticframeworkandbiologicalknowledge
AT yanhong microarraymissingdataimputationbasedonasettheoreticframeworkandbiologicalknowledge
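
The description field above summarizes POCS as expressing each piece of prior knowledge as a convex set and then iterating projections until a point in the intersection of all sets is reached. The following is a minimal sketch of that general idea on a toy vector, not the authors' algorithm. The three sets used here (agreement with observed entries, membership in a known linear subspace, and a value range) are hypothetical stand-ins for the gene-correlation, array-correlation and synchronization-loss sets mentioned in the record, and all function names and parameters are assumptions chosen for illustration.

```python
import numpy as np


def project_observed(x, observed_mask, values):
    """Project onto the affine set of vectors that match the observed entries."""
    y = x.copy()
    y[observed_mask] = values[observed_mask]
    return y


def project_subspace(x, basis):
    """Orthogonally project onto the column span of `basis` (a convex set)."""
    coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ coeffs


def project_box(x, low, high):
    """Project onto the box {x : low <= x_i <= high} by clipping."""
    return np.clip(x, low, high)


def pocs_impute(values, observed_mask, basis, low, high, n_iter=500, tol=1e-10):
    """Cycle through the three projections until the iterate stops moving."""
    # Start from the observed data with missing entries set to zero.
    x = np.where(observed_mask, values, 0.0)
    for _ in range(n_iter):
        prev = x
        x = project_subspace(x, basis)
        x = project_box(x, low, high)
        x = project_observed(x, observed_mask, values)
        if np.linalg.norm(x - prev) < tol:
            break
    return x


# Toy profile that follows a linear trend, with the middle value missing.
t = np.arange(5.0)
basis = np.column_stack([np.ones_like(t), t])  # hypothetical "global structure"
values = np.array([1.0, 1.5, np.nan, 2.5, 3.0])
observed_mask = ~np.isnan(values)

imputed = pocs_impute(values, observed_mask, basis, low=0.0, high=5.0)
print(imputed)
```

In this toy case the observed entries lie exactly on a line, so the intersection of the three sets is a single point and the missing entry converges to approximately 2.0. With real expression data the sets would have to be built from the data itself, and the intersection may be empty or contain many points, which this sketch does not address.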