
Reuse of imputed data in microarray analysis increases imputation efficiency

BACKGROUND: The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but the efficiency of the metho...


Bibliographic Details
Main authors: Kim, Ki-Yeol; Kim, Byoung-Jin; Yi, Gwan-Su
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC528735/
https://www.ncbi.nlm.nih.gov/pubmed/15504240
http://dx.doi.org/10.1186/1471-2105-5-160
author Kim, Ki-Yeol
Kim, Byoung-Jin
Yi, Gwan-Su
collection PubMed
description BACKGROUND: The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the values they imputed had not been fully checked. RESULTS: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes missing values sequentially, starting from the gene with the fewest missing values, and reuses the imputed values in later imputations. Although it relies on imputed values, the new method greatly improves on the accuracy and computational complexity of the conventional KNN-based method and of other methods based on maximum likelihood estimation. SKNN outperformed the other imputation methods particularly on data with high missing rates and a large number of experiments. Applying Expectation Maximization (EM) to the SKNN method improved the accuracy but increased the computational time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed accuracy similarly high to that of the SKNN method, with slightly higher dependency on the type of data set. CONCLUSIONS: Sequential reuse of imputed data in KNN-based imputation greatly increases the efficiency of imputation. The SKNN method should be practically useful for salvaging data from microarray experiments that have many missing entries. The SKNN method generates reliable imputed values that can be used for further cluster-based analysis of microarray data.
format Text
id pubmed-528735
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-528735 2004-11-17 Reuse of imputed data in microarray analysis increases imputation efficiency Kim, Ki-Yeol Kim, Byoung-Jin Yi, Gwan-Su BMC Bioinformatics Methodology Article BioMed Central 2004-10-26 /pmc/articles/PMC528735/ /pubmed/15504240 http://dx.doi.org/10.1186/1471-2105-5-160 Text en Copyright © 2004 Kim et al; licensee BioMed Central Ltd.
title Reuse of imputed data in microarray analysis increases imputation efficiency
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC528735/
https://www.ncbi.nlm.nih.gov/pubmed/15504240
http://dx.doi.org/10.1186/1471-2105-5-160
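
The abstract describes SKNN only in outline: genes are imputed in order of increasing number of missing values, and each imputed gene is then reused when imputing later ones. The following is a minimal sketch of that idea, not the authors' implementation. The function name `sknn_impute`, the parameter `k`, the Euclidean distance over observed columns, and the inverse-distance weighted average are all assumptions made for illustration; the record itself does not specify them.

```python
import numpy as np

def sknn_impute(data, k=3):
    """Sequential KNN imputation sketch for a genes x experiments matrix.

    Missing entries are NaN. Hypothetical helper: distance metric and
    weighting scheme are illustrative assumptions, not taken from the record.
    """
    x = np.array(data, dtype=float)          # work on a copy of the input
    missing = np.isnan(x)
    n_missing = missing.sum(axis=1)

    # Rows with no missing values form the initial reference pool.
    pool = [i for i in range(x.shape[0]) if n_missing[i] == 0]
    if not pool:
        raise ValueError("need at least one complete row to start")

    # Process incomplete rows from fewest to most missing values.
    todo = sorted(
        (i for i in range(x.shape[0]) if n_missing[i] > 0),
        key=lambda i: n_missing[i],
    )

    for i in todo:
        obs = ~missing[i]                    # columns observed in this row
        ref = x[pool]                        # complete rows available so far
        # Euclidean distance using only the columns observed in row i.
        dist = np.sqrt(((ref[:, obs] - x[i, obs]) ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = 1.0 / (dist[nearest] + 1e-12)
        x[i, ~obs] = weights @ ref[nearest][:, ~obs] / weights.sum()
        # Reuse: the freshly imputed row joins the pool for later rows.
        pool.append(i)

    return x

example = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [1.1, 2.1, np.nan],
    [1.1, np.nan, np.nan],
]
print(sknn_impute(example, k=1))
```

In this toy matrix the third row is imputed first because it has a single missing value, and it then serves as the nearest neighbor for the fourth row, which is the sequential reuse the abstract refers to. A conventional KNN variant would instead draw neighbors only from the rows that were complete at the start.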