
Missing value imputation for microarray data: a comprehensive comparison study and a web tool

BACKGROUND: Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data...


Bibliographic Details
Main authors: Chiu, Chia-Chun; Chan, Shih-Yao; Wang, Chung-Ching; Wu, Wei-Sheng
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4028811/
https://www.ncbi.nlm.nih.gov/pubmed/24565220
http://dx.doi.org/10.1186/1752-0509-7-S6-S12
author Chiu, Chia-Chun
Chan, Shih-Yao
Wang, Chung-Ching
Wu, Wei-Sheng
collection PubMed
description BACKGROUND: Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. The studies about the performance comparison of different algorithms are still incomprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used.
RESULTS: In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of a dataset but not on the species where the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets.
CONCLUSIONS: In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.
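The simulation-based selection described in the abstract can be illustrated with a minimal sketch: hide a random share of the known entries, impute them with each candidate method, and score each method against the hidden values. This is not MissVIA's implementation. The function names, the 5% missing rate, the number of runs, the NRMSE score, and the two toy imputers (row mean and zero fill) are all assumptions chosen for illustration; the record does not specify the nine algorithms or the three measures used in the paper.

```python
import numpy as np


def row_mean_impute(x):
    """Fill each missing entry with the mean of the observed values in its row."""
    filled = x.copy()
    row_means = np.nanmean(x, axis=1)
    # Fall back to the global mean for a row with no observed values.
    row_means = np.where(np.isnan(row_means), np.nanmean(x), row_means)
    rows, cols = np.where(np.isnan(x))
    filled[rows, cols] = row_means[rows]
    return filled


def zero_impute(x):
    """Fill each missing entry with zero."""
    return np.nan_to_num(x, nan=0.0)


def nrmse(truth, estimate):
    """Root mean squared error divided by the standard deviation of the true values."""
    return float(np.sqrt(np.mean((truth - estimate) ** 2)) / np.std(truth))


def compare_imputers(data, imputers, missing_rate=0.05, n_runs=10, seed=0):
    """Hide known entries at random, impute them, and average the NRMSE per method."""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(data))
    n_hide = max(1, int(missing_rate * len(observed)))
    scores = {name: [] for name in imputers}
    for _ in range(n_runs):
        picked = observed[rng.choice(len(observed), size=n_hide, replace=False)]
        rows, cols = picked[:, 0], picked[:, 1]
        masked = data.copy()
        masked[rows, cols] = np.nan  # simulate missing values at known positions
        for name, impute in imputers.items():
            estimate = impute(masked)
            scores[name].append(nrmse(data[rows, cols], estimate[rows, cols]))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Toy demo on synthetic "genes x samples" data (not a real microarray dataset).
demo_rng = np.random.default_rng(1)
demo = demo_rng.normal(0.0, 3.0, size=(200, 1)) + demo_rng.normal(0.0, 0.5, size=(200, 10))
scores = compare_imputers(demo, {"row_mean": row_mean_impute, "zero": zero_impute})
best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```

The method with the lowest average score would then be applied to the real missing entries. A purely statistical score like this one is only one of the kinds of measure the abstract mentions; the biologically motivated measures are not sketched here.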
format Online
Article
Text
id pubmed-4028811
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4028811 2014-06-04 Missing value imputation for microarray data: a comprehensive comparison study and a web tool Chiu, Chia-Chun Chan, Shih-Yao Wang, Chung-Ching Wu, Wei-Sheng BMC Syst Biol Research
BioMed Central 2013-12-13 /pmc/articles/PMC4028811/ /pubmed/24565220 http://dx.doi.org/10.1186/1752-0509-7-S6-S12 Text en
Copyright © 2013 Chiu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Missing value imputation for microarray data: a comprehensive comparison study and a web tool
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4028811/
https://www.ncbi.nlm.nih.gov/pubmed/24565220
http://dx.doi.org/10.1186/1752-0509-7-S6-S12