Missing value imputation improves clustering and interpretation of gene expression microarray data

BACKGROUND: Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no gener...

Bibliographic Details
Main Authors:	Tuikkala, Johannes; Elo, Laura L; Nevalainen, Olli S; Aittokallio, Tero
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386492/ https://www.ncbi.nlm.nih.gov/pubmed/18423022 http://dx.doi.org/10.1186/1471-2105-9-202

author	Tuikkala, Johannes; Elo, Laura L; Nevalainen, Olli S; Aittokallio, Tero
collection	PubMed
description	BACKGROUND: Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used. RESULTS: We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. CONCLUSION: The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can – up to a certain degree – be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA).
format	Text
id	pubmed-2386492
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2386492 2008-05-16 Missing value imputation improves clustering and interpretation of gene expression microarray data Tuikkala, Johannes; Elo, Laura L; Nevalainen, Olli S; Aittokallio, Tero BMC Bioinformatics Research Article BioMed Central 2008-04-18 /pmc/articles/PMC2386492/ /pubmed/18423022 http://dx.doi.org/10.1186/1471-2105-9-202 Text en Copyright © 2008 Tuikkala et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Missing value imputation improves clustering and interpretation of gene expression microarray data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386492/ https://www.ncbi.nlm.nih.gov/pubmed/18423022 http://dx.doi.org/10.1186/1471-2105-9-202
