Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets

Bibliographic Details
Main authors: Rao, Sreevidya Sadananda Sadasiva; Shepherd, Lori A.; Bruno, Andrew E.; Liu, Song; Miecznikowski, Jeffrey C.
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3809938/
https://www.ncbi.nlm.nih.gov/pubmed/24223587
http://dx.doi.org/10.1155/2013/790567
collection PubMed
description Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled assessment of the precision and comparability of microarrays, as well as of various other microarray analysis methods. However, to our knowledge, no study to date has reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures for Affymetrix microarrays.

Results. We evaluated several cutting-edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values with each imputation procedure. We performed 1000 simulations and averaged the results. The results for 5% and 10% deletion are similar. Among the imputation methods, the local least squares (LLS) method with k = 4 is the most accurate under the error measures considered, while the k-nearest neighbor (KNN) method with k = 1 has the highest error rate among the imputation methods across the error measures.

Conclusions. We conclude that, for imputing missing values in Affymetrix microarray datasets preprocessed with MAS 5.0, the local least squares method with k = 4 has the best overall performance and the k-nearest neighbor method with k = 1 has the worst. These results hold for both 5% and 10% missing values.
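The deletion-and-imputation simulation summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: it assumes a complete genes × arrays matrix of expression values, implements only KNN imputation (an unweighted mean of the k nearest genes; distance-weighted variants exist, and with k = 1 the two coincide), and scores each run with a normalized root-mean-square error (NRMSE), which is an assumed error measure rather than necessarily one of those used in the study. LLS imputation is not shown. The function names `knn_impute`, `nrmse`, and `simulate` and the synthetic data are illustrative only.

```python
import math
import random
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]


def knn_impute(data: Matrix, k: int) -> List[List[float]]:
    """Fill None entries of a genes x arrays matrix with the unweighted mean
    of the k nearest genes (hypothetical sketch of KNN imputation)."""
    filled = [[v if v is not None else 0.0 for v in row] for row in data]
    for i, row in enumerate(data):
        observed = [j for j, v in enumerate(row) if v is not None]
        for j in (c for c, v in enumerate(row) if v is None):
            # Candidate neighbours must have array j observed.
            candidates: List[Tuple[float, float]] = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                shared = [c for c in observed if other[c] is not None]
                if not shared:
                    continue
                # Root-mean-square difference over jointly observed arrays.
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((d, other[j]))
            if candidates:
                candidates.sort(key=lambda t: t[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
            else:
                # Fallback (assumption): mean of observed values in array j, else 0.0.
                col = [other[j] for other in data if other[j] is not None]
                filled[i][j] = sum(col) / len(col) if col else 0.0
    return filled


def nrmse(truth, imputed, positions) -> float:
    """RMSE over the deleted entries divided by the (population) standard
    deviation of their true values (assumed error measure)."""
    t = [truth[i][j] for i, j in positions]
    e = [imputed[i][j] for i, j in positions]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(e, t)) / len(t))
    mu = sum(t) / len(t)
    sd = math.sqrt(sum((x - mu) ** 2 for x in t) / len(t))
    return rmse / sd


def simulate(truth, frac: float, k: int, n_sims: int = 1000, seed: int = 0) -> float:
    """Repeatedly delete a random fraction of entries, impute them, and
    return the NRMSE averaged over all simulations."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(truth)) for j in range(len(truth[0]))]
    n_missing = max(2, round(frac * len(cells)))
    errors = []
    for _ in range(n_sims):
        positions = rng.sample(cells, n_missing)
        masked = [list(row) for row in truth]
        for i, j in positions:
            masked[i][j] = None
        errors.append(nrmse(truth, knn_impute(masked, k), positions))
    return sum(errors) / n_sims


if __name__ == "__main__":
    gen = random.Random(42)
    # Synthetic stand-in for a log-scale expression matrix (40 genes x 8 arrays).
    base = [gen.gauss(8.0, 2.0) for _ in range(40)]
    truth = [[b + gen.gauss(0.0, 0.3) for _ in range(8)] for b in base]
    for k in (1, 4):
        print("k =", k, "mean NRMSE at 5% deletion:", simulate(truth, 0.05, k, n_sims=50))
```

Averaging over many random deletion patterns, as in the 1000 simulations described above, keeps the comparison between methods from depending on which particular entries happen to be removed.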
id pubmed-3809938
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Advances in Bioinformatics (Adv Bioinformatics), Research Article, published online 2013-10-09
rights Copyright © 2013 Sreevidya Sadananda Sadasiva Rao et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.