
A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data

BACKGROUND: Numerous gel-based softwares exist to detect protein changes potentially associated with disease. The data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. A particularly important topic is how the various softwares handle m...
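Illustrative note: the full abstract of this record recommends a three-step analysis (impute missing spot values, choose a statistical test, then control a multiple-testing error rate). The following is a minimal Python sketch of that workflow under stated assumptions, not the authors' implementation. Row-mean imputation stands in for the least squares and EM methods the abstract favors; the bootstrap test is a generic bootstrap-t for two groups rather than the paper's exact percentile-based procedure; and the error control step uses a generalized Bonferroni rule, here defined as bounding the probability of k or more false rejections. All function names and the example values are hypothetical.

```python
import math
import random
import statistics


def impute_row_mean(values):
    """Step 1 (stand-in): replace None with the mean of the observed values.

    Assumes at least one observed value; statistics.fmean raises otherwise.
    """
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def welch_t(a, b):
    """Welch-type t statistic for two groups (each needs >= 2 values)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    diff = statistics.fmean(a) - statistics.fmean(b)
    if se == 0.0:
        # Degenerate resample: no spread in either group.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def bootstrap_t_pvalue(a, b, n_boot=2000, seed=0):
    """Step 2: two-sided bootstrap-t p-value for one spot.

    Each group is centered at its own mean so that resampling mimics the
    null hypothesis of equal means.
    """
    rng = random.Random(seed)
    t_obs = abs(welch_t(a, b))
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]
    hits = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))
        rb = rng.choices(b0, k=len(b0))
        if abs(welch_t(ra, rb)) >= t_obs:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_boot + 1)


def kfwer_reject(pvalues, alpha=0.05, k=1):
    """Step 3: generalized Bonferroni rule.

    Rejects p_i <= k * alpha / m, which bounds the probability of k or more
    false rejections by alpha. With k = 1 this is ordinary Bonferroni.
    """
    threshold = k * alpha / len(pvalues)
    return [p <= threshold for p in pvalues]


if __name__ == "__main__":
    control = impute_row_mean([1.0, None, 0.9, 1.1])   # hypothetical spot values
    treated = impute_row_mean([3.0, 3.2, None, 3.1])
    print(bootstrap_t_pvalue(control, treated))
    print(kfwer_reject([0.001, 0.02, 0.04, 0.5], alpha=0.05, k=2))
```

Raising k loosens the per-spot threshold in exchange for tolerating a few false positives, which matches the abstract's reasoning that a more liberal choice is acceptable when spots will be confirmed by mass spectrometry.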


Bibliographic Details
Main Authors: Miecznikowski, Jeffrey C, Damodaran, Senthilkumar, Sellers, Kimberly F, Rabin, Richard A
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3012036/
https://www.ncbi.nlm.nih.gov/pubmed/21159180
http://dx.doi.org/10.1186/1477-5956-8-66
_version_ 1782195062069788672
author Miecznikowski, Jeffrey C
Damodaran, Senthilkumar
Sellers, Kimberly F
Rabin, Richard A
author_facet Miecznikowski, Jeffrey C
Damodaran, Senthilkumar
Sellers, Kimberly F
Rabin, Richard A
author_sort Miecznikowski, Jeffrey C
collection PubMed
description BACKGROUND: Numerous gel-based softwares exist to detect protein changes potentially associated with disease. The data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. A particularly important topic is how the various softwares handle missing data. To date, no one has extensively studied the impact that interpolating missing data has on subsequent analysis of protein spots. RESULTS: This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance while the generalized family wise error rate (gFWER) should be considered for controlling the multiple testing error. CONCLUSIONS: In summary, we advocate for a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data with a data imputation step, choice of statistical test, and lastly an error control method in light of multiple testing. When determining the choice of statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry. If this is the case a more liberal test such as the percentile-based bootstrap t can be employed. For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate.
format Text
id pubmed-3012036
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30120362010-12-30 A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data Miecznikowski, Jeffrey C Damodaran, Senthilkumar Sellers, Kimberly F Rabin, Richard A Proteome Sci Research BACKGROUND: Numerous gel-based softwares exist to detect protein changes potentially associated with disease. The data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. A particularly important topic is how the various softwares handle missing data. To date, no one has extensively studied the impact that interpolating missing data has on subsequent analysis of protein spots. RESULTS: This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance while the generalized family wise error rate (gFWER) should be considered for controlling the multiple testing error. CONCLUSIONS: In summary, we advocate for a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data with a data imputation step, choice of statistical test, and lastly an error control method in light of multiple testing. When determining the choice of statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry. If this is the case a more liberal test such as the percentile-based bootstrap t can be employed. 
For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate. BioMed Central 2010-12-15 /pmc/articles/PMC3012036/ /pubmed/21159180 http://dx.doi.org/10.1186/1477-5956-8-66 Text en Copyright ©2010 Miecznikowski et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Miecznikowski, Jeffrey C
Damodaran, Senthilkumar
Sellers, Kimberly F
Rabin, Richard A
A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
title A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
title_full A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
title_fullStr A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
title_full_unstemmed A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
title_short A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
title_sort comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3012036/
https://www.ncbi.nlm.nih.gov/pubmed/21159180
http://dx.doi.org/10.1186/1477-5956-8-66
work_keys_str_mv AT miecznikowskijeffreyc acomparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT damodaransenthilkumar acomparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT sellerskimberlyf acomparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT rabinricharda acomparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT miecznikowskijeffreyc comparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT damodaransenthilkumar comparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT sellerskimberlyf comparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata
AT rabinricharda comparisonofimputationproceduresandstatisticaltestsfortheanalysisoftwodimensionalelectrophoresisdata