Missing value imputation in proximity extension assay-based targeted proteomics data

Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of this data is hampered by frequent missing values (random or left censored), calling for imputation approaches. While appro...

Bibliographic Details
Main authors: Lenz, Michael; Schulz, Andreas; Koeck, Thomas; Rapp, Steffen; Nagler, Markus; Sauer, Madeleine; Eggebrecht, Lisa; Ten Cate, Vincent; Panova-Noeva, Marina; Prochaska, Jürgen H.; Lackner, Karl J.; Münzel, Thomas; Leineweber, Kirsten; Wild, Philipp S.; Andrade-Navarro, Miguel A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7735586/
https://www.ncbi.nlm.nih.gov/pubmed/33315883
http://dx.doi.org/10.1371/journal.pone.0243487
_version_ 1783622662081216512
author Lenz, Michael
Schulz, Andreas
Koeck, Thomas
Rapp, Steffen
Nagler, Markus
Sauer, Madeleine
Eggebrecht, Lisa
Ten Cate, Vincent
Panova-Noeva, Marina
Prochaska, Jürgen H.
Lackner, Karl J.
Münzel, Thomas
Leineweber, Kirsten
Wild, Philipp S.
Andrade-Navarro, Miguel A.
author_sort Lenz, Michael
collection PubMed
description Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of this data is hampered by frequent missing values (random or left censored), calling for imputation approaches. While appropriate missing-value imputation methods exist, benchmarks of their performance in targeted proteomics data are lacking. Here, we assessed the performance of two methods for imputation of values missing completely at random, the previously top-benchmarked ‘missForest’ and the recently published ‘GSimp’ method. Evaluation was accomplished by comparing imputed with remeasured relative concentrations of 91 inflammation related circulating proteins in 86 samples from a cohort of 645 patients with venous thromboembolism. The median Pearson correlation between imputed and remeasured protein expression values was 69.0% for missForest and 71.6% for GSimp (p = 5.8e-4). Imputation with missForest resulted in stronger reduction of variance compared to GSimp (median relative variance of 25.3% vs. 68.6%, p = 2.4e-16) and undesired larger bias in downstream analyses. Irrespective of the imputation method used, the 91 imputed proteins revealed large variations in imputation accuracy, driven by differences in signal to noise ratio and information overlap between proteins. In summary, GSimp outperformed missForest, while both methods show good overall imputation accuracy with large variations between proteins.
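
The two summary statistics quoted in the description (median per-protein Pearson correlation and median relative variance) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `per_protein_metrics` is hypothetical, the data are synthetic, and "relative variance" is assumed here to mean the variance of the imputed values divided by the variance of the remeasured values for each protein, which the description does not define explicitly.

```python
import numpy as np


def per_protein_metrics(imputed, remeasured):
    """Return (pearson_r, relative_variance), one entry per protein (column).

    Both inputs are arrays of shape (n_samples, n_proteins).
    Relative variance is ASSUMED to be var(imputed) / var(remeasured).
    """
    imputed = np.asarray(imputed, dtype=float)
    remeasured = np.asarray(remeasured, dtype=float)

    # Centre each protein so Pearson's r becomes a normalised dot product.
    imp_c = imputed - imputed.mean(axis=0)
    rem_c = remeasured - remeasured.mean(axis=0)
    r = (imp_c * rem_c).sum(axis=0) / np.sqrt(
        (imp_c ** 2).sum(axis=0) * (rem_c ** 2).sum(axis=0)
    )

    # Ratio of sample variances; values below 1 mean imputation shrank the spread.
    rel_var = imputed.var(axis=0, ddof=1) / remeasured.var(axis=0, ddof=1)
    return r, rel_var


# Synthetic stand-in data with the dimensions named in the description
# (86 samples, 91 proteins). These are NOT the study's measurements.
rng = np.random.default_rng(0)
remeasured = rng.normal(size=(86, 91))
# A shrunken, noisy copy mimics an imputation that reduces variance.
imputed = 0.5 * remeasured + 0.2 * rng.normal(size=(86, 91))

r, rel_var = per_protein_metrics(imputed, remeasured)
median_r = float(np.median(r))
median_rel_var = float(np.median(rel_var))
print(f"median Pearson r: {median_r:.3f}")
print(f"median relative variance: {median_rel_var:.3f}")
```

Summarising with the median across proteins, as the description does, keeps a few poorly imputed proteins from dominating the comparison between methods.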
format Online
Article
Text
id pubmed-7735586
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-77355862020-12-22 Missing value imputation in proximity extension assay-based targeted proteomics data Lenz, Michael Schulz, Andreas Koeck, Thomas Rapp, Steffen Nagler, Markus Sauer, Madeleine Eggebrecht, Lisa Ten Cate, Vincent Panova-Noeva, Marina Prochaska, Jürgen H. Lackner, Karl J. Münzel, Thomas Leineweber, Kirsten Wild, Philipp S. Andrade-Navarro, Miguel A. PLoS One Research Article Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of this data is hampered by frequent missing values (random or left censored), calling for imputation approaches. While appropriate missing-value imputation methods exist, benchmarks of their performance in targeted proteomics data are lacking. Here, we assessed the performance of two methods for imputation of values missing completely at random, the previously top-benchmarked ‘missForest’ and the recently published ‘GSimp’ method. Evaluation was accomplished by comparing imputed with remeasured relative concentrations of 91 inflammation related circulating proteins in 86 samples from a cohort of 645 patients with venous thromboembolism. The median Pearson correlation between imputed and remeasured protein expression values was 69.0% for missForest and 71.6% for GSimp (p = 5.8e-4). Imputation with missForest resulted in stronger reduction of variance compared to GSimp (median relative variance of 25.3% vs. 68.6%, p = 2.4e-16) and undesired larger bias in downstream analyses. Irrespective of the imputation method used, the 91 imputed proteins revealed large variations in imputation accuracy, driven by differences in signal to noise ratio and information overlap between proteins. In summary, GSimp outperformed missForest, while both methods show good overall imputation accuracy with large variations between proteins. 
Public Library of Science 2020-12-14 /pmc/articles/PMC7735586/ /pubmed/33315883 http://dx.doi.org/10.1371/journal.pone.0243487 Text en © 2020 Lenz et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Missing value imputation in proximity extension assay-based targeted proteomics data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7735586/
https://www.ncbi.nlm.nih.gov/pubmed/33315883
http://dx.doi.org/10.1371/journal.pone.0243487