HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values

Bibliographic details
Main authors: Voß, Hannah; Schlumbohm, Simon; Barwikowski, Philip; Wurlitzer, Marcus; Dottermusch, Matthias; Neumann, Philipp; Schlüter, Hartmut; Neumann, Julia E.; Krisp, Christoph
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9209422/
https://www.ncbi.nlm.nih.gov/pubmed/35725563
http://dx.doi.org/10.1038/s41467-022-31007-x
collection PubMed
description Dataset integration is common practice to overcome the limitations of statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values, and sophisticated strategies for batch effect reduction are either lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of the available data and uses matrix dissection to minimize data loss without imputing data. It implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). Evaluated on four example datasets with up to 23 batches, the HarmonizR strategy successfully harmonized data across different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed better at detecting significant proteins. HarmonizR is an efficient tool for missing-data-tolerant reduction of experimental variance and is easily adjustable to individual dataset properties and user preferences.
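The description outlines the core idea: rather than imputing missing values, the data matrix is dissected into sub-matrices in which each protein has enough observations in every batch it is corrected against. The Python sketch below illustrates that dissection step under assumptions of our own; it is not HarmonizR's code or API. The helper names (`usable_batches`, `dissect`, `center_batches`, `harmonize_sketch`), the `min_per_batch=2` threshold, dropping values from batches that fall below that threshold, and leaving single-batch proteins unchanged are all hypothetical choices. The correction step is a simple location-only adjustment (each batch's mean is shifted to the mean of the batch means), comparable in spirit to limma's removeBatchEffect() with a batch-only design; it is not ComBat, which additionally estimates scale effects and shrinks both with empirical Bayes.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical layout: protein -> one intensity per sample, None marks a missing value.
Matrix = Dict[str, List[Optional[float]]]


def usable_batches(values: Sequence[Optional[float]], batches: Sequence[str],
                   min_per_batch: int = 2) -> Tuple[str, ...]:
    """Sorted tuple of batches with at least `min_per_batch` observed values."""
    counts: Dict[str, int] = defaultdict(int)
    for v, b in zip(values, batches):
        if v is not None:
            counts[b] += 1
    return tuple(sorted(b for b, n in counts.items() if n >= min_per_batch))


def dissect(matrix: Matrix, batches: Sequence[str],
            min_per_batch: int = 2) -> Dict[Tuple[str, ...], List[str]]:
    """Group proteins by their usable-batch pattern: one sub-matrix per pattern."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for protein, values in matrix.items():
        groups[usable_batches(values, batches, min_per_batch)].append(protein)
    return dict(groups)


def center_batches(values: Sequence[Optional[float]], batches: Sequence[str],
                   keep: Set[str]) -> List[Optional[float]]:
    """Shift each kept batch's mean to the mean of the batch means.

    Values from batches outside `keep` are set to None (an assumption of this sketch).
    Missing values stay missing; nothing is imputed.
    """
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b and v is not None)
        for b in keep
    }
    target = mean(batch_means.values())
    return [
        None if v is None or b not in keep else v - batch_means[b] + target
        for v, b in zip(values, batches)
    ]


def harmonize_sketch(matrix: Matrix, batches: Sequence[str],
                     min_per_batch: int = 2) -> Matrix:
    """Correct each sub-matrix separately; proteins usable in < 2 batches are left as-is."""
    result: Matrix = {}
    for pattern, proteins in dissect(matrix, batches, min_per_batch).items():
        for protein in proteins:
            if len(pattern) < 2:
                result[protein] = list(matrix[protein])  # nothing to correct against
            else:
                result[protein] = center_batches(matrix[protein], batches, set(pattern))
    return result


if __name__ == "__main__":
    batches = ["A", "A", "B", "B", "C", "C"]
    data: Matrix = {
        "P1": [1.0, 3.0, 5.0, 7.0, None, None],  # batch C entirely missing
        "P2": [2.0, 2.0, 4.0, 4.0, 6.0, 6.0],    # complete
        "P3": [1.0, None, 2.0, 4.0, 3.0, 5.0],   # batch A has a single value
    }
    print(dissect(data, batches))
    print(harmonize_sketch(data, batches)["P3"])  # [None, None, 2.5, 4.5, 2.5, 4.5]
```

Grouping proteins by pattern matters for ComBat, which pools information across all proteins in a sub-matrix; in this per-protein location sketch each row is corrected independently, so `dissect` is shown mainly to make the sub-matrix structure visible.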
id pubmed-9209422
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Nat Commun (Nature Communications), Article; record pubmed-9209422, 2022-06-22
Nature Publishing Group UK, 2022-06-20. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.