
HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values

Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here...


Bibliographic Details
Main authors:	Voß, Hannah, Schlumbohm, Simon, Barwikowski, Philip, Wurlitzer, Marcus, Dottermusch, Matthias, Neumann, Philipp, Schlüter, Hartmut, Neumann, Julia E., Krisp, Christoph
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9209422/ https://www.ncbi.nlm.nih.gov/pubmed/35725563 http://dx.doi.org/10.1038/s41467-022-31007-x

collection	PubMed
description	Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of the available data and uses matrix dissection to minimize data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). Evaluated on four example datasets with up to 23 batches, the HarmonizR strategy successfully harmonized data across different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed better at detecting significant proteins. HarmonizR is an efficient tool for missing-data-tolerant reduction of experimental variance and is easily adjustable to individual dataset properties and user preferences.
id	pubmed-9209422
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Nat Commun (published online 2022-06-20)
rights	© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
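The description says HarmonizR dissects the data matrix according to its missing-value structure instead of imputing, and then applies ComBat or limma's removeBatchEffect() to the pieces. The Python sketch below illustrates that idea under stated assumptions; it is not the HarmonizR R package's implementation or API. The names `dissect`, `center_batches`, `harmonize` and the `min_per_batch` threshold are hypothetical. The correction step is a simple location-only batch mean-centering, used here as a stand-in for ComBat or removeBatchEffect(). Values from batches that fail the threshold are set to missing, which is also an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, FrozenSet, List, Optional

Row = List[Optional[float]]  # one protein's intensities across samples; None = missing


def dissect(data: Dict[str, Row], batches: List[str],
            min_per_batch: int = 2) -> Dict[FrozenSet[str], List[str]]:
    """Group proteins into sub-matrices keyed by the set of batches in which
    each protein has at least `min_per_batch` observed values (hypothetical rule)."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for protein, row in data.items():
        counts: Dict[str, int] = defaultdict(int)
        for value, batch in zip(row, batches):
            if value is not None:
                counts[batch] += 1
        usable = frozenset(b for b, n in counts.items() if n >= min_per_batch)
        groups[usable].append(protein)
    return dict(groups)


def center_batches(row: Row, batches: List[str], keep: FrozenSet[str]) -> Row:
    """Stand-in correction (not ComBat): shift each usable batch so its mean
    equals the unweighted mean of the batch means; other values become None."""
    batch_means = {
        b: mean(v for v, bb in zip(row, batches) if bb == b and v is not None)
        for b in keep
    }
    target = mean(batch_means.values())
    return [
        v - batch_means[b] + target if v is not None and b in keep else None
        for v, b in zip(row, batches)
    ]


def harmonize(data: Dict[str, Row], batches: List[str],
              min_per_batch: int = 2) -> Dict[str, Row]:
    """Correct each sub-matrix separately; no missing value is ever imputed."""
    out: Dict[str, Row] = {}
    for usable, proteins in dissect(data, batches, min_per_batch).items():
        for protein in proteins:
            if len(usable) < 2:
                # Present in fewer than two batches: nothing to harmonize against.
                out[protein] = list(data[protein])
            else:
                out[protein] = center_batches(data[protein], batches, usable)
    return out


if __name__ == "__main__":
    batches = ["A", "A", "A", "B", "B", "B"]
    data = {
        "P1": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
        "P2": [5.0, None, 7.0, None, None, 9.0],
    }
    print(dissect(data, batches))
    print(harmonize(data, batches))
```

In this example, P1 is observed in both batches and lands in the {A, B} sub-matrix, where its batch offset is removed. P2 has only one value in batch B, so it falls into an {A}-only sub-matrix and is returned unchanged. Grouping by shared batch pattern matters most for methods such as ComBat that pool information across all proteins in a sub-matrix. The per-row centering used here is row-independent, so the grouping is shown only to illustrate the dissection structure.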


Similar Items