Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure

Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the...

Bibliographic Details
Main Authors:	Dressler, Franz F., Brägelmann, Johannes, Reischl, Markus, Perner, Sven
Format:	Online Article Text
Language:	English
Published:	American Society for Biochemistry and Molecular Biology 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9450154/ https://www.ncbi.nlm.nih.gov/pubmed/35853575 http://dx.doi.org/10.1016/j.mcpro.2022.100269

author	Dressler, Franz F. Brägelmann, Johannes Reischl, Markus Perner, Sven
collection	PubMed
description	Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. Simultaneously, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatical analyses. Based on the relationship of technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: Proteins are ranked based on their expression level–corrected variance and the mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. Five publicly available spike-in and biologically variant datasets were normalized by Normics reliably and with quantitative accuracy, with improved performance compared to standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets Normics correctly determined proteins as being DE that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
format	Online Article Text
id	pubmed-9450154
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	American Society for Biochemistry and Molecular Biology
record_format	MEDLINE/PubMed
spelling	pubmed-9450154 2022-09-09 Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure Dressler, Franz F. Brägelmann, Johannes Reischl, Markus Perner, Sven Mol Cell Proteomics Research American Society for Biochemistry and Molecular Biology 2022-07-16 /pmc/articles/PMC9450154/ /pubmed/35853575 http://dx.doi.org/10.1016/j.mcpro.2022.100269 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9450154/ https://www.ncbi.nlm.nih.gov/pubmed/35853575 http://dx.doi.org/10.1016/j.mcpro.2022.100269
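
The ranking and subset-based normalization summarized in the description can be illustrated with a short sketch. This is not the authors' implementation. The function name `normalize_by_stable_subset`, the `subset_fraction` parameter, the linear fit used to correct variance for expression level, the use of Pearson correlation, the equal-weight rank sum, and the final median shift are all assumptions chosen for illustration. The sketch also assumes that a higher mean correlation with all other proteins indicates a higher non-DE likelihood, and that the input is a complete (no missing values) matrix of log intensities.

```python
import numpy as np


def normalize_by_stable_subset(log_x, subset_fraction=0.2):
    """Illustrative sketch only (hypothetical helper, not the published code).

    log_x: 2-D array, proteins x samples, log intensities, no missing values.
    Returns the normalized matrix and the indices of the selected subset.
    """
    log_x = np.asarray(log_x, dtype=float)
    n_proteins = log_x.shape[0]

    mean = log_x.mean(axis=1)
    var = log_x.var(axis=1, ddof=1)

    # Assumed level correction: residual variance after a linear fit on the mean.
    slope, intercept = np.polyfit(mean, var, 1)
    corrected_var = var - (slope * mean + intercept)

    # Mean Pearson correlation of each protein with all other proteins
    # (the diagonal self-correlation of 1 is excluded).
    corr = np.corrcoef(log_x)
    mean_corr = (corr.sum(axis=1) - 1.0) / (n_proteins - 1)

    # Rank 0 is best: low corrected variance, high mean correlation.
    var_rank = corrected_var.argsort().argsort()
    corr_rank = (-mean_corr).argsort().argsort()
    score = var_rank + corr_rank  # assumed equal weighting of both ranks

    n_keep = max(1, int(round(subset_fraction * n_proteins)))
    subset = np.argsort(score, kind="stable")[:n_keep]

    # Assumed normalization: per-sample median shift estimated on the
    # row-centered values of the presumed non-DE subset only.
    centered = log_x[subset] - mean[subset, None]
    shift = np.median(centered, axis=0)
    return log_x - shift, subset
```

Calling `normalize_by_stable_subset(log_x)` on a proteins-by-samples matrix returns the shifted matrix together with the indices of the proteins treated as non-DE. Because the shift is estimated on that subset only, one-sided regulation among the remaining proteins does not pull the per-sample correction, which is the motivation given in the description.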
