Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed

Bibliographic Details
Main Authors:	Landfors, Mattias; Philip, Philge; Rydén, Patrik; Stenberg, Per
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3222656/ https://www.ncbi.nlm.nih.gov/pubmed/22132175 http://dx.doi.org/10.1371/journal.pone.0027942

collection	PubMed
description	Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
id	pubmed-3222656
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3222656 2011-11-30 PLoS One Research Article Public Library of Science 2011-11-22 /pmc/articles/PMC3222656/ /pubmed/22132175 http://dx.doi.org/10.1371/journal.pone.0027942 Text en Landfors et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
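
The abstract states that standard normalization cannot remove technical bias when the truly altered variables are skewed in one direction. The following is a minimal sketch of that effect on simulated log-ratios, not the authors' procedure: it uses median centering as a stand-in for "a standard normalization technique" (an assumption, since the abstract names no specific method), and it implements neither the DSE-test nor the HMM-assisted normalization. All function names and parameter values (effect size, noise level, technical offset) are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_log_ratios(n=10000, frac_altered=0.3, effect=1.0,
                        noise_sd=0.2, tech_bias=0.5, seed=0):
    """Simulate log-ratios for n variables.

    Unaltered variables have a true value of 0. Altered variables are all
    shifted upward by `effect`, i.e. the altered set is fully skewed.
    A constant technical offset `tech_bias` is added to every variable.
    """
    rng = random.Random(seed)
    n_altered = int(n * frac_altered)
    truth = [effect] * n_altered + [0.0] * (n - n_altered)
    observed = [t + tech_bias + rng.gauss(0.0, noise_sd) for t in truth]
    return truth, observed


def median_center(values):
    """Median centering: subtract the overall median from every value."""
    m = statistics.median(values)
    return [v - m for v in values]


def bias_on_unaltered(truth, normalized):
    """Mean normalized value of the truly unaltered variables (ideal: 0)."""
    return statistics.mean(x for t, x in zip(truth, normalized) if t == 0.0)


# Residual bias after median centering, for increasing altered fractions.
biases = {}
for frac in (0.0, 0.1, 0.3, 0.45):
    truth, observed = simulate_log_ratios(frac_altered=frac)
    normalized = median_center(observed)
    biases[frac] = bias_on_unaltered(truth, normalized)
    print(f"altered fraction {frac:.2f}: "
          f"bias on unaltered variables = {biases[frac]:+.3f}")
```

In this simulation the residual bias on the unaltered variables is close to zero when nothing is altered and becomes increasingly negative as the one-sided altered fraction grows, because the altered variables pull the overall median upward. This is consistent with the abstract's statement that performance degrades as the fraction of altered variables and the skewness increase.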