Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed

Bibliographic Details
Main authors: Landfors, Mattias; Philip, Philge; Rydén, Patrik; Stenberg, Per
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3222656/
https://www.ncbi.nlm.nih.gov/pubmed/22132175
http://dx.doi.org/10.1371/journal.pone.0027942
author Landfors, Mattias
Philip, Philge
Rydén, Patrik
Stenberg, Per
collection PubMed
description Genome-wide analysis of gene expression or protein binding patterns using different array- or sequencing-based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increase. We propose the following workflow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate whether the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the workflow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
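The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' DSE-test or HMM-assisted normalization; it only shows, on made-up data, why centring on all variables goes wrong when the altered variables are skewed in one direction. The function names, the 30% altered fraction, the effect size, and the technical offset are all assumptions chosen for illustration. The "oracle" variant assumes the unaltered variables are known in advance, which is not the case in a real experiment.

```python
import random
import statistics


def simulate_log_ratios(n=10000, frac_altered=0.3, effect=1.0,
                        tech_bias=0.5, seed=0):
    """Simulate log-ratios for a hypothetical two-group experiment.

    Every variable carries the same technical offset `tech_bias`.
    A fraction `frac_altered` is additionally shifted upward by `effect`,
    so the altered variables are fully skewed in the positive direction.
    """
    rng = random.Random(seed)
    n_altered = int(n * frac_altered)
    is_altered = [i < n_altered for i in range(n)]
    log_ratios = [
        tech_bias + (effect if altered else 0.0) + rng.gauss(0.0, 0.2)
        for altered in is_altered
    ]
    return log_ratios, is_altered


def median_center(values, reference=None):
    """Subtract the median of `reference` (default: all values)."""
    shift = statistics.median(values if reference is None else reference)
    return [v - shift for v in values]


log_ratios, is_altered = simulate_log_ratios()
unaltered_raw = [v for v, a in zip(log_ratios, is_altered) if not a]

# Standard approach: centre on the median of all variables.
standard = median_center(log_ratios)
# Idealised approach: centre on the (here known) unaltered variables only.
oracle = median_center(log_ratios, reference=unaltered_raw)


def mean_unaltered(values):
    """Mean normalized log-ratio of the truly unaltered variables."""
    return statistics.mean(
        [v for v, a in zip(values, is_altered) if not a]
    )


# Unaltered variables should sit at 0 after a correct normalization.
bias_standard = mean_unaltered(standard)
bias_oracle = mean_unaltered(oracle)
print(f"residual bias, median of all variables:  {bias_standard:+.3f}")
print(f"residual bias, unaltered variables only: {bias_oracle:+.3f}")
```

Because the upward-shifted variables pull the overall median up, the first normalization pushes the unaltered variables below zero, while centring on the unaltered variables leaves them close to zero. A method that can estimate which variables are unaltered aims to approximate the second case.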
format Online
Article
Text
id pubmed-3222656
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3222656 2011-11-30 Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed Landfors, Mattias Philip, Philge Rydén, Patrik Stenberg, Per PLoS One Research Article Public Library of Science 2011-11-22 /pmc/articles/PMC3222656/ /pubmed/22132175 http://dx.doi.org/10.1371/journal.pone.0027942 Text en Landfors et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed
topic Research Article