Effective normalization for copy number variation in Hi-C data
BACKGROUND: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usua...
Main authors: | Servant, Nicolas; Varoquaux, Nelle; Heard, Edith; Barillot, Emmanuel; Vert, Jean-Philippe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6127909/ https://www.ncbi.nlm.nih.gov/pubmed/30189838 http://dx.doi.org/10.1186/s12859-018-2256-5 |
_version_ | 1783353554410405888 |
---|---|
author | Servant, Nicolas Varoquaux, Nelle Heard, Edith Barillot, Emmanuel Vert, Jean-Philippe |
author_facet | Servant, Nicolas Varoquaux, Nelle Heard, Edith Barillot, Emmanuel Vert, Jean-Philippe |
author_sort | Servant, Nicolas |
collection | PubMed |
description | BACKGROUND: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. RESULTS: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. CONCLUSIONS: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2256-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6127909 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6127909 2018-09-10 Effective normalization for copy number variation in Hi-C data Servant, Nicolas Varoquaux, Nelle Heard, Edith Barillot, Emmanuel Vert, Jean-Philippe BMC Bioinformatics Methodology Article BACKGROUND: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. RESULTS: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. CONCLUSIONS: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2256-5) contains supplementary material, which is available to authorized users. 
BioMed Central 2018-09-06 /pmc/articles/PMC6127909/ /pubmed/30189838 http://dx.doi.org/10.1186/s12859-018-2256-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Servant, Nicolas Varoquaux, Nelle Heard, Edith Barillot, Emmanuel Vert, Jean-Philippe Effective normalization for copy number variation in Hi-C data |
title | Effective normalization for copy number variation in Hi-C data |
title_full | Effective normalization for copy number variation in Hi-C data |
title_fullStr | Effective normalization for copy number variation in Hi-C data |
title_full_unstemmed | Effective normalization for copy number variation in Hi-C data |
title_short | Effective normalization for copy number variation in Hi-C data |
title_sort | effective normalization for copy number variation in hi-c data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6127909/ https://www.ncbi.nlm.nih.gov/pubmed/30189838 http://dx.doi.org/10.1186/s12859-018-2256-5 |
work_keys_str_mv | AT servantnicolas effectivenormalizationforcopynumbervariationinhicdata AT varoquauxnelle effectivenormalizationforcopynumbervariationinhicdata AT heardedith effectivenormalizationforcopynumbervariationinhicdata AT barillotemmanuel effectivenormalizationforcopynumbervariationinhicdata AT vertjeanphilippe effectivenormalizationforcopynumbervariationinhicdata |