A statistical approach for inferring the 3D structure of the genome

Motivation: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA–DNA contact maps, accurate 3D models of how chromosomes fol...

Bibliographic Details
Main authors: Varoquaux, Nelle; Ay, Ferhat; Noble, William Stafford; Vert, Jean-Philippe
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Ismb 2014 Proceedings Papers Committee
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4229903/
https://www.ncbi.nlm.nih.gov/pubmed/24931992
http://dx.doi.org/10.1093/bioinformatics/btu268
author Varoquaux, Nelle
Ay, Ferhat
Noble, William Stafford
Vert, Jean-Philippe
collection PubMed
description Motivation: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA–DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely on multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, and thereby may lead to incorrect structure reconstruction.

Methods: We propose a novel approach to infer a consensus 3D structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data.

Results: We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms (two metric MDS methods using different stress functions, a non-metric version of MDS, and ChromSDE, a recently described, advanced MDS method) on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function. On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions.

Availability and implementation: A Python implementation of the proposed method is available at http://cbio.ensmp.fr/pastis.

Contact: william-noble@uw.edu or jean-philippe.vert@mines.org
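
The Methods portion of the abstract above models each contact count as Poisson-distributed, with an intensity that decreases with the distance between the two loci. The following is a minimal Python sketch of such a likelihood, not the PASTIS implementation. It assumes a power-law transfer function `beta * d ** alpha` with `alpha < 0`; the abstract does not state the functional form, so that form, the default parameter values, and every function name below are illustrative assumptions.

```python
import math


def pairwise_distances(coords):
    """Euclidean distance for every pair i < j of 3D points."""
    n = len(coords)
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def expected_counts(coords, alpha=-3.0, beta=1.0):
    """Poisson intensities under the ASSUMED transfer function beta * d**alpha."""
    return {pair: beta * d ** alpha for pair, d in pairwise_distances(coords).items()}


def poisson_neg_log_likelihood(coords, counts, alpha=-3.0, beta=1.0):
    """Negative Poisson log-likelihood of the counts given a candidate structure.

    The log(c_ij!) term is dropped: it does not depend on the coordinates,
    so it does not change which structure minimizes the objective.
    """
    nll = 0.0
    for pair, lam in expected_counts(coords, alpha, beta).items():
        nll += lam - counts[pair] * math.log(lam)
    return nll


# Hypothetical toy structure with four loci (coordinates are made up).
true_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]

# Noise-free "observed" counts: exactly the intensities of the true structure.
counts = expected_counts(true_coords, beta=10.0)

# A wrong candidate: the same shape with every distance doubled.
stretched_coords = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in true_coords]

nll_true = poisson_neg_log_likelihood(true_coords, counts, beta=10.0)
nll_wrong = poisson_neg_log_likelihood(stretched_coords, counts, beta=10.0)
print(nll_true, nll_wrong)
```

With noise-free counts, the structure that generated them scores a lower negative log-likelihood than the stretched candidate, because each per-pair term is minimized when the intensity equals the observed count. Inferring a structure would mean minimizing this objective over the coordinates, and the variant that optimizes the transfer function would, under the same assumption, also treat `alpha` and `beta` as free parameters.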
format Online
Article
Text
id pubmed-4229903
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4229903 2014-11-13 A statistical approach for inferring the 3D structure of the genome Varoquaux, Nelle Ay, Ferhat Noble, William Stafford Vert, Jean-Philippe Bioinformatics Ismb 2014 Proceedings Papers Committee Oxford University Press 2014-06-15 2014-06-11 /pmc/articles/PMC4229903/ /pubmed/24931992 http://dx.doi.org/10.1093/bioinformatics/btu268 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A statistical approach for inferring the 3D structure of the genome
topic Ismb 2014 Proceedings Papers Committee