Measuring the reproducibility and quality of Hi-C data

BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods fo...

Bibliographic Details
Main Authors: Yardımcı, Galip Gürkan, Ozadam, Hakan, Sauria, Michael E. G., Ursu, Oana, Yan, Koon-Kiu, Yang, Tao, Chakraborty, Abhijit, Kaul, Arya, Lajoie, Bryan R., Song, Fan, Zhan, Ye, Ay, Ferhat, Gerstein, Mark, Kundaje, Anshul, Li, Qunhua, Taylor, James, Yue, Feng, Dekker, Job, Noble, William S.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423771/
https://www.ncbi.nlm.nih.gov/pubmed/30890172
http://dx.doi.org/10.1186/s13059-019-1658-7
_version_ 1783404582964035584
author Yardımcı, Galip Gürkan
Ozadam, Hakan
Sauria, Michael E. G.
Ursu, Oana
Yan, Koon-Kiu
Yang, Tao
Chakraborty, Abhijit
Kaul, Arya
Lajoie, Bryan R.
Song, Fan
Zhan, Ye
Ay, Ferhat
Gerstein, Mark
Kundaje, Anshul
Li, Qunhua
Taylor, James
Yue, Feng
Dekker, Job
Noble, William S.
author_sort Yardımcı, Galip Gürkan
collection PubMed
description BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. RESULTS: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. CONCLUSIONS: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1658-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6423771
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6423771 2019-03-28 Measuring the reproducibility and quality of Hi-C data Yardımcı, Galip Gürkan Ozadam, Hakan Sauria, Michael E. G. Ursu, Oana Yan, Koon-Kiu Yang, Tao Chakraborty, Abhijit Kaul, Arya Lajoie, Bryan R. Song, Fan Zhan, Ye Ay, Ferhat Gerstein, Mark Kundaje, Anshul Li, Qunhua Taylor, James Yue, Feng Dekker, Job Noble, William S. Genome Biol Research BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. RESULTS: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. CONCLUSIONS: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1658-7) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-19 /pmc/articles/PMC6423771/ /pubmed/30890172 http://dx.doi.org/10.1186/s13059-019-1658-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Measuring the reproducibility and quality of Hi-C data
topic Research