
qc3C: Reference-free quality control for Hi-C sequencing data

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies...


Bibliographic Details
Main authors:	DeMaere, Matthew Z., Darling, Aaron E.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8530316/ https://www.ncbi.nlm.nih.gov/pubmed/34634030 http://dx.doi.org/10.1371/journal.pcbi.1008839

author	DeMaere, Matthew Z. Darling, Aaron E.
collection	PubMed
description	Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of a successful outcome, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while the key post-sequencing quality indicators have thus far relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, because an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first reference-free Hi-C QC tool; it also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
format	Online Article Text
id	pubmed-8530316
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8530316 2021-10-22 qc3C: Reference-free quality control for Hi-C sequencing data DeMaere, Matthew Z. Darling, Aaron E. PLoS Comput Biol Research Article
Public Library of Science 2021-10-11 /pmc/articles/PMC8530316/ /pubmed/34634030 http://dx.doi.org/10.1371/journal.pcbi.1008839 Text en © 2021 DeMaere, Darling. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	qc3C: Reference-free quality control for Hi-C sequencing data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8530316/ https://www.ncbi.nlm.nih.gov/pubmed/34634030 http://dx.doi.org/10.1371/journal.pcbi.1008839
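The description notes that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. The toy Python sketch below illustrates that observation; it is not qc3C's implementation. It assumes a known ligation junction (here `GATCGATC`, the fill-in junction produced by DpnII or MboI digestion, chosen only as an example), counts forward-strand k-mers only, scores single reads rather than read-pairs, and makes no correction for junctions that fall outside the sequenced part of a fragment, so it counts observed junctions rather than inferring the total fraction. The flank-comparison heuristic, the function names `count_kmers`, `junction_ratio` and `ligation_fraction`, and the `max_ratio` threshold are all hypothetical choices made for this illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads (forward strand only, for simplicity)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def junction_ratio(read: str, junction: str, counts: Counter, k: int) -> Optional[float]:
    """Return count(junction-centred k-mer) / mean count(flanking k-mers).

    Returns None if the read lacks the junction motif or is too short to hold
    the centred k-mer plus one non-overlapping flanking k-mer on each side.
    """
    pos = read.find(junction)
    if pos < 0:
        return None
    start = pos + len(junction) // 2 - k // 2  # k-mer centred on the junction
    left = start - k                            # flanking k-mer just to the left
    right = start + k                           # flanking k-mer just to the right
    if left < 0 or right + k > len(read):
        return None
    flank_mean = (counts[read[left:left + k]] + counts[read[right:right + k]]) / 2
    if flank_mean == 0:
        return None
    return counts[read[start:start + k]] / flank_mean


def ligation_fraction(reads: List[str], junction: str, k: int,
                      max_ratio: float = 0.5) -> float:
    """Fraction of reads whose junction k-mer is much rarer than its flanks.

    A naturally occurring motif tends to be covered about as deeply as its
    genomic neighbours (ratio near 1), whereas a ligation product creates a
    novel k-mer that is seen far less often than the flanking sequence.
    """
    if not reads:
        return 0.0
    counts = count_kmers(reads, k)
    flagged = 0
    for read in reads:
        ratio = junction_ratio(read, junction, counts, k)
        if ratio is not None and ratio <= max_ratio:
            flagged += 1
    return flagged / len(reads)
```

The k-mer length must exceed the junction length so that the centred k-mer includes bases from both joined fragments. Otherwise every occurrence of the motif, ligated or natural, would map to the same k-mer and the two cases could not be told apart.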
