Zerone: a ChIP-seq discretizer for multiple replicates with built-in quality control

Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard method to investigate chromatin protein composition. As the number of community-available ChIP-seq profiles increases, it becomes more common to use data from different sources, which makes jo...

Bibliographic Details
Main authors: Cuscó, Pol; Filion, Guillaume J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5039920/
https://www.ncbi.nlm.nih.gov/pubmed/27288492
http://dx.doi.org/10.1093/bioinformatics/btw336
author Cuscó, Pol
Filion, Guillaume J.
collection PubMed
description Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard method to investigate chromatin protein composition. As the number of community-available ChIP-seq profiles increases, it becomes more common to use data from different sources, which makes joint analysis challenging. Issues such as lack of reproducibility, heterogeneous quality and conflicts between replicates become evident when comparing datasets, especially when they are produced by different laboratories.
Results: Here, we present Zerone, a ChIP-seq discretizer with built-in quality control. Zerone is powered by a Hidden Markov Model with zero-inflated negative multinomial emissions, which allows it to merge several replicates into a single discretized profile. To identify low quality or irreproducible data, we trained a Support Vector Machine and integrated it as part of the discretization process. The result is a classifier reaching 95% accuracy in detecting low quality profiles. We also introduce a graphical representation to compare discretization quality and we show that Zerone achieves outstanding accuracy. Finally, on current hardware, Zerone discretizes a ChIP-seq experiment on mammalian genomes in about 5 min using less than 700 MB of memory.
Availability and Implementation: Zerone is available as a command line tool and as an R package. The C source code and R scripts can be downloaded from https://github.com/nanakiksc/zerone. The information to reproduce the benchmark and the figures is stored in a public Docker image that can be downloaded from https://hub.docker.com/r/nanakiksc/zerone/.
Contact: guillaume.filion@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5039920
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5039920 2016-09-29 Zerone: a ChIP-seq discretizer for multiple replicates with built-in quality control Cuscó, Pol Filion, Guillaume J. Bioinformatics Original Papers
Oxford University Press 2016-10-01 2016-06-10 /pmc/articles/PMC5039920/ /pubmed/27288492 http://dx.doi.org/10.1093/bioinformatics/btw336 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Zerone: a ChIP-seq discretizer for multiple replicates with built-in quality control
topic Original Papers