
Identifying peaks in *-seq data using shape information

BACKGROUND: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the Ch...


Bibliographic Details
Main Authors: Strino, Francesco, Lappe, Michael
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905608/
https://www.ncbi.nlm.nih.gov/pubmed/27295177
http://dx.doi.org/10.1186/s12859-016-1042-5
_version_ 1782437279314214912
author Strino, Francesco
Lappe, Michael
author_facet Strino, Francesco
Lappe, Michael
author_sort Strino, Francesco
collection PubMed
description BACKGROUND: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak-caller designed to be not specific to a particular *-seq protocol. RESULTS: We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver-operator characteristic (ROC) plots. In order to show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performances on DNase-seq data. CONCLUSIONS: The results show that CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease-of-use.
format Online
Article
Text
id pubmed-4905608
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-49056082016-06-14 Identifying peaks in *-seq data using shape information Strino, Francesco Lappe, Michael BMC Bioinformatics Research BACKGROUND: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak-caller designed to be not specific to a particular *-seq protocol. RESULTS: We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver-operator characteristic (ROC) plots. In order to show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performances on DNase-seq data. CONCLUSIONS: The results show that CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease-of-use. BioMed Central 2016-06-06 /pmc/articles/PMC4905608/ /pubmed/27295177 http://dx.doi.org/10.1186/s12859-016-1042-5 Text en © Strino and Lappe. 
2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Strino, Francesco
Lappe, Michael
Identifying peaks in *-seq data using shape information
title Identifying peaks in *-seq data using shape information
title_full Identifying peaks in *-seq data using shape information
title_fullStr Identifying peaks in *-seq data using shape information
title_full_unstemmed Identifying peaks in *-seq data using shape information
title_short Identifying peaks in *-seq data using shape information
title_sort identifying peaks in *-seq data using shape information
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905608/
https://www.ncbi.nlm.nih.gov/pubmed/27295177
http://dx.doi.org/10.1186/s12859-016-1042-5
work_keys_str_mv AT strinofrancesco identifyingpeaksinseqdatausingshapeinformation
AT lappemichael identifyingpeaksinseqdatausingshapeinformation