Identifying peaks in *-seq data using shape information
BACKGROUND: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the Ch...
Main authors: | Strino, Francesco; Lappe, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905608/ https://www.ncbi.nlm.nih.gov/pubmed/27295177 http://dx.doi.org/10.1186/s12859-016-1042-5 |
_version_ | 1782437279314214912 |
---|---|
author | Strino, Francesco Lappe, Michael |
author_facet | Strino, Francesco Lappe, Michael |
author_sort | Strino, Francesco |
collection | PubMed |
description | BACKGROUND: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak-caller designed to be not specific to a particular *-seq protocol. RESULTS: We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver-operator characteristic (ROC) plots. In order to show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performances on DNase-seq data. CONCLUSIONS: The results show that CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease-of-use. |
format | Online Article Text |
id | pubmed-4905608 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4905608 2016-06-14 Identifying peaks in *-seq data using shape information Strino, Francesco Lappe, Michael BMC Bioinformatics Research BACKGROUND: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak-caller designed to be not specific to a particular *-seq protocol. RESULTS: We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver-operator characteristic (ROC) plots. In order to show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performances on DNase-seq data. CONCLUSIONS: The results show that CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease-of-use. BioMed Central 2016-06-06 /pmc/articles/PMC4905608/ /pubmed/27295177 http://dx.doi.org/10.1186/s12859-016-1042-5 Text en © Strino and Lappe. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Strino, Francesco Lappe, Michael Identifying peaks in *-seq data using shape information |
title | Identifying peaks in *-seq data using shape information |
title_full | Identifying peaks in *-seq data using shape information |
title_fullStr | Identifying peaks in *-seq data using shape information |
title_full_unstemmed | Identifying peaks in *-seq data using shape information |
title_short | Identifying peaks in *-seq data using shape information |
title_sort | identifying peaks in *-seq data using shape information |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905608/ https://www.ncbi.nlm.nih.gov/pubmed/27295177 http://dx.doi.org/10.1186/s12859-016-1042-5 |
work_keys_str_mv | AT strinofrancesco identifyingpeaksinseqdatausingshapeinformation AT lappemichael identifyingpeaksinseqdatausingshapeinformation |