
Shape-based peak identification for ChIP-Seq

BACKGROUND: The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive sur...


Bibliographic Details
Main authors:	Hower, Valerie; Evans, Steven N; Pachter, Lior
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032669/ https://www.ncbi.nlm.nih.gov/pubmed/21226895 http://dx.doi.org/10.1186/1471-2105-12-15

_version_	1782197478675709952
author	Hower, Valerie Evans, Steven N Pachter, Lior
author_facet	Hower, Valerie Evans, Steven N Pachter, Lior
author_sort	Hower, Valerie
collection	PubMed
description	BACKGROUND: The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive survey of binding. A number of algorithms have been developed to call "peaks" representing bound regions from mapped reads. Most current algorithms incorporate multiple heuristics, and despite much work it remains difficult to accurately determine individual peaks corresponding to distinct binding events. RESULTS: Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is statistically sound and robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We validate our approach using previously published data and show that it can discover previously missed regions. CONCLUSIONS: The difficulty in accurately calling peaks for ChIP-Seq data is partly due to the difficulty in defining peaks, and we demonstrate a novel method that improves on the accuracy of previous methods in resolving peaks. Our introduction of a robust statistical test based on ideas from topological data analysis is also novel. Our methods are implemented in a program called T-PIC (Tree shape Peak Identification for ChIP-Seq), which is available at http://bio.math.berkeley.edu/tpic/.
format	Text
id	pubmed-3032669
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3032669 2011-02-04 Shape-based peak identification for ChIP-Seq Hower, Valerie Evans, Steven N Pachter, Lior BMC Bioinformatics Methodology Article BACKGROUND: The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive survey of binding. A number of algorithms have been developed to call "peaks" representing bound regions from mapped reads. Most current algorithms incorporate multiple heuristics, and despite much work it remains difficult to accurately determine individual peaks corresponding to distinct binding events. RESULTS: Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is statistically sound and robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We validate our approach using previously published data and show that it can discover previously missed regions. CONCLUSIONS: The difficulty in accurately calling peaks for ChIP-Seq data is partly due to the difficulty in defining peaks, and we demonstrate a novel method that improves on the accuracy of previous methods in resolving peaks. Our introduction of a robust statistical test based on ideas from topological data analysis is also novel. Our methods are implemented in a program called T-PIC (Tree shape Peak Identification for ChIP-Seq), which is available at http://bio.math.berkeley.edu/tpic/. BioMed Central 2011-01-12 /pmc/articles/PMC3032669/ /pubmed/21226895 http://dx.doi.org/10.1186/1471-2105-12-15 Text en Copyright ©2011 Hower et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Hower, Valerie Evans, Steven N Pachter, Lior Shape-based peak identification for ChIP-Seq
title	Shape-based peak identification for ChIP-Seq
title_full	Shape-based peak identification for ChIP-Seq
title_fullStr	Shape-based peak identification for ChIP-Seq
title_full_unstemmed	Shape-based peak identification for ChIP-Seq
title_short	Shape-based peak identification for ChIP-Seq
title_sort	shape-based peak identification for chip-seq
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032669/ https://www.ncbi.nlm.nih.gov/pubmed/21226895 http://dx.doi.org/10.1186/1471-2105-12-15
work_keys_str_mv	AT howervalerie shapebasedpeakidentificationforchipseq AT evansstevenn shapebasedpeakidentificationforchipseq AT pachterlior shapebasedpeakidentificationforchipseq
