
Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning

MOTIVATION: Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts...


Bibliographic Details
Main authors:	Hocking, Toby Dylan; Goerner-Potvin, Patricia; Morin, Andreanne; Shao, Xiaojian; Pastinen, Tomi; Bourque, Guillaume
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408812/ https://www.ncbi.nlm.nih.gov/pubmed/27797775 http://dx.doi.org/10.1093/bioinformatics/btw672

collection	PubMed
description	MOTIVATION: Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome. RESULTS: We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms. AVAILABILITY AND IMPLEMENTATION: Labeled histone mark data (http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/); R package to compute the label error of predicted peaks (https://github.com/tdhock/PeakError). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id	pubmed-5408812
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-5408812 2017-05-03 Bioinformatics Original Papers Oxford University Press 2017-02-15 2016-11-21 /pmc/articles/PMC5408812/ /pubmed/27797775 http://dx.doi.org/10.1093/bioinformatics/btw672 Text en © The Author 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
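
The idea stated in the abstract, scoring predicted peaks against visually assigned labels and then choosing the parameter that gives the fewest label errors, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two label kinds ("peaks" and "noPeaks"), the toy coverage-threshold detector, and all function names are hypothetical. It does not reproduce the interface of the PeakError R package or the detectors evaluated in the paper.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]       # (start, end), half-open interval
Label = Tuple[int, int, str]   # (start, end, kind); kind is "peaks" or "noPeaks" (assumed)


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def label_error(peaks: List[Region], labels: List[Label]) -> Dict[str, int]:
    """Count labels that the predicted peaks contradict.

    A "noPeaks" label overlapped by any predicted peak is a false positive;
    a "peaks" label overlapped by no predicted peak is a false negative.
    """
    fp = fn = 0
    for start, end, kind in labels:
        n_overlapping = sum(overlaps(p, (start, end)) for p in peaks)
        if kind == "noPeaks" and n_overlapping > 0:
            fp += 1
        elif kind == "peaks" and n_overlapping == 0:
            fn += 1
    return {"fp": fp, "fn": fn, "errors": fp + fn}


def call_peaks(coverage: List[int], threshold: int) -> List[Region]:
    """Toy detector (hypothetical): maximal runs with coverage >= threshold."""
    peaks: List[Region] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos
        elif depth < threshold and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(coverage)))
    return peaks


def best_threshold(coverage: List[int], labels: List[Label],
                   candidates: List[int]) -> int:
    """Pick the candidate threshold with the fewest label errors."""
    return min(
        candidates,
        key=lambda t: label_error(call_peaks(coverage, t), labels)["errors"],
    )


# Made-up aligned read counts and two made-up visual labels.
coverage = [1, 1, 2, 8, 9, 8, 2, 1, 3, 3, 1, 1]
labels = [(2, 7, "peaks"), (7, 12, "noPeaks")]
chosen = best_threshold(coverage, labels, candidates=[2, 5, 20])
```

In this toy data a low threshold calls a spurious peak inside the "noPeaks" region and a very high threshold misses the labeled peak, so the intermediate candidate is selected. In practice the chosen parameter would be evaluated on held-out labels rather than on the labels used to choose it.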
