AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification

Bibliographic Details
Main authors: Hiranuma, Naozumi; Lundberg, Scott M; Lee, Su-In
Format: Online article (text)
Language: English
Published: Oxford University Press, 2019
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547432/
https://www.ncbi.nlm.nih.gov/pubmed/30869146
http://dx.doi.org/10.1093/nar/gkz156
Description: ChIP-seq is a technique for determining the binding locations of transcription factors, a task that remains a central challenge in molecular biology. Current practice is to use a ‘control’ dataset to remove background signals from an immunoprecipitation (IP) ‘target’ dataset. We introduce the AIControl framework, which eliminates the need to obtain a control dataset and instead identifies binding peaks by estimating the distributions of background signals from many publicly available control ChIP-seq datasets. We thereby avoid the cost of running control experiments while simultaneously increasing the accuracy of binding location identification. Specifically, AIControl can (i) estimate background signals at fine resolution, (ii) systematically weigh the most appropriate control datasets in a data-driven way, (iii) capture sources of potential biases that may be missed by a single control dataset and (iv) remove the need for costly and time-consuming control experiments. We applied AIControl to 410 IP datasets in the ENCODE ChIP-seq database, using 440 control datasets from 107 cell types to impute background signal. Without using matched control datasets, AIControl identified peaks that were more enriched for putative binding sites than those identified by other popular peak callers that used a matched control dataset. We also demonstrated that our framework identifies binding sites that recover documented protein interactions more accurately.
Journal: Nucleic Acids Research (Nucleic Acids Res), section Methods Online
Dates: published online 2019-03-14; issue date 2019-06-04
Rights: © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
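The description says what AIControl does but not how the control datasets are weighted. As a rough illustration of points (i) and (ii), here is a minimal Python sketch of one way to impute a background track from several public control datasets and call peaks against it. It is not the AIControl implementation. It assumes read counts have already been binned, fits ridge-regularized least-squares weights of the control tracks to the IP track (relying on the assumption that most bins contain only background), and flags bins whose IP count is improbably high under a Poisson model that uses the imputed background as its rate. All function names and the `ridge`, `alpha`, and `floor` values are hypothetical choices made for illustration.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_control_weights(ip, controls, ridge=1e-6):
    """Hypothetical helper: weights w with sum_j w[j] * controls[j] ~ ip.

    Solves the ridge normal equations (C^T C + ridge * I) w = C^T ip,
    treating every bin as background (assumes true peaks are sparse).
    """
    k = len(controls)
    gram = [[sum(x * y for x, y in zip(controls[i], controls[j]))
             + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(controls[i], ip)) for i in range(k)]
    return solve_linear(gram, rhs)


def poisson_sf(count, lam):
    """Upper tail P(X >= count) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(count):
        cdf += term              # accumulate P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def call_peaks(ip, controls, alpha=1e-3, floor=0.1):
    """Return (weights, indices of bins whose IP count exceeds the imputed background)."""
    weights = fit_control_weights(ip, controls)
    peaks = []
    for b, observed in enumerate(ip):
        background = sum(w * c[b] for w, c in zip(weights, controls))
        lam = max(background, floor)  # keep the Poisson rate positive
        if poisson_sf(observed, lam) < alpha:
            peaks.append(b)
    return weights, peaks


if __name__ == "__main__":
    # Toy data: two control tracks over 200 bins; the IP track is their sum
    # except for one enriched bin.
    controls = [[5 + i % 3 for i in range(200)], [3 + i % 4 for i in range(200)]]
    ip = [a + b for a, b in zip(*controls)]
    ip[100] = 60
    weights, peaks = call_peaks(ip, controls)
    print(peaks)  # [100]
```

Because the weights are fitted on the IP track itself, the enriched bin pulls them away from (1, 1) in this toy example, yet it remains the only bin called. A practical tool would also have to deal with strand-specific counts, the choice of bin size, multiple-testing correction, and genome-scale data, none of which this sketch attempts.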