
Population size estimation for quality control of ChIP-Seq datasets

Chromatin immunoprecipitation followed by sequencing, i.e. ChIP-Seq, is a widely used experimental technology for the identification of functional protein-DNA interactions. Nowadays, such databases as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate a large number of ChIP-Seq d...


Bibliographic Details
Main Authors:	Kolmykov, Semyon K.; Kondrakhin, Yury V.; Yevshin, Ivan S.; Sharipov, Ruslan N.; Ryabova, Anna S.; Kolpakov, Fedor A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6715275/ https://www.ncbi.nlm.nih.gov/pubmed/31465497 http://dx.doi.org/10.1371/journal.pone.0221760

author	Kolmykov, Semyon K. Kondrakhin, Yury V. Yevshin, Ivan S. Sharipov, Ruslan N. Ryabova, Anna S. Kolpakov, Fedor A.
collection	PubMed
description	Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is a widely used experimental technology for identifying functional protein-DNA interactions. Databases such as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate a large number of ChIP-Seq datasets. Comprehensive control of dataset quality is indispensable for selecting the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that make it possible to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we have adapted a well-known population size estimate to determine the unknown number of genuine transcription factor binding regions. The proposed metrics are determined from the overlap of distinct sets of binding sites obtained by processing one ChIP-Seq experiment with different peak callers. The metrics can also be useful for assessing the quality of datasets obtained by processing distinct ChIP-Seq experiments with a given peak caller. We have also shown that these metrics appear to be useful not only for dataset selection but also for comparing peak callers and for identifying site motifs based on ChIP-Seq datasets. The algorithm developed to determine the false positive control metric and the false negative control metric for ChIP-Seq datasets was implemented as a plugin for the BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.
format	Online Article Text
id	pubmed-6715275
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6715275 2019-09-10 PLoS One Research Article Public Library of Science 2019-08-29 /pmc/articles/PMC6715275/ /pubmed/31465497 http://dx.doi.org/10.1371/journal.pone.0221760 Text en © 2019 Kolmykov et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Population size estimation for quality control of ChIP-Seq datasets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6715275/ https://www.ncbi.nlm.nih.gov/pubmed/31465497 http://dx.doi.org/10.1371/journal.pone.0221760
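
The description in this record mentions adapting a well-known population size estimate to the overlap between binding-site sets produced by different peak callers. As a rough illustration of that general idea only, the sketch below applies the Chapman form of the capture-recapture (Lincoln-Petersen) estimator to two toy peak lists. The choice of the Chapman estimator, the overlap rule, the function names and the example coordinates are all assumptions made for this sketch; the record does not state which estimator the authors used, and the two metrics from the article are not reproduced here.

```python
from typing import List, Tuple

# A binding region as (chromosome, start, end), half-open coordinates.
# This representation is an assumption made for the sketch.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True if two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(first: List[Region], second: List[Region]) -> int:
    """Count regions in `first` that overlap at least one region in `second`."""
    return sum(1 for a in first if any(overlaps(a, b) for b in second))


def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman estimator of total population size.

    n1, n2: number of regions reported by each peak caller
    m:      number of regions reported by both (the "recaptures")
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Toy peak lists (hypothetical data, not taken from any experiment).
caller_a = [("chr1", 100, 200), ("chr1", 500, 600),
            ("chr2", 50, 150), ("chr2", 900, 1000)]
caller_b = [("chr1", 150, 250), ("chr2", 100, 180), ("chr3", 10, 60)]

shared = count_shared(caller_a, caller_b)
estimated_total = chapman_estimate(len(caller_a), len(caller_b), shared)
print(f"shared regions: {shared}, estimated total regions: {estimated_total:.2f}")
```

The quadratic overlap scan keeps the sketch short; for real peak files a sorted sweep or an interval tree would be the more usual choice.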
