A Non-Parametric Peak Calling Algorithm for DamID-Seq

Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to t...


Bibliographic Details
Main Authors: Li, Renhua, Hempel, Leonie U., Jiang, Tingbo
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4364623/
https://www.ncbi.nlm.nih.gov/pubmed/25785608
http://dx.doi.org/10.1371/journal.pone.0117415
_version_ 1782362094273822720
author Li, Renhua
Hempel, Leonie U.
Jiang, Tingbo
author_facet Li, Renhua
Hempel, Leonie U.
Jiang, Tingbo
author_sort Li, Renhua
collection PubMed
description Protein-DNA interactions play a significant role in gene regulation and expression. To identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not guaranteed to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This prompted us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the average behavior of background signals. To address it, we applied a bootstrap resampling method to short sequence reads in the control (Dam only). After a data quality check and mapping of reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computation of signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to evaluate the peaks called by NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by ChIP-Seq on S2 cells in terms of peak number, location, and peak width.
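The four steps listed in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no details on bin size, the filtering criterion, or how the significance threshold is derived, so everything here is an assumption made for illustration. Reads are assumed to be already mapped and counted in fixed-width genomic bins, the function names `bootstrap_control` and `call_peaks` and all parameters are hypothetical, and the threshold is taken as an upper quantile of the fold changes that the bootstrap replicates of the control show against their own average.

```python
import random


def bootstrap_control(control_counts, n_boot=100, seed=0):
    """Resample control reads with replacement; return one count vector per replicate."""
    rng = random.Random(seed)
    n_bins = len(control_counts)
    n_reads = sum(control_counts)
    replicates = []
    for _ in range(n_boot):
        counts = [0] * n_bins
        # Drawing bins with probability proportional to their read counts is
        # equivalent to resampling the individual reads with replacement.
        for b in rng.choices(range(n_bins), weights=control_counts, k=n_reads):
            counts[b] += 1
        replicates.append(counts)
    return replicates


def call_peaks(treat_counts, control_counts, min_reads=5, quantile=0.99,
               pseudo=1.0, n_boot=100, seed=0):
    """Return indices of bins whose fold change exceeds an empirical threshold."""
    # 1) Read resampling: average the bootstrap replicates into a background.
    replicates = bootstrap_control(control_counts, n_boot, seed)
    background = [sum(col) / n_boot for col in zip(*replicates)]

    # 2) Scaling: bring the treatment to the control's sequencing depth, then
    #    compute fold changes (the pseudocount avoids division by zero).
    scale = sum(control_counts) / sum(treat_counts)
    fold = [(t * scale + pseudo) / (b + pseudo)
            for t, b in zip(treat_counts, background)]

    # 3) Filtering (assumed criterion): ignore bins with too few treatment reads.
    kept = [i for i, t in enumerate(treat_counts) if t >= min_reads]

    # 4) Threshold (assumed rule): upper quantile of the fold changes that the
    #    bootstrap replicates themselves show against the averaged background.
    null = sorted((r + pseudo) / (b + pseudo)
                  for rep in replicates for r, b in zip(rep, background))
    threshold = null[min(len(null) - 1, int(quantile * len(null)))]
    return [i for i in kept if fold[i] > threshold]
```

Called as `call_peaks(treatment, control)` with two equal-length lists of per-bin read counts, the sketch returns the indices of bins flagged as enriched. Because the threshold comes from resampled control data rather than a fitted distribution, no symmetry or parametric shape is assumed for the signal, which is the property the description emphasizes for DamID-Seq.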
format Online
Article
Text
id pubmed-4364623
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4364623 2015-03-23 A Non-Parametric Peak Calling Algorithm for DamID-Seq Li, Renhua Hempel, Leonie U. Jiang, Tingbo PLoS One Research Article
Public Library of Science 2015-03-18 /pmc/articles/PMC4364623/ /pubmed/25785608 http://dx.doi.org/10.1371/journal.pone.0117415 Text en © 2015 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Li, Renhua
Hempel, Leonie U.
Jiang, Tingbo
A Non-Parametric Peak Calling Algorithm for DamID-Seq
title A Non-Parametric Peak Calling Algorithm for DamID-Seq
title_full A Non-Parametric Peak Calling Algorithm for DamID-Seq
title_fullStr A Non-Parametric Peak Calling Algorithm for DamID-Seq
title_full_unstemmed A Non-Parametric Peak Calling Algorithm for DamID-Seq
title_short A Non-Parametric Peak Calling Algorithm for DamID-Seq
title_sort non-parametric peak calling algorithm for damid-seq
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4364623/
https://www.ncbi.nlm.nih.gov/pubmed/25785608
http://dx.doi.org/10.1371/journal.pone.0117415