Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The sta...

Bibliographic Details
Main Authors: Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H.; Dewey, Colin; Keleş, Sündüz
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3136429/
https://www.ncbi.nlm.nih.gov/pubmed/21779159
http://dx.doi.org/10.1371/journal.pcbi.1002111
author Chung, Dongjun
Kuan, Pei Fen
Li, Bo
Sanalkumar, Rajendran
Liang, Kun
Bresnick, Emery H.
Dewey, Colin
Keleş, Sündüz
collection PubMed
description Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
format Online
Article
Text
id pubmed-3136429
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3136429 2011-07-21 PLoS Comput Biol Research Article
Public Library of Science 2011-07-14 /pmc/articles/PMC3136429/ /pubmed/21779159 http://dx.doi.org/10.1371/journal.pcbi.1002111 Text en Chung et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data
topic Research Article
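
The abstract describes allocating multi-reads as fractional counts using a weighted alignment scheme. The record does not give the weighting itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each read comes with a list of candidate alignments and a non-negative weight per alignment; the function name `allocate_fractional_counts`, the input layout, and the `bin_size` parameter are all hypothetical.

```python
from collections import defaultdict


def allocate_fractional_counts(alignments, bin_size=100):
    """Spread each read over its candidate alignments as fractional counts.

    `alignments` maps a read id to a list of (chrom, position, weight)
    tuples. The weight is any non-negative alignment score (an assumed,
    illustrative input format). Returns {(chrom, bin_index): count}.
    """
    counts = defaultdict(float)
    for read_id, hits in alignments.items():
        total = sum(weight for _, _, weight in hits)
        if total <= 0:
            continue  # no usable weight for this read, so skip it
        for chrom, pos, weight in hits:
            # Each read contributes exactly one count in total,
            # split in proportion to its alignment weights.
            counts[(chrom, pos // bin_size)] += weight / total
    return dict(counts)


example = {
    "read1": [("chr1", 150, 1.0)],                      # uni-read
    "read2": [("chr1", 170, 3.0), ("chr2", 520, 1.0)],  # multi-read
}
counts = allocate_fractional_counts(example)
```

In this toy input the uni-read adds a full count to its single bin, while the multi-read is split 3:1 between its two candidate locations, so the total count still equals the number of reads.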