False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions

Motivation: Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive pe...

Bibliographic Details
Main authors: Pickrell, Joseph K., Gaffney, Daniel J., Gilad, Yoav, Pritchard, Jonathan K.
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137225/
https://www.ncbi.nlm.nih.gov/pubmed/21690102
http://dx.doi.org/10.1093/bioinformatics/btr354
_version_ 1782208276028456960
author Pickrell, Joseph K.
Gaffney, Daniel J.
Gilad, Yoav
Pritchard, Jonathan K.
collection PubMed
description Motivation: Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences which have been incorrectly assembled and collapsed into a single copy. Results: Using sequencing data from the 1000 Genomes Project, we systematically scanned the human genome for regions of high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, positions of nucleosomes and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls. Availability: Files for masking out these regions are available at eqtl.uchicago.edu Contact: pickrell@uchicago.edu; dgaffney@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-3137225
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3137225 2011-07-15 False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions Pickrell, Joseph K. Gaffney, Daniel J. Gilad, Yoav Pritchard, Jonathan K. Bioinformatics Applications Note Motivation: Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences which have been incorrectly assembled and collapsed into a single copy. Results: Using sequencing data from the 1000 Genomes Project, we systematically scanned the human genome for regions of high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, positions of nucleosomes and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls. Availability: Files for masking out these regions are available at eqtl.uchicago.edu Contact: pickrell@uchicago.edu; dgaffney@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2011-08-01 2011-06-19 /pmc/articles/PMC3137225/ /pubmed/21690102 http://dx.doi.org/10.1093/bioinformatics/btr354 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions
topic Applications Note