pyBedGraph: a python package for fast operations on 1D genomic signal tracks

MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text fo...

Bibliographic Details
Main authors: Zhang, Henry B; Kim, Minji; Chuang, Jeffrey H; Ruan, Yijun
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214040/
https://www.ncbi.nlm.nih.gov/pubmed/32044918
http://dx.doi.org/10.1093/bioinformatics/btaa061
_version_ 1783531900487335936
author Zhang, Henry B
Kim, Minji
Chuang, Jeffrey H
Ruan, Yijun
author_sort Zhang, Henry B
collection PubMed
description MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed. RESULTS: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 s and can compute their approximate means in <0.12 s on a conventional laptop. AVAILABILITY AND IMPLEMENTATION: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7214040
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7214040 2020-05-15 pyBedGraph: a python package for fast operations on 1D genomic signal tracks. Zhang, Henry B; Kim, Minji; Chuang, Jeffrey H; Ruan, Yijun. Bioinformatics, Applications Notes. Oxford University Press 2020-05-15 2020-02-11 /pmc/articles/PMC7214040/ /pubmed/32044918 http://dx.doi.org/10.1093/bioinformatics/btaa061 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title pyBedGraph: a python package for fast operations on 1D genomic signal tracks
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214040/
https://www.ncbi.nlm.nih.gov/pubmed/32044918
http://dx.doi.org/10.1093/bioinformatics/btaa061