pyBedGraph: a python package for fast operations on 1D genomic signal tracks
MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text fo...
Main authors: | Zhang, Henry B; Kim, Minji; Chuang, Jeffrey H; Ruan, Yijun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214040/ https://www.ncbi.nlm.nih.gov/pubmed/32044918 http://dx.doi.org/10.1093/bioinformatics/btaa061 |
_version_ | 1783531900487335936 |
---|---|
author | Zhang, Henry B Kim, Minji Chuang, Jeffrey H Ruan, Yijun |
author_facet | Zhang, Henry B Kim, Minji Chuang, Jeffrey H Ruan, Yijun |
author_sort | Zhang, Henry B |
collection | PubMed |
description | MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed. RESULTS: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 s and can compute their approximate means in <0.12 s on a conventional laptop. AVAILABILITY AND IMPLEMENTATION: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7214040 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-72140402020-05-15 pyBedGraph: a python package for fast operations on 1D genomic signal tracks Zhang, Henry B Kim, Minji Chuang, Jeffrey H Ruan, Yijun Bioinformatics Applications Notes MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed. RESULTS: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 s and can compute their approximate means in <0.12 s on a conventional laptop. AVAILABILITY AND IMPLEMENTATION: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-05-15 2020-02-11 /pmc/articles/PMC7214040/ /pubmed/32044918 http://dx.doi.org/10.1093/bioinformatics/btaa061 Text en © The Author(s) 2020. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Applications Notes Zhang, Henry B Kim, Minji Chuang, Jeffrey H Ruan, Yijun pyBedGraph: a python package for fast operations on 1D genomic signal tracks |
title | pyBedGraph: a python package for fast operations on 1D genomic signal tracks |
title_full | pyBedGraph: a python package for fast operations on 1D genomic signal tracks |
title_fullStr | pyBedGraph: a python package for fast operations on 1D genomic signal tracks |
title_full_unstemmed | pyBedGraph: a python package for fast operations on 1D genomic signal tracks |
title_short | pyBedGraph: a python package for fast operations on 1D genomic signal tracks |
title_sort | pybedgraph: a python package for fast operations on 1d genomic signal tracks |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214040/ https://www.ncbi.nlm.nih.gov/pubmed/32044918 http://dx.doi.org/10.1093/bioinformatics/btaa061 |
work_keys_str_mv | AT zhanghenryb pybedgraphapythonpackageforfastoperationson1dgenomicsignaltracks AT kimminji pybedgraphapythonpackageforfastoperationson1dgenomicsignaltracks AT chuangjeffreyh pybedgraphapythonpackageforfastoperationson1dgenomicsignaltracks AT ruanyijun pybedgraphapythonpackageforfastoperationson1dgenomicsignaltracks |
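
The description above names the core task as obtaining the mean signal over a given interval of a bedGraph track. As a rough illustration of that task only, the sketch below answers exact-mean queries with a prefix-sum array. It is not pyBedGraph's API or its implementation, which this record does not describe; the function names `parse_bedgraph`, `build_prefix_sums` and `mean_signal`, the toy data, and the chromosome size are all hypothetical. It assumes the standard bedGraph convention of 0-based, half-open intervals and treats uncovered bases as zero.

```python
from itertools import accumulate


def parse_bedgraph(lines, chrom):
    """Yield (start, end, value) for one chromosome from bedGraph text lines.

    Assumes four whitespace-separated columns: chrom, start, end, value.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != chrom:
            continue  # skip track/browser lines and other chromosomes
        yield int(fields[1]), int(fields[2]), float(fields[3])


def build_prefix_sums(intervals, chrom_size):
    """Return a list where prefix[i] is the summed signal over bases [0, i)."""
    per_base = [0.0] * chrom_size
    for start, end, value in intervals:
        for pos in range(start, min(end, chrom_size)):
            per_base[pos] = value
    return [0.0] + list(accumulate(per_base))


def mean_signal(prefix, start, end):
    """Exact mean over the half-open interval [start, end) in constant time."""
    if not 0 <= start < end < len(prefix):
        raise ValueError("interval must be non-empty and within the chromosome")
    return (prefix[end] - prefix[start]) / (end - start)


# Hypothetical toy track: signal 2.0 on bases 0-3 and 5.0 on bases 4-9.
toy_lines = [
    "chr1\t0\t4\t2.0",
    "chr1\t4\t10\t5.0",
]
prefix = build_prefix_sums(parse_bedgraph(toy_lines, "chr1"), chrom_size=12)
print(mean_signal(prefix, 2, 6))   # two bases at 2.0 and two bases at 5.0
print(mean_signal(prefix, 8, 12))  # two covered bases and two uncovered bases
```

After the one-time pass that builds the array, each query costs two lookups and one division, which is one simple way to make a large batch of interval queries cheap. The trade-off is memory proportional to chromosome length.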