Cobind: quantitative analysis of the genomic overlaps

MOTIVATION: Analyzing the overlap between two sets of genomic intervals is a frequent task in the field of bioinformatics. Typically, this is accomplished by counting the number (or proportion) of overlapped regions, which applies an arbitrary threshold to determine if two genomic intervals are over...

Bibliographic Details
Main Authors: Ma, Tao; Guo, Lingyun; Yan, Huihuang; Wang, Liguo
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438957/
https://www.ncbi.nlm.nih.gov/pubmed/37600846
http://dx.doi.org/10.1093/bioadv/vbad104
author Ma, Tao
Guo, Lingyun
Yan, Huihuang
Wang, Liguo
collection PubMed
description MOTIVATION: Analyzing the overlap between two sets of genomic intervals is a frequent task in the field of bioinformatics. Typically, this is accomplished by counting the number (or proportion) of overlapped regions, which applies an arbitrary threshold to determine if two genomic intervals are overlapped. By making binary calls but disregarding the magnitude of the overlap, such an approach often leads to biased, non-reproducible, and incomparable results. RESULTS: We developed the cobind package, which incorporates six statistical measures: the Jaccard coefficient, Sørensen–Dice coefficient, Szymkiewicz–Simpson coefficient, collocation coefficient, pointwise mutual information (PMI), and normalized PMI. These measures allow for a quantitative assessment of the collocation strength between two sets of genomic intervals. To demonstrate the effectiveness of these methods, we applied them to analyze CTCF’s binding sites identified from ChIP-seq, cancer-specific open-chromatin regions (OCRs) identified from ATAC-seq of 17 cancer types, and oligodendrocytes-specific OCRs identified from scATAC-seq. Our results indicated that these new approaches effectively re-discover CTCF’s cofactors, as well as cancer-specific and oligodendrocytes-specific master regulators implicated in disease and cell type development. AVAILABILITY AND IMPLEMENTATION: The cobind package is implemented in Python and freely available at https://cobind.readthedocs.io/en/latest/.
format Online
Article
Text
id pubmed-10438957
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10438957 2023-08-19 Cobind: quantitative analysis of the genomic overlaps Ma, Tao Guo, Lingyun Yan, Huihuang Wang, Liguo Bioinform Adv Original Article Oxford University Press 2023-08-07 /pmc/articles/PMC10438957/ /pubmed/37600846 http://dx.doi.org/10.1093/bioadv/vbad104 Text en © The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Cobind: quantitative analysis of the genomic overlaps
topic Original Article
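
The six measures named in the abstract can be illustrated with a short, self-contained Python sketch. This is not the cobind API: the function names (merge, total_length, overlap_length, overlap_measures), the single-chromosome half-open intervals, and the background size are assumptions made for illustration. The formulas are the standard set-overlap and information-theoretic definitions applied to covered base pairs. The collocation coefficient is assumed here to be the overlap divided by the geometric mean of the two set sizes, which should be checked against the cobind documentation.

```python
import math


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Number of bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a, b):
    """Number of bases covered by both sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = covered = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return covered


def overlap_measures(a, b, background):
    """Six overlap measures; `background` is an assumed genome size in bases."""
    size_a, size_b = total_length(a), total_length(b)
    both = overlap_length(a, b)
    either = size_a + size_b - both
    if size_a == 0 or size_b == 0 or either >= background:
        raise ValueError("need non-empty sets and a background larger than their union")
    scores = {
        "jaccard": both / either,
        "dice": 2 * both / (size_a + size_b),
        "simpson": both / min(size_a, size_b),
        # Assumed definition: overlap over the geometric mean of the set sizes.
        "collocation": both / math.sqrt(size_a * size_b),
    }
    if both == 0:
        # No shared bases: PMI diverges and NPMI takes its lower bound.
        scores["pmi"], scores["npmi"] = -math.inf, -1.0
    else:
        p_a, p_b, p_ab = size_a / background, size_b / background, both / background
        pmi = math.log(p_ab / (p_a * p_b))
        scores["pmi"] = pmi
        scores["npmi"] = pmi / -math.log(p_ab)
    return scores


# Toy example on one chromosome with an assumed background of 1000 bases.
set_a = [(0, 100), (200, 300)]
set_b = [(50, 150), (250, 260)]
for name, value in overlap_measures(set_a, set_b, background=1000).items():
    print(f"{name}: {value:.3f}")
```

In this toy example the two sets share 60 bases out of 250 covered bases, so the Jaccard coefficient is 0.24. A count-based approach would report both intervals of set_a as "overlapped" even though one of them shares only 10 of its 100 bases, which is the loss of magnitude the abstract describes.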