Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions

Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enric...

Bibliographic Details
Main authors: Lee, Christopher T; Cavalcante, Raymond G; Lee, Chee; Qin, Tingting; Patil, Snehal; Wang, Shuze; Tsai, Zing T Y; Boyle, Alan P; Sartor, Maureen A
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003681/
https://www.ncbi.nlm.nih.gov/pubmed/32051932
http://dx.doi.org/10.1093/nargab/lqaa006
_version_ 1783494576450830336
author Lee, Christopher T
Cavalcante, Raymond G
Lee, Chee
Qin, Tingting
Patil, Snehal
Wang, Shuze
Tsai, Zing T Y
Boyle, Alan P
Sartor, Maureen A
author_facet Lee, Christopher T
Cavalcante, Raymond G
Lee, Chee
Qin, Tingting
Patil, Snehal
Wang, Shuze
Tsai, Zing T Y
Boyle, Alan P
Sartor, Maureen A
author_sort Lee, Christopher T
collection PubMed
description Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enrich, which has wider applicability, additional capabilities and models the number of peaks assigned to a gene using a generalized additive model with a negative binomial family to determine gene set enrichment, while adjusting for gene locus length. As opposed to ChIP-Enrich, Poly-Enrich works well even when nearly all genes have a peak, illustrated by using Poly-Enrich to characterize pathways and types of genic regions enriched with different families of repetitive elements. By comparing Poly-Enrich and ChIP-Enrich results with ENCODE ChIP-seq data, we found that the optimal test depends more on the pathway being regulated than on properties of the transcription factors. Using known transcription factor functions, we discovered clusters of related biological processes consistently better modeled with Poly-Enrich. This suggests that the regulation of certain processes may be modified by multiple binding events, better modeled by a count-based method. Our new hybrid method automatically uses the optimal method for each gene set, with correct FDR-adjustment.
format Online
Article
Text
id pubmed-7003681
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-70036812020-02-10 Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions Lee, Christopher T Cavalcante, Raymond G Lee, Chee Qin, Tingting Patil, Snehal Wang, Shuze Tsai, Zing T Y Boyle, Alan P Sartor, Maureen A NAR Genom Bioinform Methart Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enrich, which has wider applicability, additional capabilities and models the number of peaks assigned to a gene using a generalized additive model with a negative binomial family to determine gene set enrichment, while adjusting for gene locus length. As opposed to ChIP-Enrich, Poly-Enrich works well even when nearly all genes have a peak, illustrated by using Poly-Enrich to characterize pathways and types of genic regions enriched with different families of repetitive elements. By comparing Poly-Enrich and ChIP-Enrich results with ENCODE ChIP-seq data, we found that the optimal test depends more on the pathway being regulated than on properties of the transcription factors. Using known transcription factor functions, we discovered clusters of related biological processes consistently better modeled with Poly-Enrich. This suggests that the regulation of certain processes may be modified by multiple binding events, better modeled by a count-based method. Our new hybrid method automatically uses the optimal method for each gene set, with correct FDR-adjustment. Oxford University Press 2020-02-06 /pmc/articles/PMC7003681/ /pubmed/32051932 http://dx.doi.org/10.1093/nargab/lqaa006 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methart
Lee, Christopher T
Cavalcante, Raymond G
Lee, Chee
Qin, Tingting
Patil, Snehal
Wang, Shuze
Tsai, Zing T Y
Boyle, Alan P
Sartor, Maureen A
Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
title Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
title_full Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
title_fullStr Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
title_full_unstemmed Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
title_short Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
title_sort poly-enrich: count-based methods for gene set enrichment testing with genomic regions
topic Methart
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003681/
https://www.ncbi.nlm.nih.gov/pubmed/32051932
http://dx.doi.org/10.1093/nargab/lqaa006
work_keys_str_mv AT leechristophert polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT cavalcanteraymondg polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT leechee polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT qintingting polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT patilsnehal polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT wangshuze polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT tsaizingty polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT boylealanp polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions
AT sartormaureena polyenrichcountbasedmethodsforgenesetenrichmenttestingwithgenomicregions