Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions
Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enric...
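Main authors: | Lee, Christopher T; Cavalcante, Raymond G; Lee, Chee; Qin, Tingting; Patil, Snehal; Wang, Shuze; Tsai, Zing T Y; Boyle, Alan P; Sartor, Maureen A |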
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003681/ https://www.ncbi.nlm.nih.gov/pubmed/32051932 http://dx.doi.org/10.1093/nargab/lqaa006 |
_version_ | 1783494576450830336 |
---|---|
author | Lee, Christopher T Cavalcante, Raymond G Lee, Chee Qin, Tingting Patil, Snehal Wang, Shuze Tsai, Zing T Y Boyle, Alan P Sartor, Maureen A |
author_facet | Lee, Christopher T Cavalcante, Raymond G Lee, Chee Qin, Tingting Patil, Snehal Wang, Shuze Tsai, Zing T Y Boyle, Alan P Sartor, Maureen A |
author_sort | Lee, Christopher T |
collection | PubMed |
description | Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enrich, which has wider applicability, additional capabilities and models the number of peaks assigned to a gene using a generalized additive model with a negative binomial family to determine gene set enrichment, while adjusting for gene locus length. As opposed to ChIP-Enrich, Poly-Enrich works well even when nearly all genes have a peak, illustrated by using Poly-Enrich to characterize pathways and types of genic regions enriched with different families of repetitive elements. By comparing Poly-Enrich and ChIP-Enrich results with ENCODE ChIP-seq data, we found that the optimal test depends more on the pathway being regulated than on properties of the transcription factors. Using known transcription factor functions, we discovered clusters of related biological processes consistently better modeled with Poly-Enrich. This suggests that the regulation of certain processes may be modified by multiple binding events, better modeled by a count-based method. Our new hybrid method automatically uses the optimal method for each gene set, with correct FDR-adjustment. |
format | Online Article Text |
id | pubmed-7003681 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-70036812020-02-10 Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions Lee, Christopher T Cavalcante, Raymond G Lee, Chee Qin, Tingting Patil, Snehal Wang, Shuze Tsai, Zing T Y Boyle, Alan P Sartor, Maureen A NAR Genom Bioinform Methart Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enrich, which has wider applicability, additional capabilities and models the number of peaks assigned to a gene using a generalized additive model with a negative binomial family to determine gene set enrichment, while adjusting for gene locus length. As opposed to ChIP-Enrich, Poly-Enrich works well even when nearly all genes have a peak, illustrated by using Poly-Enrich to characterize pathways and types of genic regions enriched with different families of repetitive elements. By comparing Poly-Enrich and ChIP-Enrich results with ENCODE ChIP-seq data, we found that the optimal test depends more on the pathway being regulated than on properties of the transcription factors. Using known transcription factor functions, we discovered clusters of related biological processes consistently better modeled with Poly-Enrich. This suggests that the regulation of certain processes may be modified by multiple binding events, better modeled by a count-based method. Our new hybrid method automatically uses the optimal method for each gene set, with correct FDR-adjustment. Oxford University Press 2020-02-06 /pmc/articles/PMC7003681/ /pubmed/32051932 http://dx.doi.org/10.1093/nargab/lqaa006 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methart Lee, Christopher T Cavalcante, Raymond G Lee, Chee Qin, Tingting Patil, Snehal Wang, Shuze Tsai, Zing T Y Boyle, Alan P Sartor, Maureen A Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions |
title | Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions |
topic | Methart |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7003681/ https://www.ncbi.nlm.nih.gov/pubmed/32051932 http://dx.doi.org/10.1093/nargab/lqaa006 |