EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications
BACKGROUND: Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq that, however, require large quantities of mRNA rendering the identification of inhere...
Main authors: | Sahu, Anshupa; Li, Na; Dunkel, Ilona; Chung, Ho-Ryun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137282/ https://www.ncbi.nlm.nih.gov/pubmed/32264931 http://dx.doi.org/10.1186/s13072-020-00341-z |
_version_ | 1783518395295072256 |
---|---|
author | Sahu, Anshupa Li, Na Dunkel, Ilona Chung, Ho-Ryun |
author_facet | Sahu, Anshupa Li, Na Dunkel, Ilona Chung, Ho-Ryun |
author_sort | Sahu, Anshupa |
collection | PubMed |
description | BACKGROUND: Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq that, however, require large quantities of mRNA rendering the identification of inherently unstable TUs, e.g. miRNA precursors, difficult. This problem can be alleviated by chromatin-based approaches due to a correlation between histone modifications and transcription. RESULTS: Here, we introduce EPIGENE, a novel chromatin segmentation method for the identification of active TUs using transcription-associated histone modifications. Unlike the existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate hidden Markov model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables, to identify active TUs. Our results show that EPIGENE can identify genome-wide TUs in an unbiased manner. EPIGENE-predicted TUs show an enrichment of RNA Polymerase II at the transcription start site and in gene body indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE outperformed the existing RNA-seq-based approaches in TU prediction precision across human cell lines. Finally, we identified 232 novel TUs in K562 and 43 novel cell-specific TUs all of which were supported by RNA Polymerase II ChIP-seq and Nascent RNA-seq data. CONCLUSION: We demonstrate the applicability of EPIGENE to identify genome-wide active TUs and to provide valuable information about unannotated TUs. EPIGENE is an open-source method and is freely available at: https://github.com/imbbLab/EPIGENE. |
format | Online Article Text |
id | pubmed-7137282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-71372822020-04-11 EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications Sahu, Anshupa Li, Na Dunkel, Ilona Chung, Ho-Ryun Epigenetics Chromatin Methodology BACKGROUND: Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq that, however, require large quantities of mRNA rendering the identification of inherently unstable TUs, e.g. miRNA precursors, difficult. This problem can be alleviated by chromatin-based approaches due to a correlation between histone modifications and transcription. RESULTS: Here, we introduce EPIGENE, a novel chromatin segmentation method for the identification of active TUs using transcription-associated histone modifications. Unlike the existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate hidden Markov model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables, to identify active TUs. Our results show that EPIGENE can identify genome-wide TUs in an unbiased manner. EPIGENE-predicted TUs show an enrichment of RNA Polymerase II at the transcription start site and in gene body indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE outperformed the existing RNA-seq-based approaches in TU prediction precision across human cell lines. Finally, we identified 232 novel TUs in K562 and 43 novel cell-specific TUs all of which were supported by RNA Polymerase II ChIP-seq and Nascent RNA-seq data. 
CONCLUSION: We demonstrate the applicability of EPIGENE to identify genome-wide active TUs and to provide valuable information about unannotated TUs. EPIGENE is an open-source method and is freely available at: https://github.com/imbbLab/EPIGENE. BioMed Central 2020-04-07 /pmc/articles/PMC7137282/ /pubmed/32264931 http://dx.doi.org/10.1186/s13072-020-00341-z Text en © The Author(s) 2020 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Sahu, Anshupa Li, Na Dunkel, Ilona Chung, Ho-Ryun EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
title | EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
title_full | EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
title_fullStr | EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
title_full_unstemmed | EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
title_short | EPIGENE: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
title_sort | epigene: genome-wide transcription unit annotation using a multivariate probabilistic model of histone modifications |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137282/ https://www.ncbi.nlm.nih.gov/pubmed/32264931 http://dx.doi.org/10.1186/s13072-020-00341-z |
work_keys_str_mv | AT sahuanshupa epigenegenomewidetranscriptionunitannotationusingamultivariateprobabilisticmodelofhistonemodifications AT lina epigenegenomewidetranscriptionunitannotationusingamultivariateprobabilisticmodelofhistonemodifications AT dunkelilona epigenegenomewidetranscriptionunitannotationusingamultivariateprobabilisticmodelofhistonemodifications AT chunghoryun epigenegenomewidetranscriptionunitannotationusingamultivariateprobabilisticmodelofhistonemodifications |
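
The description field above says that EPIGENE models the observed combination of histone modifications in each genomic bin as a product of independent Bernoulli random variables inside a multivariate HMM. The sketch below illustrates that general idea only; it is not the EPIGENE implementation. The two-state layout, the three hypothetical marks, and all probabilities are invented for illustration, and plain Viterbi decoding stands in for whatever constrained, semi-supervised inference the tool actually performs.

```python
import numpy as np


def bernoulli_log_emissions(obs, p):
    """Log-likelihood of binary mark vectors under a product of Bernoullis.

    obs: (T, M) array of 0/1 calls for M marks in T genomic bins.
    p:   (K, M) array; p[k, m] is the assumed probability that mark m is
         present in a bin emitted by hidden state k.
    Returns a (T, K) array of log P(obs[t] | state k).
    """
    obs = np.asarray(obs, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-9, 1 - 1e-9)
    # Sum over marks of x*log(p) + (1-x)*log(1-p), done as matrix products.
    return obs @ np.log(p).T + (1.0 - obs) @ np.log1p(-p).T


def viterbi(log_emit, log_trans, log_init):
    """Most likely hidden state path for a discrete-state HMM (log space)."""
    T, K = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score of a path in state i at t-1 that moves to j.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# Hypothetical toy setup: state 0 = background, state 1 = transcribed.
# Columns are three made-up marks; none of these numbers come from the paper.
p = np.array([[0.05, 0.05, 0.05],
              [0.80, 0.70, 0.90]])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
init = np.array([0.5, 0.5])

# Binarized mark calls for six consecutive bins.
obs = np.array([[0, 0, 0],
                [0, 0, 0],
                [1, 1, 1],
                [1, 0, 1],
                [1, 1, 1],
                [0, 0, 0]])

log_emit = bernoulli_log_emissions(obs, p)
path = viterbi(log_emit, np.log(trans), np.log(init))
print(path.tolist())
```

Under these assumed parameters, runs of bins labeled with state 1 would be read as candidate transcription units. The independence assumption is what lets the emission term reduce to a sum of per-mark log-probabilities.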