
groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data

Bibliographic Details
Main Authors:	Chae, Minho, Danko, Charles G., Kraus, W. Lee
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4502638/ https://www.ncbi.nlm.nih.gov/pubmed/26173492 http://dx.doi.org/10.1186/s12859-015-0656-3

author	Chae, Minho Danko, Charles G. Kraus, W. Lee
collection	PubMed
description	BACKGROUND: Global run-on coupled with deep sequencing (GRO-seq) provides extensive information on the location and function of coding and non-coding transcripts, including primary microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and enhancer RNAs (eRNAs), as well as yet-undiscovered classes of transcripts. However, few computational tools tailored toward this new type of sequencing data are available, limiting the applicability of GRO-seq data for identifying novel transcription units. RESULTS: Here, we present groHMM, a computational tool in R, which defines the boundaries of transcription units de novo using a two-state hidden Markov model (HMM). A systematic comparison of the performance between groHMM and two existing peak-calling methods tuned to identify broad regions (SICER and HOMER) favorably supports our approach on existing GRO-seq data from MCF-7 breast cancer cells. To demonstrate the broader utility of our approach, we have used groHMM to annotate a diverse array of transcription units (i.e., primary transcripts) from four GRO-seq data sets derived from cells representing a variety of different human tissue types, including non-transformed cells (cardiomyocytes and lung fibroblasts) and transformed cells (LNCaP and MCF-7 cancer cells), as well as non-mammalian cells (from flies and worms). As an example of the utility of groHMM and its application to questions about the transcriptome, we show how groHMM can be used to analyze cell type-specific enhancers as defined by newly annotated enhancer transcripts. CONCLUSIONS: Our results show that groHMM can reveal new insights into cell type-specific transcription by identifying novel transcription units, and serve as a complete and useful tool for evaluating functional genomic elements in cells. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0656-3) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4502638
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4502638 2015-07-16 groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data Chae, Minho Danko, Charles G. Kraus, W. Lee BMC Bioinformatics Software BioMed Central 2015-07-16 /pmc/articles/PMC4502638/ /pubmed/26173492 http://dx.doi.org/10.1186/s12859-015-0656-3 Text en © Chae et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4502638/ https://www.ncbi.nlm.nih.gov/pubmed/26173492 http://dx.doi.org/10.1186/s12859-015-0656-3
