Genome-guided transcript assembly from integrative analysis of RNA sequence data
The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in genome annotation pipelines. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call generalized RNA int...
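Main Authors: | Boley, Nathan; Stoiber, Marcus H.; Booth, Benjamin W.; Wan, Kenneth H.; Hoskins, Roger A.; Bickel, Peter J.; Celniker, Susan E.; Brown, James B. |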
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2014 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4037530/ https://www.ncbi.nlm.nih.gov/pubmed/24633242 http://dx.doi.org/10.1038/nbt.2850 |
_version_ | 1782318245852741632 |
---|---|
author | Boley, Nathan Stoiber, Marcus H. Booth, Benjamin W. Wan, Kenneth H. Hoskins, Roger A. Bickel, Peter J. Celniker, Susan E. Brown, James B. |
author_facet | Boley, Nathan Stoiber, Marcus H. Booth, Benjamin W. Wan, Kenneth H. Hoskins, Roger A. Bickel, Peter J. Celniker, Susan E. Brown, James B. |
author_sort | Boley, Nathan |
collection | PubMed |
description | The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in genome annotation pipelines. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call generalized RNA integration tool, or GRIT. By applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recover the vast majority of previously annotated transcripts and double the total number of transcripts cataloged. We find that 20% of protein coding genes encode multiple protein-localization signals, and that, in 20 day old adult fly heads, genes with multiple poly-adenylation sites are more common than genes with alternate splicing or alternate promoters. When compared to the most widely used transcript assembly tools, GRIT recovers a larger fraction of annotated transcripts at higher precision. GRIT will enable the automated generation of high-quality genome annotations without necessitating extensive manual annotation. |
format | Online Article Text |
id | pubmed-4037530 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
record_format | MEDLINE/PubMed |
spelling | pubmed-4037530 2014-10-01 Genome-guided transcript assembly from integrative analysis of RNA sequence data Boley, Nathan Stoiber, Marcus H. Booth, Benjamin W. Wan, Kenneth H. Hoskins, Roger A. Bickel, Peter J. Celniker, Susan E. Brown, James B. Nat Biotechnol Article The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in genome annotation pipelines. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call generalized RNA integration tool, or GRIT. By applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recover the vast majority of previously annotated transcripts and double the total number of transcripts cataloged. We find that 20% of protein coding genes encode multiple protein-localization signals, and that, in 20 day old adult fly heads, genes with multiple poly-adenylation sites are more common than genes with alternate splicing or alternate promoters. When compared to the most widely used transcript assembly tools, GRIT recovers a larger fraction of annotated transcripts at higher precision. GRIT will enable the automated generation of high-quality genome annotations without necessitating extensive manual annotation. 2014-03-16 2014-04 /pmc/articles/PMC4037530/ /pubmed/24633242 http://dx.doi.org/10.1038/nbt.2850 Text en http://www.nature.com/authors/editorial_policies/license.html#terms Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
spellingShingle | Article Boley, Nathan Stoiber, Marcus H. Booth, Benjamin W. Wan, Kenneth H. Hoskins, Roger A. Bickel, Peter J. Celniker, Susan E. Brown, James B. Genome-guided transcript assembly from integrative analysis of RNA sequence data |
title | Genome-guided transcript assembly from integrative analysis of RNA sequence data |
title_full | Genome-guided transcript assembly from integrative analysis of RNA sequence data |
title_fullStr | Genome-guided transcript assembly from integrative analysis of RNA sequence data |
title_full_unstemmed | Genome-guided transcript assembly from integrative analysis of RNA sequence data |
title_short | Genome-guided transcript assembly from integrative analysis of RNA sequence data |
title_sort | genome-guided transcript assembly from integrative analysis of rna sequence data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4037530/ https://www.ncbi.nlm.nih.gov/pubmed/24633242 http://dx.doi.org/10.1038/nbt.2850 |
work_keys_str_mv | AT boleynathan genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT stoibermarcush genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT boothbenjaminw genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT wankennethh genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT hoskinsrogera genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT bickelpeterj genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT celnikersusane genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata AT brownjamesb genomeguidedtranscriptassemblyfromintegrativeanalysisofrnasequencedata |