
Tiling Assembly: a new tool for reference annotation-independent transcript assembly and novel gene identification by RNA-sequencing


Bibliographic Details
Main authors: Watanabe, Kenneth A., Homayouni, Arielle, Tufano, Tara, Lopez, Jennifer, Ringler, Patricia, Rushton, Paul, Shen, Qingxi J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4596398/
https://www.ncbi.nlm.nih.gov/pubmed/26341416
http://dx.doi.org/10.1093/dnares/dsv015
collection PubMed
description Annotation of the rice (Oryza sativa) genome has evolved significantly since release of its draft sequence, but it is far from complete. Several published transcript assembly programmes were tested on RNA-sequencing (RNA-seq) data to determine their effectiveness in identifying novel genes to improve the rice genome annotation. Cufflinks, a popular assembly software, did not identify all transcripts suggested by the RNA-seq data. Other assembly software was CPU intensive, lacked documentation, or lacked software updates. To overcome these shortcomings, a heuristic ab initio transcript assembly algorithm, Tiling Assembly, was developed to identify genes based on short read and junction alignment. Tiling Assembly was compared with Cufflinks to evaluate its gene-finding capabilities. Additionally, a pipeline was developed to eliminate false-positive gene identification due to noise or repetitive regions in the genome. By combining Tiling Assembly and Cufflinks, 767 unannotated genes were identified in the rice genome, demonstrating that combining both programmes proved highly efficient for novel gene identification. We also demonstrated that Tiling Assembly can accurately determine transcription start sites by comparing the Tiling Assembly genes with their corresponding full-length cDNA. We applied our pipeline to additional organisms and identified numerous unannotated genes, demonstrating that Tiling Assembly is an organism-independent tool for genome annotation.
id pubmed-4596398
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4596398 2015-10-09 DNA Res Full Papers Oxford University Press 2015-10 2015-09-03 /pmc/articles/PMC4596398/ /pubmed/26341416 http://dx.doi.org/10.1093/dnares/dsv015 Text en © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Full Papers