RNA-Seq improves annotation of protein-coding genes in the cucumber genome

BACKGROUND: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, man...

Bibliographic Details
Main Authors: Li, Zhen; Zhang, Zhonghua; Yan, Pengcheng; Huang, Sanwen; Fei, Zhangjun; Lin, Kui
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3219749/
https://www.ncbi.nlm.nih.gov/pubmed/22047402
http://dx.doi.org/10.1186/1471-2164-12-540
author Li, Zhen
Zhang, Zhonghua
Yan, Pengcheng
Huang, Sanwen
Fei, Zhangjun
Lin, Kui
collection PubMed
description BACKGROUND: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set. RESULTS: The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/. CONCLUSIONS: We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.
format Online
Article
Text
id pubmed-3219749
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3219749 2011-11-18 RNA-Seq improves annotation of protein-coding genes in the cucumber genome Li, Zhen Zhang, Zhonghua Yan, Pengcheng Huang, Sanwen Fei, Zhangjun Lin, Kui BMC Genomics Research Article BACKGROUND: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set. RESULTS: The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/. CONCLUSIONS: We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes. BioMed Central 2011-11-02 /pmc/articles/PMC3219749/ /pubmed/22047402 http://dx.doi.org/10.1186/1471-2164-12-540 Text en Copyright ©2011 Li et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title RNA-Seq improves annotation of protein-coding genes in the cucumber genome
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3219749/
https://www.ncbi.nlm.nih.gov/pubmed/22047402
http://dx.doi.org/10.1186/1471-2164-12-540