Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

Bibliographic Details
Main authors: Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
Format: Online Article Text
Language: English
Published: Nature Publishing Group, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676012/
https://www.ncbi.nlm.nih.gov/pubmed/26658305
http://dx.doi.org/10.1038/srep18019

Abstract: High-quality and complete gene models are the basis of whole-genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced solely from short reads, but its annotation lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to 12 giant panda tissues to improve genome assembly completeness globally and to detect novel expressed transcripts, using a transcriptome reconstruction strategy that combined reference-based and de novo methods. The de novo assembled transcripts improved several aspects of assembly completeness in transcribed regions: genome scaffolding, detection of small-size assembly errors, extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes, 12.62% of which were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes are involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-and-white pelage, the pseudo-thumb and delayed embryonic implantation in giant pandas. The updated genome annotation will support further structural and functional studies of the giant panda.
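
The abstract's notion of a "full-length" protein-coding gene can be illustrated with a simple open-reading-frame test. The sketch below is an assumption for illustration only and is not the authors' method: it scans the forward strand of a transcript in all three frames for an ATG start codon followed in-frame by a stop codon, and keeps the longest such ORF that reaches a hypothetical minimum protein length `min_aa`. The function names and the default threshold are invented for this example. It ignores the reverse strand and the expression, homology and proteomic evidence that the abstract describes.

```python
# Illustrative sketch only: a simple "complete ORF" check, assumed here as one
# possible reading of "full-length protein-coding"; not the authors' pipeline.
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq: str, min_aa: int = 100) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. `min_aa` is a hypothetical
    minimum protein length (stop codon excluded). Returns None if no complete
    ORF of at least `min_aa` codons exists.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):  # scan the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                long_enough = (i - start) // 3 >= min_aa
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def is_full_length(seq: str, min_aa: int = 100) -> bool:
    """True if the transcript carries a complete ORF on its forward strand."""
    return longest_complete_orf(seq, min_aa) is not None


if __name__ == "__main__":
    demo = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
    print(longest_complete_orf(demo, min_aa=3))  # (2, 23)
    print(is_full_length("ATGAAAAAA", min_aa=1))  # False: no in-frame stop codon
```

A transcript that starts with ATG but runs off its end without a stop codon is reported as not full-length, which is the distinction the term usually draws between complete and fragmentary gene models.
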

Record: pubmed-4676012 (PubMed, National Center for Biotechnology Information; MEDLINE/PubMed format)
Journal: Sci Rep (Scientific Reports), Nature Publishing Group; published online 2015-12-11
Identifiers: PMC4676012; PMID 26658305; DOI 10.1038/srep18019
Copyright © 2015, Macmillan Publishers Limited. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.