
Efficient assembly and annotation of the transcriptome of catfish by RNA-Seq analysis of a doubled haploid homozygote

BACKGROUND: Upon the completion of whole genome sequencing, thorough genome annotation that associates genome sequences with biological meanings is essential. Genome annotation depends on the availability of transcript information as well as orthology information. In teleost fish, genome annotation...


Bibliographic Details
Main Authors: Liu, Shikai, Zhang, Yu, Zhou, Zunchun, Waldbieser, Geoff, Sun, Fanyue, Lu, Jianguo, Zhang, Jiaren, Jiang, Yanliang, Zhang, Hao, Wang, Xiuli, Rajendran, KV, Khoo, Lester, Kucuktas, Huseyin, Peatman, Eric, Liu, Zhanjiang
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582483/
https://www.ncbi.nlm.nih.gov/pubmed/23127152
http://dx.doi.org/10.1186/1471-2164-13-595
author Liu, Shikai
Zhang, Yu
Zhou, Zunchun
Waldbieser, Geoff
Sun, Fanyue
Lu, Jianguo
Zhang, Jiaren
Jiang, Yanliang
Zhang, Hao
Wang, Xiuli
Rajendran, KV
Khoo, Lester
Kucuktas, Huseyin
Peatman, Eric
Liu, Zhanjiang
collection PubMed
description BACKGROUND: Upon the completion of whole genome sequencing, thorough genome annotation that associates genome sequences with biological meanings is essential. Genome annotation depends on the availability of transcript information as well as orthology information. In teleost fish, genome annotation is seriously hindered by genome duplication. Because of gene duplications, one cannot establish orthologies simply by homology comparisons. Rather, intensive phylogenetic or structural analysis of orthologies is required for the identification of genes. To conduct phylogenetic analysis and orthology analysis, full-length transcripts are essential. Generation of large numbers of full-length transcripts using traditional transcript sequencing is very difficult and extremely costly. RESULTS: In this work, we took advantage of a doubled haploid catfish, which has two identical sets of chromosomes and therefore, in theory, no allelic variation. As such, transcript sequences generated from next-generation sequencing can be favorably assembled into full-length transcripts. Deep sequencing of the doubled haploid channel catfish transcriptome was performed using the Illumina HiSeq 2000 platform, yielding over 300 million high-quality trimmed reads totaling 27 Gbp. Assembly of these reads generated 370,798 non-redundant transcript-derived contigs. Functional annotation of the assembly allowed identification of 25,144 unique protein-encoding genes. A total of 2,659 unique genes were identified as putative duplicated genes in the catfish genome because the assembly of the corresponding transcripts harbored paralogous sequence variants (PSVs) or multisite variants (MSVs), which appear as pseudo-SNPs in the assembly. Of the 25,144 contigs with unique protein hits, around 20,000 matched 50% of the length of their reference proteins, and over 14,000 transcripts were identified as full-length with complete open reading frames. 
The characterization of consensus sequences surrounding the start codon and the stop codon confirmed the correct assembly of the full-length transcripts. CONCLUSIONS: The large set of transcripts assembled in this study is the most comprehensive set of genome resources developed from catfish to date. It will provide much-needed resources for functional genome research in catfish, serving as a reference transcriptome for genome annotation and for the analysis of gene duplication, gene family structures, and digital gene expression. The putative set of duplicated genes provides a starting point for genome-scale analysis of gene duplication in the catfish genome, and should be a valuable resource for comparative genome analysis, genome evolution, and genome function studies.
format Online
Article
Text
id pubmed-3582483
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3582483 2013-02-27 Efficient assembly and annotation of the transcriptome of catfish by RNA-Seq analysis of a doubled haploid homozygote BMC Genomics Research Article BioMed Central 2012-11-05 /pmc/articles/PMC3582483/ /pubmed/23127152 http://dx.doi.org/10.1186/1471-2164-13-595 Text en Copyright ©2013 Liu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Efficient assembly and annotation of the transcriptome of catfish by RNA-Seq analysis of a doubled haploid homozygote
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582483/
https://www.ncbi.nlm.nih.gov/pubmed/23127152
http://dx.doi.org/10.1186/1471-2164-13-595