Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome
BACKGROUND: Highly repetitive sequences are the bane of genome sequence assembly, and the short read lengths produced by current next generation sequencing technologies further exacerbates this obstacle. An adopted practice is to exclude repetitive sequences in genome data assembly, as the majority...
Main Authors: | Zhuang, Xuan; Yang, Chun; Fevolden, Svein-Erik; Cheng, C-H Christina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441883/ https://www.ncbi.nlm.nih.gov/pubmed/22747999 http://dx.doi.org/10.1186/1471-2164-13-293 |
_version_ | 1782243396275929088 |
---|---|
author | Zhuang, Xuan Yang, Chun Fevolden, Svein-Erik Cheng, C-H Christina |
author_facet | Zhuang, Xuan Yang, Chun Fevolden, Svein-Erik Cheng, C-H Christina |
author_sort | Zhuang, Xuan |
collection | PubMed |
description | BACKGROUND: Highly repetitive sequences are the bane of genome sequence assembly, and the short read lengths produced by current next generation sequencing technologies further exacerbates this obstacle. An adopted practice is to exclude repetitive sequences in genome data assembly, as the majority of repeats lack protein-coding genes. However, this could result in the exclusion of important genotypes in newly sequenced non-model species. The absence of the antifreeze glycoproteins (AFGP) gene family in the recently sequenced Atlantic cod genome serves as an example. RESULTS: The Atlantic cod (Gadus morhua) genome was assembled entirely from Roche 454 short reads, demonstrating the feasibility of this approach. However, a well-known major adaptive trait, the AFGP, essential for survival in frigid Arctic marine habitats was absent in the annotated genome. To assess whether this resulted from population difference, we performed Southern blot analysis of genomic DNA from multiple individuals from the North East Arctic cod population that the sequenced cod belonged, and verified that the AFGP genotype is indeed present. We searched the raw assemblies of the Atlantic cod using our G. morhua AFGP gene, and located partial AFGP coding sequences in two sequence scaffolds. We found these two scaffolds constitute a partial genomic AFGP locus through comparative sequence analyses with our newly assembled genomic AFGP locus of the related polar cod, Boreogadus saida. By examining the sequence assembly and annotation methodologies used for the Atlantic cod genome, we deduced the primary cause of the absence of the AFGP gene family from the annotated genome was the removal of all repetitive Roche 454 short reads before sequence assembly, which would exclude most of the highly repetitive AFGP coding sequences. 
Secondarily, the model teleost genomes used in projection annotation of the Atlantic cod genome have no antifreeze trait, perpetuating the unawareness that the AFGP gene family is missing. CONCLUSIONS: We recovered some of the missing AFGP coding sequences and reconstructed a partial AFGP locus in the Atlantic cod genome, bringing to light that not all repetitive sequences lack protein coding information. Also, reliance on genomes of model organisms as reference for annotating protein-coding gene content of a newly sequenced non-model species could lead to omission of novel genetic traits. |
format | Online Article Text |
id | pubmed-3441883 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-34418832012-09-15 Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome Zhuang, Xuan Yang, Chun Fevolden, Svein-Erik Cheng, C-H Christina BMC Genomics Research Article BACKGROUND: Highly repetitive sequences are the bane of genome sequence assembly, and the short read lengths produced by current next generation sequencing technologies further exacerbates this obstacle. An adopted practice is to exclude repetitive sequences in genome data assembly, as the majority of repeats lack protein-coding genes. However, this could result in the exclusion of important genotypes in newly sequenced non-model species. The absence of the antifreeze glycoproteins (AFGP) gene family in the recently sequenced Atlantic cod genome serves as an example. RESULTS: The Atlantic cod (Gadus morhua) genome was assembled entirely from Roche 454 short reads, demonstrating the feasibility of this approach. However, a well-known major adaptive trait, the AFGP, essential for survival in frigid Arctic marine habitats was absent in the annotated genome. To assess whether this resulted from population difference, we performed Southern blot analysis of genomic DNA from multiple individuals from the North East Arctic cod population that the sequenced cod belonged, and verified that the AFGP genotype is indeed present. We searched the raw assemblies of the Atlantic cod using our G. morhua AFGP gene, and located partial AFGP coding sequences in two sequence scaffolds. We found these two scaffolds constitute a partial genomic AFGP locus through comparative sequence analyses with our newly assembled genomic AFGP locus of the related polar cod, Boreogadus saida. 
By examining the sequence assembly and annotation methodologies used for the Atlantic cod genome, we deduced the primary cause of the absence of the AFGP gene family from the annotated genome was the removal of all repetitive Roche 454 short reads before sequence assembly, which would exclude most of the highly repetitive AFGP coding sequences. Secondarily, the model teleost genomes used in projection annotation of the Atlantic cod genome have no antifreeze trait, perpetuating the unawareness that the AFGP gene family is missing. CONCLUSIONS: We recovered some of the missing AFGP coding sequences and reconstructed a partial AFGP locus in the Atlantic cod genome, bringing to light that not all repetitive sequences lack protein coding information. Also, reliance on genomes of model organisms as reference for annotating protein-coding gene content of a newly sequenced non-model species could lead to omission of novel genetic traits. BioMed Central 2012-07-02 /pmc/articles/PMC3441883/ /pubmed/22747999 http://dx.doi.org/10.1186/1471-2164-13-293 Text en Copyright ©2012 Zhuang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Zhuang, Xuan Yang, Chun Fevolden, Svein-Erik Cheng, C-H Christina Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome |
title | Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome |
title_full | Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome |
title_fullStr | Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome |
title_full_unstemmed | Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome |
title_short | Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome |
title_sort | protein genes in repetitive sequence—antifreeze glycoproteins in atlantic cod genome |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441883/ https://www.ncbi.nlm.nih.gov/pubmed/22747999 http://dx.doi.org/10.1186/1471-2164-13-293 |
work_keys_str_mv | AT zhuangxuan proteingenesinrepetitivesequenceantifreezeglycoproteinsinatlanticcodgenome AT yangchun proteingenesinrepetitivesequenceantifreezeglycoproteinsinatlanticcodgenome AT fevoldensveinerik proteingenesinrepetitivesequenceantifreezeglycoproteinsinatlanticcodgenome AT chengchchristina proteingenesinrepetitivesequenceantifreezeglycoproteinsinatlanticcodgenome |