Identifying novel genes in C. elegans using SAGE tags

BACKGROUND: Despite extensive efforts devoted to predicting protein-coding genes in genome sequences, many bona fide genes have not been found and many existing gene models are not accurate in all sequenced eukaryote genomes. This situation is partly explained by the fact that gene prediction progra...

Bibliographic Details
Main Authors: Nesbitt, Matthew J; Moerman, Donald G; Chen, Nansheng
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017025/
https://www.ncbi.nlm.nih.gov/pubmed/21143975
http://dx.doi.org/10.1186/1471-2199-11-96
author Nesbitt, Matthew J
Moerman, Donald G
Chen, Nansheng
collection PubMed
description BACKGROUND: Despite extensive efforts devoted to predicting protein-coding genes in genome sequences, many bona fide genes have not been found and many existing gene models are not accurate in all sequenced eukaryote genomes. This situation is partly explained by the fact that gene prediction programs have been developed based on our incomplete understanding of gene feature information such as splicing and promoter characteristics. Additionally, full-length cDNAs of many genes and their isoforms are hard to obtain due to their low level or rare expression. In order to obtain full-length sequences of all protein-coding genes, alternative approaches are required. RESULTS: In this project, we have developed a method of reconstructing full-length cDNA sequences based on short expressed sequence tags which is called sequence tag-based amplification of cDNA ends (STACE). Expressed tags are used as anchors for retrieving full-length transcripts in two rounds of PCR amplification. We have demonstrated the application of STACE in reconstructing full-length cDNA sequences using expressed tags mined in an array of serial analysis of gene expression (SAGE) of C. elegans cDNA libraries. We have successfully applied STACE to recover sequence information for 12 genes, for two of which we found isoforms. STACE was used to successfully recover full-length cDNA sequences for seven of these genes. CONCLUSIONS: The STACE method can be used to effectively reconstruct full-length cDNA sequences of genes that are under-represented in cDNA sequencing projects and have been missed by existing gene prediction methods, but their existence has been suggested by short sequence tags such as SAGE tags.
format Text
id pubmed-3017025
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3017025 2011-01-07 Identifying novel genes in C. elegans using SAGE tags Nesbitt, Matthew J; Moerman, Donald G; Chen, Nansheng BMC Mol Biol Research Article
BioMed Central 2010-12-10 /pmc/articles/PMC3017025/ /pubmed/21143975 http://dx.doi.org/10.1186/1471-2199-11-96 Text en Copyright ©2010 Nesbitt et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identifying novel genes in C. elegans using SAGE tags
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017025/
https://www.ncbi.nlm.nih.gov/pubmed/21143975
http://dx.doi.org/10.1186/1471-2199-11-96