Characterization of 954 bovine full-CDS cDNA sequences

Bibliographic Details
Main authors: Harhay, Gregory P; Sonstegard, Tad S; Keele, John W; Heaton, Michael P; Clawson, Michael L; Snelling, Warren M; Wiedmann, Ralph T; Van Tassell, Curt P; Smith, Timothy PL
Format: Text
Language: English
Published: BioMed Central 2005
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1314900/
https://www.ncbi.nlm.nih.gov/pubmed/16305752
http://dx.doi.org/10.1186/1471-2164-6-166
author Harhay, Gregory P
Sonstegard, Tad S
Keele, John W
Heaton, Michael P
Clawson, Michael L
Snelling, Warren M
Wiedmann, Ralph T
Van Tassell, Curt P
Smith, Timothy PL
collection PubMed
description BACKGROUND: Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Functional genomics studies also rely on transcript sequence to create expression microarrays or interpret digital tag data produced by methods such as Serial Analysis of Gene Expression (SAGE). Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. The most useful transcript sequences are derived by complete insert sequencing of clones containing the entire length, or at least the full protein coding sequence (CDS) portion, of the source mRNA. While the bovine genome sequencing initiative is nearing completion, there is currently a paucity of bovine full-CDS mRNA and protein sequence data to support bovine genome assembly and functional genomics studies. Consequently, the production of high-quality bovine full-CDS cDNA sequences will enhance the bovine genome assembly and functional studies of bovine genes and gene products. The goal of this investigation was to identify and characterize the full-CDS sequences of bovine transcripts from clones identified in non-full-length enriched cDNA libraries. In contrast to several recent full-length cDNA investigations, these full-CDS cDNAs were selected, sequenced, and annotated without the benefit of the target organism's genomic sequence, by using comparison of bovine EST sequence to existing human mRNA to identify likely full-CDS clones for full-length insert cDNA (FLIC) sequencing. 
RESULTS: The predicted bovine protein lengths, 5' UTR lengths, and Kozak consensus sequences from 954 bovine FLIC sequences (bFLICs; average length 1713 nt, representing 762 distinct loci) are all consistent with previously sequenced mammalian full-length transcripts.
CONCLUSION: In most cases, the bFLICs span the entire CDS of the genes, providing the basis for creating predicted bovine protein sequences to support proteomics and comparative evolutionary research as well as functional genomics and genome annotation. The results demonstrate the utility of the comparative approach in obtaining predicted protein sequences in other species.
format Text
id pubmed-1314900
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1314900 2005-12-15 Characterization of 954 bovine full-CDS cDNA sequences. Harhay, Gregory P; Sonstegard, Tad S; Keele, John W; Heaton, Michael P; Clawson, Michael L; Snelling, Warren M; Wiedmann, Ralph T; Van Tassell, Curt P; Smith, Timothy PL. BMC Genomics. Methodology Article. BioMed Central 2005-11-23 /pmc/articles/PMC1314900/ /pubmed/16305752 http://dx.doi.org/10.1186/1471-2164-6-166 Text en
Copyright © 2005 Harhay et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Characterization of 954 bovine full-CDS cDNA sequences
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1314900/
https://www.ncbi.nlm.nih.gov/pubmed/16305752
http://dx.doi.org/10.1186/1471-2164-6-166