
EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome

Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available...


Bibliographic Details
Main Authors: Jain, Monica, Shrager, Jeff, Harris, Elizabeth H., Halbrook, Renee, Grossman, Arthur R., Hauser, Charles, Vallon, Olivier
Format: Text
Language: English
Published: Oxford University Press 2007
Subjects: Genomics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1874618/
https://www.ncbi.nlm.nih.gov/pubmed/17355987
http://dx.doi.org/10.1093/nar/gkm081
author Jain, Monica
Shrager, Jeff
Harris, Elizabeth H.
Halbrook, Renee
Grossman, Arthur R.
Hauser, Charles
Vallon, Olivier
author_sort Jain, Monica
collection PubMed
description Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12 063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15 857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.
format Text
id pubmed-1874618
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1874618 2007-05-23 Nucleic Acids Res Genomics
Oxford University Press 2007-03 2007-03-13 /pmc/articles/PMC1874618/ /pubmed/17355987 http://dx.doi.org/10.1093/nar/gkm081 Text en © 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome
topic Genomics