EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome
Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available...
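Main authors: | Jain, Monica; Shrager, Jeff; Harris, Elizabeth H.; Halbrook, Renee; Grossman, Arthur R.; Hauser, Charles; Vallon, Olivier |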
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2007 |
Subjects: | Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1874618/ https://www.ncbi.nlm.nih.gov/pubmed/17355987 http://dx.doi.org/10.1093/nar/gkm081 |
_version_ | 1782133486517223424 |
---|---|
author | Jain, Monica; Shrager, Jeff; Harris, Elizabeth H.; Halbrook, Renee; Grossman, Arthur R.; Hauser, Charles; Vallon, Olivier |
collection | PubMed |
description | Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12 063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15 857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available. |
format | Text |
id | pubmed-1874618 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1874618; 2007-05-23; Nucleic Acids Res; Genomics; Oxford University Press; 2007-03; 2007-03-13; /pmc/articles/PMC1874618/; /pubmed/17355987; http://dx.doi.org/10.1093/nar/gkm081; Text; en; © 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome |
topic | Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1874618/ https://www.ncbi.nlm.nih.gov/pubmed/17355987 http://dx.doi.org/10.1093/nar/gkm081 |