Unexpected observations after mapping LongSAGE tags to the human genome
BACKGROUND: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAG...
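The claim that LongSAGE tags are long enough to be mapped reliably to a whole genome can be checked with back-of-the-envelope arithmetic. The Python sketch below uses a uniform random-sequence model. The genome size (about 3.1e9 bp, both strands searched) and the commonly cited tag lengths (about 14 bp for classic SAGE and 21 bp for LongSAGE, counting the 4-bp CATG anchoring site) are assumptions that do not come from this record. The function name `expected_random_hits` is hypothetical.

```python
# Back-of-the-envelope check: how often would one fixed tag sequence be
# expected to occur by chance in a genome of random sequence?
# Assumptions (not from the record): uniform base composition, a haploid
# human genome of roughly 3.1e9 bp, and both strands searched.


def expected_random_hits(tag_length: int,
                         genome_length: float = 3.1e9,
                         strands: int = 2) -> float:
    """Expected chance exact matches of one fixed tag of `tag_length` bases.

    Under a uniform random model, each of the ~genome_length positions on
    each strand matches a given k-mer with probability (1/4) ** k.
    """
    return strands * genome_length / 4 ** tag_length


if __name__ == "__main__":
    # Commonly cited tag lengths, including the 4-bp CATG anchoring site.
    for name, k in [("SAGE", 14), ("LongSAGE", 21)]:
        print(f"{name} ({k} bp): {expected_random_hits(k):.4g} chance hits")
```

Under these assumptions, a 14-bp tag would be expected to match about 23 times by chance, compared with about 0.0014 times for a 21-bp tag. Real genomes are repetitive rather than random, so this model is optimistic. That is why some tags still match the genome more than once.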
Main authors: | Keime, Céline; Sémon, Marie; Mouchiroud, Dominique; Duret, Laurent; Gandrillon, Olivier |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1884178/ https://www.ncbi.nlm.nih.gov/pubmed/17504516 http://dx.doi.org/10.1186/1471-2105-8-154 |
_version_ | 1782133606365265920 |
---|---|
author | Keime, Céline; Sémon, Marie; Mouchiroud, Dominique; Duret, Laurent; Gandrillon, Olivier |
author_facet | Keime, Céline; Sémon, Marie; Mouchiroud, Dominique; Duret, Laurent; Gandrillon, Olivier |
author_sort | Keime, Céline |
collection | PubMed |
description | BACKGROUND: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts. RESULTS: Using a published error rate in SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that our dataset is highly reliable, we studied the tags that map once to the genome. Of these tags, 31% correspond to unannotated transcripts. The others map to known transcribed regions, but many of them (nearly half) are located either antisense to, or in new variants of, these known transcripts. CONCLUSION: We performed a comprehensive study of all publicly available human LongSAGE tags and carefully verified the reliability of these data. We identified the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been underestimated. The frequency of tags that match the genome sequence once but fall outside any annotated exon suggests that the human transcriptome is much more complex than shown by the current human genome annotations, with many new splicing variants and antisense transcripts. SAGE data are suitable for mapping new transcripts to the genome, as demonstrated by the high rate of cross-validation of the corresponding tags using other methods. |
format | Text |
id | pubmed-1884178 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1884178 2007-05-30 Unexpected observations after mapping LongSAGE tags to the human genome Keime, Céline Sémon, Marie Mouchiroud, Dominique Duret, Laurent Gandrillon, Olivier BMC Bioinformatics Research Article BACKGROUND: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts. RESULTS: Using a published error rate in SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that our dataset is highly reliable, we studied the tags that map once to the genome. Of these tags, 31% correspond to unannotated transcripts. The others map to known transcribed regions, but many of them (nearly half) are located either antisense to, or in new variants of, these known transcripts. CONCLUSION: We performed a comprehensive study of all publicly available human LongSAGE tags and carefully verified the reliability of these data. We identified the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been underestimated. The frequency of tags that match the genome sequence once but fall outside any annotated exon suggests that the human transcriptome is much more complex than shown by the current human genome annotations, with many new splicing variants and antisense transcripts. SAGE data are suitable for mapping new transcripts to the genome, as demonstrated by the high rate of cross-validation of the corresponding tags using other methods. BioMed Central 2007-05-15 /pmc/articles/PMC1884178/ /pubmed/17504516 http://dx.doi.org/10.1186/1471-2105-8-154 Text en Copyright © 2007 Keime et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Keime, Céline Sémon, Marie Mouchiroud, Dominique Duret, Laurent Gandrillon, Olivier Unexpected observations after mapping LongSAGE tags to the human genome |
title | Unexpected observations after mapping LongSAGE tags to the human genome |
title_full | Unexpected observations after mapping LongSAGE tags to the human genome |
title_fullStr | Unexpected observations after mapping LongSAGE tags to the human genome |
title_full_unstemmed | Unexpected observations after mapping LongSAGE tags to the human genome |
title_short | Unexpected observations after mapping LongSAGE tags to the human genome |
title_sort | unexpected observations after mapping longsage tags to the human genome |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1884178/ https://www.ncbi.nlm.nih.gov/pubmed/17504516 http://dx.doi.org/10.1186/1471-2105-8-154 |
work_keys_str_mv | AT keimeceline unexpectedobservationsaftermappinglongsagetagstothehumangenome AT semonmarie unexpectedobservationsaftermappinglongsagetagstothehumangenome AT mouchirouddominique unexpectedobservationsaftermappinglongsagetagstothehumangenome AT duretlaurent unexpectedobservationsaftermappinglongsagetagstothehumangenome AT gandrillonolivier unexpectedobservationsaftermappinglongsagetagstothehumangenome |
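The RESULTS in the abstract of this record describe a sequence of screens. Probable sequencing-error tags are removed first. The remaining tags are then binned by how often they match the genome and, for single matches, by how they relate to annotated transcripts. The Python sketch below illustrates that logic under stated assumptions. It assumes that tag counts, genome hit counts and an overlap label per tag have already been computed. The error model (a specific one-substitution variant of a tag P expected about count(P) × e / 3 times), the function names and the overlap labels are all hypothetical. This is not the authors' pipeline, which the record does not describe in detail.

```python
from collections import Counter
from typing import Iterator, Optional, Set

BASES = "ACGT"


def one_substitution_neighbors(tag: str) -> Iterator[str]:
    """Yield every sequence that differs from `tag` at exactly one position."""
    for i, original in enumerate(tag):
        for base in BASES:
            if base != original:
                yield tag[:i] + base + tag[i + 1:]


def flag_likely_errors(counts: Counter, per_base_error: float) -> Set[str]:
    """Flag tags whose count is small enough to be explained by sequencing
    errors in a more abundant tag that differs by one substitution.

    Illustrative error model (an assumption): a specific one-substitution
    variant of tag P is expected about counts[P] * per_base_error / 3 times,
    i.e. an error at that position followed by one of three wrong bases.
    Insertions and deletions are ignored.
    """
    flagged = set()
    for tag, n in counts.items():
        for neighbor in one_substitution_neighbors(tag):
            parent = counts.get(neighbor, 0)
            if parent > n and n <= parent * per_base_error / 3:
                flagged.add(tag)
                break
    return flagged


def classify_tag(genome_hits: int, overlap: Optional[str]) -> str:
    """Bin a tag by its number of exact genome matches and, for tags that
    match once, by a precomputed overlap label (hypothetical labels:
    None, "sense_exon", "antisense", "sense_outside_exon")."""
    if genome_hits == 0:
        # Candidates for polyA tails, exon-exon junctions, polymorphisms,
        # murine contamination or residual sequencing errors.
        return "no_genome_match"
    if genome_hits > 1:
        return "multiple_matches"
    if overlap is None:
        return "unannotated"
    if overlap == "antisense":
        return "antisense"
    if overlap == "sense_exon":
        return "known_exon"
    # Sense strand of a known gene but outside annotated exons.
    return "possible_new_variant"
```

In use, one would drop the tags returned by `flag_likely_errors` and then tally the categories with something like `Counter(classify_tag(h, o) for h, o in per_tag_results)`. Here `per_tag_results` is a hypothetical iterable of (genome hits, overlap label) pairs.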