
A comparative analysis of the information content in long and short SAGE libraries

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool to determine gene expression profiles. Two types of SAGE libraries, ShortSAGE and LongSAGE, are classified based on the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSA...

Bibliographic Details
Main Authors: Li, Yi-Ju, Xu, Puting, Qin, Xuejun, Schmechel, Donald E, Hulette, Christine M, Haines, Jonathan L, Pericak-Vance, Margaret A, Gilbert, John R
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1676023/
https://www.ncbi.nlm.nih.gov/pubmed/17109755
http://dx.doi.org/10.1186/1471-2105-7-504
_version_ 1782131138411626496
author Li, Yi-Ju
Xu, Puting
Qin, Xuejun
Schmechel, Donald E
Hulette, Christine M
Haines, Jonathan L
Pericak-Vance, Margaret A
Gilbert, John R
author_sort Li, Yi-Ju
collection PubMed
description BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool for determining gene expression profiles. SAGE libraries are classified as ShortSAGE or LongSAGE according to the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but their information content has not been widely compared. To dissect the differences between these two types of libraries, we used four libraries (two LongSAGE and two ShortSAGE) generated from the hippocampus of Alzheimer and control samples. In addition, we generated two further short SAGE libraries, the truncated LongSAGE libraries (tSAGE), by deleting seven 5' basepairs from each LongSAGE tag.
RESULTS: One problem in SAGE studies is that an individual tag may match multiple different genes because the tag is short. We found that a LongSAGE tag maps to up to 15 UniGene clusters, while ShortSAGE and tSAGE tags map to up to 279 UniGene clusters. Both long and short SAGE libraries contain a large number of orphan tags (no gene information in UniGene), which points to the limitations of the UniGene database. Among 100 orphan LongSAGE tags, the complete sequences (17 basepairs) of nine orphan tags match 17 genomic sequences; four of the orphan tags each match a single genomic sequence. Our data show the potential to resolve 4–9% of orphan LongSAGE tags. Finally, among 400 tSAGE tags showing significant differential expression between Alzheimer disease (AD) and control, 79 tags (19.8%) were derived from multiple non-significant LongSAGE tags, suggesting false positive results.
CONCLUSION: Our data show that LongSAGE tags have higher specificity in gene mapping than ShortSAGE tags. LongSAGE tags also show an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, the chance of obtaining false positive results is higher for ShortSAGE than for LongSAGE libraries because of their lower specificity in gene mapping. Therefore, we recommend considering the number of UniGene clusters (genes or ESTs) corresponding to a tag when prioritizing significant results.
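
The tSAGE construction described in the abstract (dropping seven 5' basepairs from each 17-basepair LongSAGE tag and pooling the results) can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function names, the example tags, and their counts are all hypothetical, and the sketch assumes tags are written 5'→3' so that the 5' end is the left side of the string.

```python
from collections import defaultdict

LONG_TAG_LENGTH = 17  # LongSAGE tag length stated in the abstract
BASES_REMOVED = 7     # bases deleted from the 5' end to form a tSAGE tag


def truncate_tag(long_tag, n_removed=BASES_REMOVED):
    """Return the tag with n_removed bases dropped from its 5' end.

    Assumption: the tag string is written 5'->3', so the 5' end is on the left.
    """
    if len(long_tag) != LONG_TAG_LENGTH:
        raise ValueError(f"expected a {LONG_TAG_LENGTH}-base tag, got {long_tag!r}")
    return long_tag[n_removed:]


def collapse_to_tsage(long_counts):
    """Pool LongSAGE counts by truncated tag, remembering the source tags."""
    pooled = defaultdict(lambda: {"count": 0, "sources": []})
    for long_tag, count in long_counts.items():
        entry = pooled[truncate_tag(long_tag)]
        entry["count"] += count
        entry["sources"].append(long_tag)
    return dict(pooled)


# Hypothetical tags and counts, invented for illustration only.
example_counts = {
    "AAAAAAACCCCCCCCCC": 3,
    "GGGGGGGCCCCCCCCCC": 5,
    "TTTTTTTACGTACGTAC": 2,
}
tsage = collapse_to_tsage(example_counts)

# tSAGE tags that merge more than one distinct LongSAGE tag.
ambiguous = {tag for tag, entry in tsage.items() if len(entry["sources"]) > 1}
```

Keeping the list of source tags alongside each pooled count makes it possible to flag truncated tags that merge several distinct LongSAGE tags, which is the situation the abstract associates with possible false positives.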
format Text
id pubmed-1676023
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1676023 2006-12-05 A comparative analysis of the information content in long and short SAGE libraries. BMC Bioinformatics. Research Article. BioMed Central 2006-11-16 /pmc/articles/PMC1676023/ /pubmed/17109755 http://dx.doi.org/10.1186/1471-2105-7-504 Text en Copyright © 2006 Li et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparative analysis of the information content in long and short SAGE libraries
topic Research Article