A comparative analysis of the information content in long and short SAGE libraries
BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool to determine gene expression profiles. Two types of SAGE libraries, ShortSAGE and LongSAGE, are classified based on the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSA...
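Main Authors: | Li, Yi-Ju; Xu, Puting; Qin, Xuejun; Schmechel, Donald E; Hulette, Christine M; Haines, Jonathan L; Pericak-Vance, Margaret A; Gilbert, John R |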
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1676023/ https://www.ncbi.nlm.nih.gov/pubmed/17109755 http://dx.doi.org/10.1186/1471-2105-7-504 |
_version_ | 1782131138411626496 |
---|---|
author | Li, Yi-Ju; Xu, Puting; Qin, Xuejun; Schmechel, Donald E; Hulette, Christine M; Haines, Jonathan L; Pericak-Vance, Margaret A; Gilbert, John R |
author_sort | Li, Yi-Ju |
collection | PubMed |
description | BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool for determining gene expression profiles. SAGE libraries are classified as ShortSAGE or LongSAGE according to the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but their information content has not been widely compared. To dissect the differences between these two types of libraries, we utilized four libraries (two LongSAGE and two ShortSAGE libraries) generated from the hippocampus of Alzheimer disease (AD) and control samples. In addition, we generated two further short SAGE libraries, the truncated long SAGE libraries (tSAGE), from the LongSAGE libraries by deleting seven 5' basepairs from each LongSAGE tag. RESULTS: One problem in SAGE studies is that an individual tag may match multiple different genes because of its short length. We found that a LongSAGE tag maps to up to 15 UniGene clusters, while ShortSAGE and tSAGE tags map to up to 279 UniGene clusters. Both long and short SAGE libraries exhibit a large number of orphan tags (no gene information in UniGene), implying a limitation of the UniGene database. Among 100 orphan LongSAGE tags, the complete sequences (17 basepairs) of nine orphan tags match 17 genomic sequences; four of the orphan tags match a single genomic sequence. Our data show the potential to resolve 4–9% of orphan LongSAGE tags. Finally, among 400 tSAGE tags showing significant differential expression between AD and control, 79 tags (19.8%) were derived from multiple non-significant LongSAGE tags, implying false positive results. CONCLUSION: Our data show that LongSAGE tags have higher specificity in gene mapping than ShortSAGE tags. LongSAGE tags show an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, the chance of obtaining false positive results is higher for ShortSAGE than for LongSAGE libraries because of the lower specificity of ShortSAGE tags in gene mapping. Therefore, we recommend considering the number of UniGene clusters (genes or ESTs) corresponding to a tag when prioritizing significant results. |
format | Text |
id | pubmed-1676023 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1676023 2006-12-05 BMC Bioinformatics Research Article BioMed Central 2006-11-16 /pmc/articles/PMC1676023/ /pubmed/17109755 http://dx.doi.org/10.1186/1471-2105-7-504 Text en Copyright © 2006 Li et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A comparative analysis of the information content in long and short SAGE libraries |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1676023/ https://www.ncbi.nlm.nih.gov/pubmed/17109755 http://dx.doi.org/10.1186/1471-2105-7-504 |
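The abstract describes deriving tSAGE libraries by deleting seven 5' basepairs from each 17-bp LongSAGE tag. The sketch below is a minimal illustration of that truncation step, not the authors' pipeline. It assumes tag counts are held in a plain dictionary, and the function names (`truncate_tag`, `build_tsage`) and the example tags and counts are hypothetical. It shows how several LongSAGE tags can collapse onto one 10-bp tag, which is the effect the abstract links to false positive results.

```python
from collections import Counter

LONG_TAG_LENGTH = 17   # LongSAGE tag length stated in the abstract
BASES_TO_DELETE = 7    # seven 5' basepairs are deleted, leaving 10 bp


def truncate_tag(long_tag: str) -> str:
    """Delete the seven 5' bases of a 17-bp tag and return the 10-bp remainder."""
    if len(long_tag) != LONG_TAG_LENGTH:
        raise ValueError(f"expected a {LONG_TAG_LENGTH}-bp tag, got {len(long_tag)} bp")
    return long_tag[BASES_TO_DELETE:]


def build_tsage(long_counts: dict[str, int]) -> tuple[Counter, dict[str, list[str]]]:
    """Collapse LongSAGE tag counts into truncated-tag counts.

    Returns the summed counts per truncated tag and, for each truncated tag,
    the list of LongSAGE tags that were merged into it.
    """
    tsage_counts: Counter = Counter()
    sources: dict[str, list[str]] = {}
    for long_tag, count in long_counts.items():
        short_tag = truncate_tag(long_tag)
        tsage_counts[short_tag] += count
        sources.setdefault(short_tag, []).append(long_tag)
    return tsage_counts, sources


# Hypothetical tags and counts, invented purely for illustration.
example_long_counts = {
    "AAAAAAACCCCCGGGGG": 12,
    "TTTTTTTCCCCCGGGGG": 5,
    "ACGTACGTACGTACGTA": 3,
}

tsage_counts, sources = build_tsage(example_long_counts)

# Truncated tags that pool counts from more than one LongSAGE tag.
ambiguous = {tag: tags for tag, tags in sources.items() if len(tags) > 1}
```

In this toy input, two distinct LongSAGE tags share the same ten 3' bases, so their counts are pooled under a single truncated tag. A difference that looks significant for the pooled tag may not be significant for either LongSAGE tag on its own.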