
Discarding duplicate ditags in LongSAGE analysis may introduce significant error


Bibliographic details
Main authors: Emmersen, Jeppe; Heidenblut, Anna M; Høgh, Annabeth Laursen; Hahn, Stephan A; Welinder, Karen G; Nielsen, Kåre L
Format: Text
Language: English
Published: BioMed Central 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839111/
https://www.ncbi.nlm.nih.gov/pubmed/17359537
http://dx.doi.org/10.1186/1471-2105-8-92
Collection: PubMed (National Center for Biotechnology Information)
Record format: MEDLINE/PubMed

Abstract:

BACKGROUND: During gene expression analysis by Serial Analysis of Gene Expression (SAGE), duplicate ditags are routinely removed from the data analysis because they are suspected to stem from artifacts during SAGE library construction. As a consequence, naturally occurring duplicate ditags are also removed from the analysis, leading to measurement error.

RESULTS: An algorithm was developed to analyze the differential occurrence of SAGE tags in different ditag combinations. Analysis of a pancreatic acinar cell LongSAGE library showed no sign of a general amplification bias that would justify the removal of all duplicate ditags. Extending the analysis to 10 additional LongSAGE libraries likewise showed no justification for removing all duplicate ditags. On the contrary, while the error introduced in original SAGE by removing naturally occurring duplicate ditags is insignificant, in LongSAGE it leads to an error of up to 3-fold. However, the algorithm developed for the analysis of duplicate ditags was able to identify individual artifact ditags that originated from rare nucleotide variations of tags and from vector contamination.

CONCLUSION: The removal of all duplicate ditags was unfounded for the datasets analyzed and led to large errors. This may also be the case for other LongSAGE datasets already present in databases. Analysis of the ditag population can, however, identify artifact tags that should be removed from analysis or have their tag count adjusted.
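Why discarding duplicates biases tag counts can be illustrated with a simple random-pairing model. The sketch below is an illustration under stated assumptions, not the algorithm described in the paper: it assumes each ditag is an unordered pair of two tags drawn independently from the tag-abundance distribution, and it ignores amplification bias and sequencing error. `expected_counts` compares a tag's expected count with all ditags kept against its expected count after keeping only one copy of each distinct ditag. `flag_overrepresented` is a swapped-in Poisson test, again not the authors' method, that flags ditag types observed far more often than the model predicts, in the spirit of identifying individual artifact ditags. All function names and the `alpha` threshold are hypothetical.

```python
import math
from collections import Counter

# Assumption (not from the paper): ditags form by random pairing of two tags
# drawn independently from the tag-abundance distribution, and a ditag is an
# unordered pair, so (A, B) and (B, A) are the same ditag type.


def pair_prob(p_a: float, p_b: float, same: bool) -> float:
    """Probability that a single ditag is the unordered pair {a, b}."""
    return p_a * p_a if same else 2.0 * p_a * p_b


def expected_counts(freqs: dict[str, float], n_ditags: int) -> dict[str, tuple[float, float]]:
    """Expected tag count with all ditags kept vs. after discarding duplicates.

    freqs must sum to 1. After deduplication each distinct ditag type is kept
    once, so {a, b} contributes one copy of a (two copies if b == a) whenever
    it occurs at least once among n_ditags.
    """
    out = {}
    for a, p_a in freqs.items():
        kept_all = 2.0 * n_ditags * p_a  # two tags per ditag
        kept_dedup = 0.0
        for b, p_b in freqs.items():
            q = pair_prob(p_a, p_b, a == b)
            seen = 1.0 - (1.0 - q) ** n_ditags  # P(this ditag type occurs at least once)
            kept_dedup += (2.0 if a == b else 1.0) * seen
        out[a] = (kept_all, kept_dedup)
    return out


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    term, cdf = math.exp(-lam), 0.0
    for i in range(k):  # accumulate P(X = 0) .. P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def flag_overrepresented(ditags: list[tuple[str, str]], alpha: float = 1e-6) -> list[tuple[str, str]]:
    """Flag repeated ditag types whose count is implausible under random pairing.

    Tag frequencies are estimated from the same ditags; the expected count of
    each ditag type is n * pair_prob, and a type is flagged when it occurs more
    than once and its Poisson upper-tail probability falls below alpha.
    """
    n = len(ditags)
    tag_counts = Counter(t for d in ditags for t in d)
    total_tags = 2 * n
    types = Counter(tuple(sorted(d)) for d in ditags)  # unordered ditag types
    flagged = []
    for (a, b), k in types.items():
        p_a, p_b = tag_counts[a] / total_tags, tag_counts[b] / total_tags
        lam = n * pair_prob(p_a, p_b, a == b)
        if k > 1 and poisson_sf(k, lam) < alpha:
            flagged.append((a, b))
    return sorted(flagged)
```

For example, with two equally abundant tags (`{"A": 0.5, "B": 0.5}`) and 1,000 ditags, each tag's expected count is 1,000 with all ditags kept but only about 3 after deduplication, because only three distinct ditag types exist. A rare tag, by contrast, keeps most of its count, since its ditags are seldom repeated by chance. The model shows the direction of the bias only; it makes no claim about the specific magnitudes reported in the abstract.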

Journal: BMC Bioinformatics, Research Article, published 2007-03-14
Copyright © 2007 Emmersen et al; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.