Sequence biases in large scale gene expression profiling data

We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (Long...

Bibliographic Details
Main Authors: Siddiqui, Asim S., Delaney, Allen D., Schnerch, Angelique, Griffith, Obi L., Jones, Steven J. M., Marra, Marco A.
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524917/
https://www.ncbi.nlm.nih.gov/pubmed/16840527
http://dx.doi.org/10.1093/nar/gkl404
author Siddiqui, Asim S.
Delaney, Allen D.
Schnerch, Angelique
Griffith, Obi L.
Jones, Steven J. M.
Marra, Marco A.
collection PubMed
description We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).
format Text
id pubmed-1524917
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-15249172006-08-09 Sequence biases in large scale gene expression profiling data Siddiqui, Asim S. Delaney, Allen D. Schnerch, Angelique Griffith, Obi L. Jones, Steven J. M. Marra, Marco A. Nucleic Acids Res Methods Online We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison). Oxford University Press 2006 2006-07-13 /pmc/articles/PMC1524917/ /pubmed/16840527 http://dx.doi.org/10.1093/nar/gkl404 Text en © 2006 The Author(s)
title Sequence biases in large scale gene expression profiling data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524917/
https://www.ncbi.nlm.nih.gov/pubmed/16840527
http://dx.doi.org/10.1093/nar/gkl404