
Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays

BACKGROUND: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more soph...


Bibliographic Details
Main authors: Chain, Benjamin, Bowen, Helen, Hammond, John, Posch, Wilfried, Rasaiyaah, Jane, Tsang, Jhen, Noursadeghi, Mahdad
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2909218/
https://www.ncbi.nlm.nih.gov/pubmed/20576120
http://dx.doi.org/10.1186/1471-2105-11-344
author Chain, Benjamin
Bowen, Helen
Hammond, John
Posch, Wilfried
Rasaiyaah, Jane
Tsang, Jhen
Noursadeghi, Mahdad
collection PubMed
description BACKGROUND: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. RESULTS: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. CONCLUSIONS: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.
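The inter-array figure quoted in the abstract can be read in two ways: an SD of 0.5 log2 units corresponds to a fold factor of about 2^0.5 ≈ 1.41 on the raw scale, and if 0.5 is roughly 6% of the mean, the mean log2 signal is roughly 0.5 / 0.06 ≈ 8.3. The sketch below shows one plausible way to compute such a summary from replicate arrays. It is not the authors' R function: the function names, the choice of averaging per-probe SDs, and the intensities are all assumptions made up for illustration.

```python
import math
import statistics


def log2_signals(raw):
    """Log2-transform raw intensities (all values must be positive)."""
    return [math.log2(x) for x in raw]


def interarray_sd(replicates):
    """Average, over probes, of the per-probe SD across replicate arrays.

    `replicates` is a list of arrays; each array is a list of log2 signals,
    one per probe, with probes in the same order on every array.
    """
    per_probe = [statistics.stdev(values) for values in zip(*replicates)]
    return statistics.mean(per_probe)


# Hypothetical raw intensities: 4 probes on 3 replicate arrays (not real data).
raw_arrays = [
    [120.0, 900.0, 15000.0, 60.0],
    [150.0, 700.0, 18000.0, 85.0],
    [100.0, 1100.0, 12000.0, 50.0],
]

logged = [log2_signals(array) for array in raw_arrays]
sd = interarray_sd(logged)
grand_mean = statistics.mean(v for array in logged for v in array)

# Report the SD in log2 units and as a percentage of the mean log2 signal,
# the same two forms used in the abstract.
print(f"mean SD = {sd:.2f} log2 units "
      f"({100 * sd / grand_mean:.1f}% of mean log2 signal)")
```

Expressing the SD relative to the mean log signal, as the abstract does, makes the figure easy to compare across platforms, but the fold-factor reading is usually the more intuitive one when judging whether a given expression difference exceeds technical noise.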
format Text
id pubmed-2909218
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2909218 2010-07-24 BMC Bioinformatics Research Article BioMed Central 2010-06-24 /pmc/articles/PMC2909218/ /pubmed/20576120 http://dx.doi.org/10.1186/1471-2105-11-344 Text en Copyright ©2010 Chain et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays
topic Research Article