Statistical issues in the analysis of Illumina data

BACKGROUND: Illumina bead-based arrays are becoming increasingly popular due to their high degree of replication and reported high data quality. However, little attention has been paid to the pre-processing of Illumina data. In this paper, we present our experience of analysing the raw data from an...

Bibliographic Details
Main Authors:	Dunning, Mark J, Barbosa-Morais, Nuno L, Lynch, Andy G, Tavaré, Simon, Ritchie, Matthew E
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2291044/ https://www.ncbi.nlm.nih.gov/pubmed/18254947 http://dx.doi.org/10.1186/1471-2105-9-85

author	Dunning, Mark J Barbosa-Morais, Nuno L Lynch, Andy G Tavaré, Simon Ritchie, Matthew E
collection	PubMed
description	BACKGROUND: Illumina bead-based arrays are becoming increasingly popular due to their high degree of replication and reported high data quality. However, little attention has been paid to the pre-processing of Illumina data. In this paper, we present our experience of analysing the raw data from an Illumina spike-in experiment and offer guidelines for those wishing to analyse expression data or develop new methodologies for this technology. RESULTS: We find that the local background estimated by Illumina is consistently low, and subtracting this background is beneficial for detecting differential expression (DE). Illumina's summary method performs well at removing outliers, producing estimates which are less biased and are less variable than other robust summary methods. However, quality assessment on summarised data may miss spatial artefacts present in the raw data. Also, we find that the background normalisation method used in Illumina's proprietary software (BeadStudio) can cause problems with a standard DE analysis. We demonstrate that variances calculated from the raw data can be used as inverse weights in the DE analysis to improve power. Finally, variability in both expression levels and DE statistics can be attributed to differences in probe composition. These differences are not accounted for by current analysis methods and require further investigation. CONCLUSION: Analysing Illumina expression data using BeadStudio is reasonable because of the conservative estimates of summary values produced by the software. Improvements can however be made by not using background normalisation. Access to the raw data allows for a more detailed quality assessment and flexible analyses. In the case of a gene expression study, data can be analysed on an appropriate scale using established tools. Similar improvements can be expected for other Illumina assays.
format	Text
id	pubmed-2291044
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2291044 2008-04-10 Statistical issues in the analysis of Illumina data Dunning, Mark J; Barbosa-Morais, Nuno L; Lynch, Andy G; Tavaré, Simon; Ritchie, Matthew E BMC Bioinformatics Research Article BioMed Central 2008-02-06 /pmc/articles/PMC2291044/ /pubmed/18254947 http://dx.doi.org/10.1186/1471-2105-9-85 Text en Copyright © 2008 Dunning et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Statistical issues in the analysis of Illumina data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2291044/ https://www.ncbi.nlm.nih.gov/pubmed/18254947 http://dx.doi.org/10.1186/1471-2105-9-85
