The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

BACKGROUND: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same m...

Bibliographic Details
Main authors: Sims, Andrew H, Smethurst, Graeme J, Hey, Yvonne, Okoniewski, Michal J, Pepper, Stuart D, Howell, Anthony, Miller, Crispin J, Clarke, Robert B
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2563019/
https://www.ncbi.nlm.nih.gov/pubmed/18803878
http://dx.doi.org/10.1186/1755-8794-1-42
_version_ 1782159786175889408
author Sims, Andrew H
Smethurst, Graeme J
Hey, Yvonne
Okoniewski, Michal J
Pepper, Stuart D
Howell, Anthony
Miller, Crispin J
Clarke, Robert B
author_sort Sims, Andrew H
collection PubMed
description BACKGROUND: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses. RESULTS: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to-date (1107), from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics. CONCLUSION: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets leading to new biological findings with increased statistical power.
format Text
id pubmed-2563019
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2563019 2008-10-08 BMC Med Genomics Research Article BioMed Central 2008-09-21 /pmc/articles/PMC2563019/ /pubmed/18803878 http://dx.doi.org/10.1186/1755-8794-1-42 Text en Copyright © 2008 Sims et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis
title_sort removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2563019/
https://www.ncbi.nlm.nih.gov/pubmed/18803878
http://dx.doi.org/10.1186/1755-8794-1-42
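
The abstract in this record reports that simple batch mean-centering reduced inter-experimental variation caused by multiplicative, systematic bias. The sketch below illustrates that general idea only; it is not the authors' code. It assumes that expression values are log2-transformed first (so a multiplicative bias becomes an additive offset) and that centering is done per gene within each batch. The function name `batch_mean_center`, the gene label `GENE1`, and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def batch_mean_center(log_expr, batches):
    """Subtract each gene's per-batch mean from its log-scale values.

    log_expr: dict mapping gene -> list of log2 expression values, one per sample
    batches:  list of batch labels, one per sample (same order as the values)
    """
    # Group sample indices by batch label.
    by_batch = defaultdict(list)
    for i, label in enumerate(batches):
        by_batch[label].append(i)

    centered = {}
    for gene, values in log_expr.items():
        out = list(values)
        for idx in by_batch.values():
            # Mean of this gene within this batch only.
            mean = sum(values[i] for i in idx) / len(idx)
            for i in idx:
                out[i] = values[i] - mean
        centered[gene] = out
    return centered


# Toy data (assumed): one gene, two batches; batch "B" is batch "A"
# multiplied by 4, standing in for a multiplicative, systematic bias.
raw = {"GENE1": [100.0, 200.0, 400.0, 800.0]}
batches = ["A", "A", "B", "B"]

# On the log2 scale the 4x bias becomes a constant offset of +2.
log_expr = {g: [math.log2(v) for v in vals] for g, vals in raw.items()}
centered = batch_mean_center(log_expr, batches)
```

After centering, the two batches of the toy gene have the same profile, because the constant log-scale offset has been subtracted. Centering also removes any genuine difference in average expression between batches, which is in line with the abstract's caution that results depend strongly on the composition of the datasets and on patient characteristics.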