
Intrinsic bias in breast cancer gene expression data sets


Bibliographic Details
Main authors:	Mosley, Jonathan D, Keri, Ruth A
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2711113/ https://www.ncbi.nlm.nih.gov/pubmed/19563679 http://dx.doi.org/10.1186/1471-2407-9-214

collection	PubMed
description	BACKGROUND: While global breast cancer gene expression data sets have considerable commonality in terms of their data content, the populations that they represent and the data collection methods utilized can be quite disparate. We sought to assess the extent and consequence of these systematic differences with respect to identifying clinically significant prognostic groups.
METHODS: We ascertained how effectively unsupervised clustering employing randomly generated sets of genes could segregate tumors into prognostic groups using four well-characterized breast cancer data sets.
RESULTS: Using a common set of 5,000 randomly generated lists (70 genes/list), the percentages of clusters with significant differences in metastasis latencies (hazard ratio (HR) p-value < 0.01) were 62%, 15%, 21% and 0% in the NKI2 (Netherlands Cancer Institute), Wang, TRANSBIG and KJX64/KJ125 data sets, respectively. Among ER positive tumors, the percentages were 38%, 11%, 4% and 0%, respectively. Few random lists were predictive among ER negative tumors in any data set. Clustering was associated with ER status and, after globally adjusting for the effects of ER-α gene expression, the percentages were 25%, 33%, 1% and 0%, respectively. The impact of adjusting for ER status depended on the extent of confounding between ER-α gene expression and markers of proliferation.
CONCLUSION: A statistically significant association between a given gene list and prognosis is highly likely to be identified in the NKI2 data set, owing to its large sample size and the interrelationship between ER-α expression and markers of proliferation. In most respects, the TRANSBIG data set generated outcomes similar to those of the NKI2 data set, although its smaller sample size led to fewer statistically significant results.
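The procedure summarized in the abstract (draw random gene lists, cluster tumors on each list, count how often the clusters differ in outcome) can be illustrated with a small sketch. This is not the authors' code. It runs on synthetic data, uses a basic 2-means split in place of whatever clustering the study applied, and swaps in a two-group log-rank test for the hazard ratio p-value. All function names and parameters (`make_synthetic`, `two_means`, `logrank_p`, `fraction_significant`, `frac_linked`) are hypothetical.

```python
import math
import random


def make_synthetic(n_samples=60, n_genes=300, frac_linked=0.5, seed=1):
    """Synthetic data (an assumption, not study data): a hidden two-state
    factor shifts a fraction of genes and also sets the event rate."""
    rng = random.Random(seed)
    factor = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]
    linked = [rng.random() < frac_linked for _ in range(n_genes)]
    expr = [[(f if linked[g] else 0.0) + rng.gauss(0, 1) for g in range(n_genes)]
            for f in factor]
    times = [rng.expovariate(1.0 if f > 0 else 0.2) for f in factor]
    events = [1] * n_samples  # no censoring, to keep the sketch simple
    return expr, times, events


def two_means(rows, iters=10):
    """Split samples into two groups with a plain 2-means (Lloyd) loop."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Deterministic start: the first sample and the sample farthest from it.
    centers = [rows[0], max(rows, key=lambda r: dist(r, rows[0]))]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [0 if dist(r, centers[0]) <= dist(r, centers[1]) else 1
                  for r in rows]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def logrank_p(times, events, labels):
    """Two-group log-rank test; p-value from a chi-square with 1 df."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                      # at risk, all
        n1 = sum(1 for x, lab in zip(times, labels) if x >= t and lab == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, lab in zip(times, events, labels)
                 if x == t and e and lab == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    return math.erfc(math.sqrt(o_minus_e ** 2 / var / 2))


def fraction_significant(expr, times, events, n_lists=50, list_size=70,
                         alpha=0.01, seed=0):
    """Share of random gene lists whose clusters differ in outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_lists):
        genes = rng.sample(range(len(expr[0])), list_size)
        rows = [[sample[g] for g in genes] for sample in expr]
        if logrank_p(times, events, two_means(rows)) < alpha:
            hits += 1
    return hits / n_lists


if __name__ == "__main__":
    for frac in (0.5, 0.0):
        share = fraction_significant(*make_synthetic(frac_linked=frac))
        print(f"genes linked to outcome: {frac:.0%}, significant lists: {share:.0%}")
```

In this toy setting, a data set in which many genes track an outcome-related factor should yield far more "significant" random lists than one in which none do. That is the kind of intrinsic bias the abstract describes, although the real analysis involves censoring, ER adjustment and much larger cohorts.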
id	pubmed-2711113
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
record	pubmed-2711113 2009-07-16
journal	BMC Cancer
published	BioMed Central, 2009-06-29
license	Copyright ©2009 Mosley and Keri; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
