
Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer

MOTIVATION: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies. METHOD: We present a novel approach for combining microarray data across institution...


Bibliographic Details
Main authors: Wang, Jing; Do, Kim Anh; Wen, Sijin; Tsavachidis, Spyros; McDonnell, Timothy J.; Logothetis, Christopher J.; Coombes, Kevin R.
Format: Text
Language: English
Published: Libertas Academica 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675498/
https://www.ncbi.nlm.nih.gov/pubmed/19458761
_version_ 1782166701817724928
author Wang, Jing
Do, Kim Anh
Wen, Sijin
Tsavachidis, Spyros
McDonnell, Timothy J.
Logothetis, Christopher J.
Coombes, Kevin R.
author_sort Wang, Jing
collection PubMed
description MOTIVATION: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies. METHOD: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes. RESULTS: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models. The best logistic regression model with stepwise forward selection used 7 genes and had a misclassification rate of 31%. Models that combined LDA with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%. AVAILABILITY: Affymetrix U95Av2 array data are available at http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi. The cDNA microarray data are available through the Stanford Microarray Database (http://cmgm.stanford.edu/pbrown/). GeneLink software is freely available at http://bioinformatics.mdanderson.org/GeneLink/. DNA-Chip Analyzer software is publicly available at http://biosun1.harvard.edu/complab/dchip/.
format Text
id pubmed-2675498
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-26754982009-05-20 Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer Wang, Jing Do, Kim Anh Wen, Sijin Tsavachidis, Spyros McDonnell, Timothy J. Logothetis, Christopher J. Coombes, Kevin R. Cancer Inform Original Research MOTIVATION: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies. METHOD: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes. RESULTS: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models. The best logistic regression model with stepwise forward selection used 7 genes and had a misclassification rate of 31%. Models that combined LDA with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%. AVAILABILITY: Affymetrix U95Av2 array data are available at http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi. The cDNA microarray data are available through the Stanford Microarray Database (http://cmgm.stanford.edu/pbrown/). GeneLink software is freely available at http://bioinformatics.mdanderson.org/GeneLink/. DNA-Chip Analyzer software is publicly available at http://biosun1.harvard.edu/complab/dchip/. Libertas Academica 2007-02-14 /pmc/articles/PMC2675498/ /pubmed/19458761 Text en © 2006 The authors. 
This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
spellingShingle Original Research
Wang, Jing
Do, Kim Anh
Wen, Sijin
Tsavachidis, Spyros
McDonnell, Timothy J.
Logothetis, Christopher J.
Coombes, Kevin R.
Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer
title Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675498/
https://www.ncbi.nlm.nih.gov/pubmed/19458761