
Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes

BACKGROUND: Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of g...


Bibliographic Details
Main authors: Jiang, Hongying; Deng, Youping; Chen, Huann-Sheng; Tao, Lin; Sha, Qiuying; Chen, Jun; Tsai, Chung-Jui; Zhang, Shuanglin
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC476733/
https://www.ncbi.nlm.nih.gov/pubmed/15217521
http://dx.doi.org/10.1186/1471-2105-5-81
author Jiang, Hongying
Deng, Youping
Chen, Huann-Sheng
Tao, Lin
Sha, Qiuying
Chen, Jun
Tsai, Chung-Jui
Zhang, Shuanglin
collection PubMed
description BACKGROUND: Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of great interest and challenge to merge data sets from multiple studies to increase the sample size, which may in turn increase the power of statistical inferences. In this study, we combined two lung cancer studies using microarray GeneChip®, employed two gene shaving methods and a two-step survival test to identify genes with expression patterns that can distinguish diseased from normal samples, and to indicate patient survival, respectively. RESULTS: In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Among these marker genes, 8 and 7 were found to be cancer-related in other published reports. Furthermore, based on these marker genes, the classifiers we built from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazard regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05). Twenty-six of these 36 genes were reported as survival-related genes from the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16.
CONCLUSION: This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy.
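The abstract mentions a distribution transformation step for integrating the two data sets but does not spell it out. As a rough illustration only, the following sketch uses generic empirical quantile mapping, which may differ from the authors' actual method. The function name `quantile_map` and the toy values are hypothetical and are not taken from the article.

```python
from bisect import bisect_right


def quantile_map(values, reference):
    """Map `values` onto the empirical distribution of `reference`.

    Generic quantile mapping (an assumed stand-in, not the paper's
    method): each value is replaced by the reference value that sits
    at the same empirical rank position.
    """
    if not values or not reference:
        raise ValueError("both inputs must be non-empty")
    src_sorted = sorted(values)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    mapped = []
    for v in values:
        # Number of source values <= v, always between 1 and n_src.
        rank = bisect_right(src_sorted, v)
        # Ceiling of rank * n_ref / n_src in integer arithmetic,
        # shifted to a 0-based index into the sorted reference sample.
        idx = (rank * n_ref + n_src - 1) // n_src - 1
        mapped.append(ref_sorted[idx])
    return mapped


# Toy expression values for one gene in two hypothetical studies.
study_a = [2.0, 4.0, 6.0, 8.0]
study_b = [10.0, 30.0, 20.0, 40.0]
mapped_b = quantile_map(study_b, study_a)
print(mapped_b)  # [2.0, 6.0, 4.0, 8.0]
```

After such a mapping, the second study's values share the first study's scale while keeping their original sample order, which is the precondition for training a classifier on one data set and testing it on the other.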
format Text
id pubmed-476733
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-476733 2004-07-18 Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes Jiang, Hongying Deng, Youping Chen, Huann-Sheng Tao, Lin Sha, Qiuying Chen, Jun Tsai, Chung-Jui Zhang, Shuanglin BMC Bioinformatics Methodology Article BioMed Central 2004-06-24 /pmc/articles/PMC476733/ /pubmed/15217521 http://dx.doi.org/10.1186/1471-2105-5-81 Text en Copyright © 2004 Jiang et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC476733/
https://www.ncbi.nlm.nih.gov/pubmed/15217521
http://dx.doi.org/10.1186/1471-2105-5-81