Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia

BACKGROUND: This paper presents a retrospective statistical study on the newly-released data set by the Stanley Neuropathology Consortium on gene expression in bipolar disorder and schizophrenia. This data set contains gene expression data as well as limited demographic and clinical data for each su...

Bibliographic Details
Main authors: Struyf, Jan; Dobrin, Seth; Page, David
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2628394/
https://www.ncbi.nlm.nih.gov/pubmed/18992130
http://dx.doi.org/10.1186/1471-2164-9-531
author Struyf, Jan
Dobrin, Seth
Page, David
collection PubMed
description BACKGROUND: This paper presents a retrospective statistical study of the data set on gene expression in bipolar disorder and schizophrenia newly released by the Stanley Neuropathology Consortium. This data set contains gene expression data as well as limited demographic and clinical data for each subject. Previous studies using statistical classification or machine learning algorithms have focused on gene expression data only. The present paper investigates whether such techniques can benefit from including demographic and clinical data. RESULTS: We compare six classification algorithms: support vector machines (SVMs), nearest shrunken centroids, decision trees, ensemble of voters, naïve Bayes, and nearest neighbor. SVMs outperform the other algorithms. Using expression data only, they yield an area under the ROC curve of 0.92 for bipolar disorder versus control, and 0.91 for schizophrenia versus control. When demographic and clinical data are included, classification performance improves to 0.97 and 0.94, respectively. CONCLUSION: This paper demonstrates that SVMs can distinguish bipolar disorder and schizophrenia from normal controls with high accuracy. Moreover, it shows that classification performance improves when demographic and clinical data are included. We also found that some variables in this data set, such as alcohol and drug use, are strongly associated with the diseases. These variables may affect gene expression and make it more difficult to identify genes that are directly associated with the diseases. Stratification can correct for such variables, but we show that this reduces the power of the statistical methods.
format Text
id pubmed-2628394
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2628394 2009-01-21 Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia Struyf, Jan Dobrin, Seth Page, David BMC Genomics Research Article
BioMed Central 2008-11-07 /pmc/articles/PMC2628394/ /pubmed/18992130 http://dx.doi.org/10.1186/1471-2164-9-531 Text en Copyright © 2008 Struyf et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article