Machine learning approaches to predict lupus disease activity from gene expression data

The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learn...

Bibliographic Details
Main authors: Kegerreis, Brian, Catalina, Michelle D., Bachali, Prathyusha, Geraci, Nicholas S., Labonte, Adam C., Zeng, Chen, Stearrett, Nathaniel, Crandall, Keith A., Lipsky, Peter E., Grammer, Amrie C.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6610624/
https://www.ncbi.nlm.nih.gov/pubmed/31270349
http://dx.doi.org/10.1038/s41598-019-45989-0
_version_ 1783432537270386688
author Kegerreis, Brian
Catalina, Michelle D.
Bachali, Prathyusha
Geraci, Nicholas S.
Labonte, Adam C.
Zeng, Chen
Stearrett, Nathaniel
Crandall, Keith A.
Lipsky, Peter E.
Grammer, Amrie C.
author_sort Kegerreis, Brian
collection PubMed
description The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used it to classify patients as having active or inactive disease as characterized by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated by 10-fold cross-validation across three combined data sets or by training and testing in independent data sets, the latter of which amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may generate sufficient accuracy to be informative as a standalone estimate of disease activity.
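The two evaluation schemes named in the description, 10-fold cross-validation over the combined data sets versus training and testing in independent data sets, differ only in how the train/test splits are formed. The following is a minimal sketch of that difference using the Python standard library; it is not the authors' code. The function names, the data set labels "A", "B", "C", and the cohort sizes are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def pooled_kfold(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV over pooled samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slice the shuffled indices into k near-equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def leave_one_dataset_out(dataset_labels):
    """Yield (held_out, train_idx, test_idx), testing on one whole data set."""
    by_dataset = defaultdict(list)
    for idx, name in enumerate(dataset_labels):
        by_dataset[name].append(idx)
    for held_out, test_idx in by_dataset.items():
        train_idx = [i for name, idxs in by_dataset.items()
                     if name != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Hypothetical cohort sizes (assumption, not taken from the article).
dataset_labels = ["A"] * 40 + ["B"] * 35 + ["C"] * 25

kfold_splits = list(pooled_kfold(len(dataset_labels), k=10))
lodo_splits = list(leave_one_dataset_out(dataset_labels))
```

In the pooled scheme, samples from every data set typically appear on both sides of each split, so a classifier sees each platform during training. In the second scheme the held-out data set is never seen in training, so any platform-specific differences sit entirely between the training and test sets, which is consistent with the description's remark that independent testing amplified the effects of technical variation.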
format Online
Article
Text
id pubmed-6610624
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6610624 2019-07-15 Machine learning approaches to predict lupus disease activity from gene expression data Sci Rep Article
Nature Publishing Group UK 2019-07-03 /pmc/articles/PMC6610624/ /pubmed/31270349 http://dx.doi.org/10.1038/s41598-019-45989-0 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Machine learning approaches to predict lupus disease activity from gene expression data
topic Article