A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles

BACKGROUND: Myopathies are a heterogenous collection of disorders characterized by dysfunction of skeletal muscle. In practice, myopathies are frequently encountered by physicians and precise diagnosis remains a challenge in primary care. Molecular expression profiles show promise for disease diagno...

Bibliographic Details
Main authors: Tran, Andrew; Walsh, Chris J.; Batt, Jane; dos Santos, Claudia C.; Hu, Pingzhao
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7708151/
https://www.ncbi.nlm.nih.gov/pubmed/33256785
http://dx.doi.org/10.1186/s12967-020-02630-3
author Tran, Andrew
Walsh, Chris J.
Batt, Jane
dos Santos, Claudia C.
Hu, Pingzhao
collection PubMed
description BACKGROUND: Myopathies are a heterogeneous collection of disorders characterized by dysfunction of skeletal muscle. In practice, myopathies are frequently encountered by physicians, and precise diagnosis remains a challenge in primary care. Molecular expression profiles show promise for disease diagnosis in various pathologies. We propose a novel machine learning-based clinical tool for predicting muscle disease subtypes using multi-cohort microarray expression data. MATERIALS AND METHODS: Muscle tissue samples originated from 1260 patients with muscle weakness. Data were curated from 42 independent cohorts with expression profiles in public microarray gene expression repositories, which represent a broad range of patient ages and peripheral muscles. Cohorts were categorized into five muscle disease subtypes: immobility, inflammatory myopathies, intensive care unit acquired weakness (ICUAW), congenital, and chronic systemic disease. The data set contains expression values for 34,099 genes. Data augmentation techniques were used to address class imbalances in the muscle disease subtypes. Support vector machine (SVM) models were trained on two-thirds of the 1260 samples, using the top gene signature selected by analysis of variance (ANOVA). The model was validated in the remaining samples using the area under the receiver operating characteristic curve (AUC). Gene enrichment analysis was used to identify enriched biological functions in the gene signature. RESULTS: The AUC ranged from 0.611 to 0.649 in the observed imbalanced data. Overall, using the augmented data, chronic systemic disease was the best predicted class with AUC 0.872 (95% confidence interval (CI): 0.824–0.920). The least discriminated classes were ICUAW with AUC 0.777 (95% CI: 0.668–0.887) and immobility with AUC 0.789 (95% CI: 0.716–0.861). Disease-specific gene set enrichment results showed that the gene signature was enriched in biological processes including neural precursor cell proliferation for ICUAW and aerobic respiration for the congenital subtype (false discovery rate q-value < 0.001). CONCLUSION: Our results present a well-performing molecular classification tool with the selected gene markers for muscle disease classification. In practice, this tool addresses an important gap in the literature on myopathies and presents a potentially useful clinical tool for muscle disease subtype diagnosis.
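The abstract names two computations that are easy to illustrate: ranking genes by a one-way ANOVA F statistic and scoring each subtype with a one-vs-rest AUC. The following is a minimal sketch of those two steps only, in plain Python. It is not the authors' pipeline: the SVM and the data augmentation are omitted, the function names (anova_f, top_genes, one_vs_rest_auc) are hypothetical, and the expression values and classifier scores are invented toy numbers.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene (assumes >= 2 classes, n > classes)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_genes(expr, labels, k):
    """Return column indices of the k genes with the largest F statistic."""
    n_genes = len(expr[0])
    scores = [anova_f([row[j] for row in expr], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]

def one_vs_rest_auc(scores, labels, positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): 6 samples x 3 genes, three of the five subtypes.
labels = ["ICUAW", "ICUAW", "congenital", "congenital", "immobility", "immobility"]
expr = [
    [1.0, 1.0, 1.0],
    [2.0, 1.2, 3.0],
    [1.5, 5.0, 2.0],
    [1.0, 5.1, 4.0],
    [2.0, 9.0, 3.0],
    [1.5, 9.2, 5.0],
]
# Hypothetical classifier scores for "sample is ICUAW".
icuaw_scores = [0.9, 0.8, 0.3, 0.7, 0.1, 0.85]

print("selected genes:", top_genes(expr, labels, 2))
print("ICUAW AUC:", one_vs_rest_auc(icuaw_scores, labels, "ICUAW"))
```

In a real analysis the ranking would be computed on the training two-thirds only, so that the held-out third does not influence which genes are selected.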
format Online
Article
Text
id pubmed-7708151
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7708151 2020-12-02 A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles Tran, Andrew Walsh, Chris J. Batt, Jane dos Santos, Claudia C. Hu, Pingzhao J Transl Med Research BioMed Central 2020-11-30 /pmc/articles/PMC7708151/ /pubmed/33256785 http://dx.doi.org/10.1186/s12967-020-02630-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7708151/
https://www.ncbi.nlm.nih.gov/pubmed/33256785
http://dx.doi.org/10.1186/s12967-020-02630-3