A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles
BACKGROUND: Myopathies are a heterogenous collection of disorders characterized by dysfunction of skeletal muscle. In practice, myopathies are frequently encountered by physicians and precise diagnosis remains a challenge in primary care. Molecular expression profiles show promise for disease diagno...
Main authors: | Tran, Andrew, Walsh, Chris J., Batt, Jane, dos Santos, Claudia C., Hu, Pingzhao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7708151/ https://www.ncbi.nlm.nih.gov/pubmed/33256785 http://dx.doi.org/10.1186/s12967-020-02630-3 |
_version_ | 1783617505311326208 |
---|---|
author | Tran, Andrew Walsh, Chris J. Batt, Jane dos Santos, Claudia C. Hu, Pingzhao |
author_facet | Tran, Andrew Walsh, Chris J. Batt, Jane dos Santos, Claudia C. Hu, Pingzhao |
author_sort | Tran, Andrew |
collection | PubMed |
description | BACKGROUND: Myopathies are a heterogenous collection of disorders characterized by dysfunction of skeletal muscle. In practice, myopathies are frequently encountered by physicians and precise diagnosis remains a challenge in primary care. Molecular expression profiles show promise for disease diagnosis in various pathologies. We propose a novel machine learning-based clinical tool for predicting muscle disease subtypes using multi-cohort microarray expression data. MATERIALS AND METHODS: Muscle tissue samples originated from 1260 patients with muscle weakness. Data was curated from 42 independent cohorts with expression profiles in public microarray gene expression repositories, which represent a broad range of patient ages and peripheral muscles. Cohorts were categorized into five muscle disease subtypes: immobility, inflammatory myopathies, intensive care unit acquired weakness (ICUAW), congenital, and chronic systemic disease. The data contains expression data on 34,099 genes. Data augmentation techniques were used to address class imbalances in the muscle disease subtypes. Support vector machine (SVM) models were trained on two-thirds of the 1260 samples based on the top selected gene signature using analysis of variance (ANOVA). The model was validated in the remaining samples using area under the receiver operating characteristic curve (AUC). Gene enrichment analysis was used to identify enriched biological functions in the gene signature. RESULTS: The AUC ranges from 0.611 to 0.649 in the observed imbalanced data. Overall, using the augmented data, chronic systemic disease was the best predicted class with AUC 0.872 (95% confidence interval (CI): 0.824–0.920). The least discriminated classes were ICUAW with AUC 0.777 (95% CI: 0.668–0.887) and immobility with AUC 0.789 (95% CI: 0.716–0.861). Disease-specific gene set enrichment results showed that the gene signature was enriched in biological processes including neural precursor cell proliferation for ICUAW and aerobic respiration for congenital (false discovery rate q-value < 0.001). CONCLUSION: Our results present a well-performing molecular classification tool with the selected gene markers for muscle disease classification. In practice, this tool addresses an important gap in the literature on myopathies and presents a potentially useful clinical tool for muscle disease subtype diagnosis. |
format | Online Article Text |
id | pubmed-7708151 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-77081512020-12-02 A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles Tran, Andrew Walsh, Chris J. Batt, Jane dos Santos, Claudia C. Hu, Pingzhao J Transl Med Research BACKGROUND: Myopathies are a heterogenous collection of disorders characterized by dysfunction of skeletal muscle. In practice, myopathies are frequently encountered by physicians and precise diagnosis remains a challenge in primary care. Molecular expression profiles show promise for disease diagnosis in various pathologies. We propose a novel machine learning-based clinical tool for predicting muscle disease subtypes using multi-cohort microarray expression data. MATERIALS AND METHODS: Muscle tissue samples originated from 1260 patients with muscle weakness. Data was curated from 42 independent cohorts with expression profiles in public microarray gene expression repositories, which represent a broad range of patient ages and peripheral muscles. Cohorts were categorized into five muscle disease subtypes: immobility, inflammatory myopathies, intensive care unit acquired weakness (ICUAW), congenital, and chronic systemic disease. The data contains expression data on 34,099 genes. Data augmentation techniques were used to address class imbalances in the muscle disease subtypes. Support vector machine (SVM) models were trained on two-thirds of the 1260 samples based on the top selected gene signature using analysis of variance (ANOVA). The model was validated in the remaining samples using area under the receiver operating characteristic curve (AUC). Gene enrichment analysis was used to identify enriched biological functions in the gene signature. RESULTS: The AUC ranges from 0.611 to 0.649 in the observed imbalanced data. Overall, using the augmented data, chronic systemic disease was the best predicted class with AUC 0.872 (95% confidence interval (CI): 0.824–0.920). The least discriminated classes were ICUAW with AUC 0.777 (95% CI: 0.668–0.887) and immobility with AUC 0.789 (95% CI: 0.716–0.861). Disease-specific gene set enrichment results showed that the gene signature was enriched in biological processes including neural precursor cell proliferation for ICUAW and aerobic respiration for congenital (false discovery rate q-value < 0.001). CONCLUSION: Our results present a well-performing molecular classification tool with the selected gene markers for muscle disease classification. In practice, this tool addresses an important gap in the literature on myopathies and presents a potentially useful clinical tool for muscle disease subtype diagnosis. BioMed Central 2020-11-30 /pmc/articles/PMC7708151/ /pubmed/33256785 http://dx.doi.org/10.1186/s12967-020-02630-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Tran, Andrew Walsh, Chris J. Batt, Jane dos Santos, Claudia C. Hu, Pingzhao A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles |
title | A machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles |
title_sort | machine learning-based clinical tool for diagnosing myopathy using multi-cohort microarray expression profiles |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7708151/ https://www.ncbi.nlm.nih.gov/pubmed/33256785 http://dx.doi.org/10.1186/s12967-020-02630-3 |
work_keys_str_mv | AT tranandrew amachinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT walshchrisj amachinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT battjane amachinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT dossantosclaudiac amachinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT hupingzhao amachinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT tranandrew machinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT walshchrisj machinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT battjane machinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT dossantosclaudiac machinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles AT hupingzhao machinelearningbasedclinicaltoolfordiagnosingmyopathyusingmulticohortmicroarrayexpressionprofiles |
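
The abstract in this record mentions two building blocks, ranking genes by analysis of variance (ANOVA) and scoring each disease class by AUC. The following is a minimal, standard-library sketch of those two calculations, not the authors' implementation. The function names (`anova_f`, `rank_genes`, `auc`), the gene names, and the toy values are all hypothetical, and the SVM training and data augmentation steps are left out.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single gene across class labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-class spread: either no signal at all or perfect separation.
        return 0.0 if ss_between == 0 else float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expression, labels, top_n):
    """Return the top_n gene names with the largest ANOVA F statistic.

    `expression` maps a (hypothetical) gene name to its per-sample values.
    """
    scored = sorted(expression,
                    key=lambda gene: anova_f(expression[gene], labels),
                    reverse=True)
    return scored[:top_n]


def auc(scores, is_positive):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation). For a multi-class
    problem, call this once per class with that class as the positive group.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: six samples, two classes, two made-up genes.
labels = ["a", "a", "a", "b", "b", "b"]
expression = {"GENE_A": [1, 2, 3, 4, 5, 6], "GENE_B": [1, 5, 3, 4, 2, 6]}
top_genes = rank_genes(expression, labels, top_n=1)
toy_auc = auc([0.9, 0.2, 0.8, 0.1], [True, True, False, False])
```

In a fuller pipeline the ranking would be computed on the training samples only, so that the held-out samples used for the AUC do not influence which genes are selected.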