
Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin

To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently taken by means of DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-l...


Bibliographic Details
Main authors: Kircher, Magdalena, Chludzinski, Elisa, Krepel, Jessica, Saremi, Babak, Beineke, Andreas, Jung, Klaus
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8910329/
https://www.ncbi.nlm.nih.gov/pubmed/35269624
http://dx.doi.org/10.3390/ijms23052481
_version_ 1784666445305610240
author Kircher, Magdalena
Chludzinski, Elisa
Krepel, Jessica
Saremi, Babak
Beineke, Andreas
Jung, Klaus
collection PubMed
description To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently taken by means of DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments. An approach that has rarely been considered for machine-learning models in the context of transcriptomics is data augmentation. For other data types it has been shown that augmentation can improve classification accuracy and prevent overfitting. Here, we compare three strategies for data augmentation of DNA microarray and RNA-seq data from two selected studies on respiratory diseases of viral origin. The first study involves samples of patients with either viral or bacterial origin of the respiratory disease, the second study involves patients with either SARS-CoV-2 or another respiratory virus as disease origin. Specifically, we reanalyze these public datasets to study whether patient classification by transcriptomic signatures can be improved when adding artificial data for training of the machine-learning models. Our comparison reveals that augmentation of transcriptomic data can improve the classification accuracy and that fewer genes are necessary as explanatory variables in the final models. We also report genes from our signatures that overlap with signatures presented in the original publications of our example data. Due to strict selection criteria, the molecular role of these genes in the context of respiratory infectious diseases is underlined.
format Online
Article
Text
id pubmed-8910329
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8910329 2022-03-11 Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin Kircher, Magdalena Chludzinski, Elisa Krepel, Jessica Saremi, Babak Beineke, Andreas Jung, Klaus Int J Mol Sci Article
MDPI 2022-02-24 /pmc/articles/PMC8910329/ /pubmed/35269624 http://dx.doi.org/10.3390/ijms23052481 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
topic Article