Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity

Respiratory diseases are among the major health problems causing a burden on hospitals. Diagnosis of infection and rapid prediction of severity without time-consuming clinical tests could be beneficial in preventing the spread and progression of the disease, especially in countries where health syst...

Bibliographic Details
Main Authors: Işık, Yunus Emre, Aydın, Zafer
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10317018/
https://www.ncbi.nlm.nih.gov/pubmed/37404475
http://dx.doi.org/10.7717/peerj.15552
author Işık, Yunus Emre
Aydın, Zafer
collection PubMed
description Respiratory diseases are among the major health problems placing a burden on hospitals. Diagnosing infection and rapidly predicting severity without time-consuming clinical tests could help prevent the spread and progression of the disease, especially in countries where health systems have limited capacity. Personalized medicine studies involving statistics and computer technologies could help to address this need. In addition to individual studies, competitions are also held, such as the Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenges, run by a community-driven organization with a mission to advance research in biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising; however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the performance of predicting the infection and symptom severity of individuals infected with various respiratory viruses using gene expression data collected before and after exposure. The publicly available Gene Expression Omnibus dataset GSE73072, which contains samples exposed to four respiratory viruses (H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV)), was used as input data. Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance.
The experimental results showed that the proposed approaches achieved an area under the precision-recall curve (AUPRC) of 0.9746 for infection (i.e., shedding) prediction (SC-1), an AUPRC of 0.9182 for symptom class prediction (SC-2), and a Pearson correlation of 0.6733 for symptom score prediction (SC-3), outperforming the best leaderboard scores of the Respiratory Viral DREAM Challenge (a 4.48% improvement for SC-1, a 13.68% improvement for SC-2, and a 13.98% improvement for SC-3). Additionally, over-representation analysis (ORA), a statistical method for objectively determining whether certain genes are more prevalent in pre-defined sets such as pathways, was applied to the most significant genes selected by feature selection methods. The results show that pathways associated with the ‘adaptive immune system’ and ‘immune disease’ are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge about predicting respiratory infections and are expected to facilitate future studies that concentrate on predicting not only infections but also the associated symptoms.
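The over-representation analysis mentioned in the abstract can be illustrated with a small sketch. The record does not say which test or tool the authors used, so this assumes the common formulation of ORA as an upper-tail hypergeometric test (equivalent to a one-sided Fisher's exact test). The function names `hypergeom_upper_tail` and `over_representation` and the gene identifiers `g0`…`g19` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from math import comb


def hypergeom_upper_tail(n_universe, n_pathway, n_selected, n_overlap):
    """Return P(X >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_pathway of which are in the pathway."""
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def over_representation(selected_genes, pathway_genes, universe_genes):
    """Return (overlap size, p-value) for one pathway.

    Genes outside the background universe are ignored, so all counts
    refer to the same population.
    """
    universe = set(universe_genes)
    selected = set(selected_genes) & universe
    pathway = set(pathway_genes) & universe
    overlap = selected & pathway
    p_value = hypergeom_upper_tail(
        len(universe), len(pathway), len(selected), len(overlap)
    )
    return len(overlap), p_value


# Toy data (hypothetical): 20 background genes, a 5-gene pathway,
# and 4 genes picked by a feature selection step.
universe = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
selected = ["g0", "g1", "g2", "g10"]

overlap_size, p_value = over_representation(selected, pathway, universe)
print(overlap_size, round(p_value, 4))
```

In practice, one p-value is computed per pathway and the results are then corrected for multiple testing. That correction step is omitted here to keep the sketch short.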
format Online
Article
Text
id pubmed-10317018
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10317018 2023-07-04 Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity Işık, Yunus Emre Aydın, Zafer PeerJ Bioinformatics PeerJ Inc. 2023-06-30 /pmc/articles/PMC10317018/ /pubmed/37404475 http://dx.doi.org/10.7717/peerj.15552 Text en © 2023 Işık and Aydın https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Işık, Yunus Emre
Aydın, Zafer
Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity
title Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10317018/
https://www.ncbi.nlm.nih.gov/pubmed/37404475
http://dx.doi.org/10.7717/peerj.15552