An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data
Background: Linking genotypic changes to phenotypic traits based on machine learning methods has various challenges. In this study, we developed a workflow based on bioinformatics and machine learning methods using transcriptomic data for sepsis obtained at the first clinical presentation for predic...
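Main authors: | Shi, Songchang; Pan, Xiaobin; Zhang, Lihui; Wang, Xincai; Zhuang, Yingfeng; Lin, Xingsheng; Shi, Songjing; Zheng, Jianzhang; Lin, Wei |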
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Frontiers Media S.A.
2022
|
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9490444/ https://www.ncbi.nlm.nih.gov/pubmed/36159979 http://dx.doi.org/10.3389/fgene.2022.979529 |
_version_ | 1784793087455789056 |
---|---|
author | Shi, Songchang Pan, Xiaobin Zhang, Lihui Wang, Xincai Zhuang, Yingfeng Lin, Xingsheng Shi, Songjing Zheng, Jianzhang Lin, Wei |
author_facet | Shi, Songchang Pan, Xiaobin Zhang, Lihui Wang, Xincai Zhuang, Yingfeng Lin, Xingsheng Shi, Songjing Zheng, Jianzhang Lin, Wei |
author_sort | Shi, Songchang |
collection | PubMed |
description | Background: Linking genotypic changes to phenotypic traits with machine learning methods poses various challenges. In this study, we developed a workflow based on bioinformatics and machine learning methods that uses transcriptomic data for sepsis, obtained at the first clinical presentation, to predict the risk of sepsis. By combining bioinformatics with machine learning methods, we have attempted to overcome current challenges in predicting disease risk using transcriptomic data. Methods: High-throughput sequencing transcriptomic data processing and gene annotation were performed using R software. Machine learning models were constructed and their performance was evaluated in Python. The models were visualized and interpreted using the SHapley Additive exPlanations (SHAP) method. Results: Based on the preset parameters and using recursive feature elimination implemented via machine learning, the top 10 optimal genes were screened for the establishment of the machine learning models. In a comparison of model performance, CatBoost was selected as the optimal model. We explored the significance of each gene in the model and the interactions between genes through SHAP analysis. Conclusion: The combination of CatBoost and SHAP may serve as the best-performing machine learning approach for predicting sepsis risk from transcriptomic data. The workflow outlined may provide a new approach and direction in exploring the mechanisms associated with genes and sepsis risk. |
format | Online Article Text |
id | pubmed-9490444 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-94904442022-09-22 An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data Shi, Songchang Pan, Xiaobin Zhang, Lihui Wang, Xincai Zhuang, Yingfeng Lin, Xingsheng Shi, Songjing Zheng, Jianzhang Lin, Wei Front Genet Genetics Background: Linking genotypic changes to phenotypic traits based on machine learning methods has various challenges. In this study, we developed a workflow based on bioinformatics and machine learning methods using transcriptomic data for sepsis obtained at the first clinical presentation for predicting the risk of sepsis. By combining bioinformatics with machine learning methods, we have attempted to overcome current challenges in predicting disease risk using transcriptomic data. Methods: High-throughput sequencing transcriptomic data processing and gene annotation were performed using R software. Machine learning models were constructed, and model performance was evaluated by machine learning methods in Python. The models were visualized and interpreted using the Shapley Additive explanation (SHAP) method. Results: Based on the preset parameters and using recursive feature elimination implemented via machine learning, the top 10 optimal genes were screened for the establishment of the machine learning models. In a comparison of model performance, CatBoost was selected as the optimal model. We explored the significance of each gene in the model and the interaction between each gene through SHAP analysis. Conclusion: The combination of CatBoost and SHAP may serve as the best-performing machine learning model for predicting transcriptomic and sepsis risks. The workflow outlined may provide a new approach and direction in exploring the mechanisms associated with genes and sepsis risk. Frontiers Media S.A. 
2022-09-02 /pmc/articles/PMC9490444/ /pubmed/36159979 http://dx.doi.org/10.3389/fgene.2022.979529 Text en Copyright © 2022 Shi, Pan, Zhang, Wang, Zhuang, Lin, Shi, Zheng and Lin. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Shi, Songchang Pan, Xiaobin Zhang, Lihui Wang, Xincai Zhuang, Yingfeng Lin, Xingsheng Shi, Songjing Zheng, Jianzhang Lin, Wei An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data |
title | An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data |
title_sort | application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9490444/ https://www.ncbi.nlm.nih.gov/pubmed/36159979 http://dx.doi.org/10.3389/fgene.2022.979529 |
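The abstract says that each gene's contribution to the model was interpreted with SHAP. As a rough illustration of the attribution idea behind that method, the sketch below computes exact Shapley values for a single sample by averaging each feature's marginal contribution over every feature ordering. It is not the authors' pipeline: the study used CatBoost together with the SHAP method, whereas `toy_risk`, the three expression values, and the all-zero baseline here are hypothetical stand-ins chosen only to keep the example self-contained.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample.

    Each feature is switched from its baseline value to its observed value
    in every possible order; the average change in the prediction caused by
    a feature is its Shapley value. Cost grows factorially with the number
    of features, so this is only practical for a handful of them.
    """
    n = len(sample)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = sample[i]
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / factorial(n) for total in phi]


def toy_risk(x):
    """Hypothetical risk score over three made-up gene expression values."""
    g1, g2, g3 = x
    # The g1 * g3 term mimics an interaction between two genes.
    return 0.5 * g1 + 0.2 * g2 + 0.3 * g1 * g3


baseline = [0.0, 0.0, 0.0]  # assumed reference sample
patient = [2.0, 1.0, 1.0]   # assumed observed sample

phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(["gene1", "gene2", "gene3"], phi):
    print(f"{name}: {value:.3f}")
```

The contributions sum to the difference between the score for `patient` and the score for `baseline`, and the interaction term is split evenly between the two genes involved. Libraries such as SHAP rely on faster model-specific algorithms rather than full enumeration, which becomes infeasible beyond a few features.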