
An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data

Bibliographic details
Main authors: Shi, Songchang, Pan, Xiaobin, Zhang, Lihui, Wang, Xincai, Zhuang, Yingfeng, Lin, Xingsheng, Shi, Songjing, Zheng, Jianzhang, Lin, Wei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9490444/
https://www.ncbi.nlm.nih.gov/pubmed/36159979
http://dx.doi.org/10.3389/fgene.2022.979529
collection PubMed
description Background: Linking genotypic changes to phenotypic traits with machine learning methods poses various challenges. In this study, we developed a workflow based on bioinformatics and machine learning methods that uses transcriptomic data obtained at the first clinical presentation to predict the risk of sepsis. By combining bioinformatics with machine learning, we attempted to overcome current challenges in predicting disease risk from transcriptomic data. Methods: High-throughput sequencing transcriptomic data were processed and genes were annotated using R. Machine learning models were constructed and their performance was evaluated in Python. The models were visualized and interpreted using the SHapley Additive exPlanations (SHAP) method. Results: Using recursive feature elimination with preset parameters, the 10 top-ranked genes were selected to build the machine learning models. In a comparison of model performance, CatBoost was selected as the optimal model. SHAP analysis was used to examine the contribution of each gene to the model and the interactions between genes. Conclusion: The combination of CatBoost and SHAP may be the best-performing machine learning approach for predicting sepsis risk from transcriptomic data. The outlined workflow may offer a new approach for exploring the mechanisms linking genes to sepsis risk.
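The recursive feature elimination (RFE) step described in the Results can be illustrated with a minimal sketch. The abstract does not state which estimator ranked the genes inside RFE, so this example assumes an ordinary least-squares linear model fitted to a 0/1 label and ranks genes by absolute coefficient. The data are synthetic (20 hypothetical "genes", of which only the first 10 drive the label), and every function name is hypothetical. It is not the authors' pipeline and leaves out the CatBoost comparison and SHAP analysis.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's data is untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def linear_coefficients(X: Matrix, y: List[float], features: List[int]) -> List[float]:
    """Hypothetical ranking estimator: least-squares fit of y on the chosen
    columns plus an intercept, returning one coefficient per feature."""
    rows = [[1.0] + [sample[j] for j in features] for sample in X]
    k = len(features) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return beta[1:]  # drop the intercept


def recursive_feature_elimination(
    X: Matrix, y: List[float], n_select: int
) -> Tuple[List[int], List[int]]:
    """Refit, drop the feature with the smallest |coefficient|, repeat until
    n_select remain. Returns (selected indices, eliminated indices in drop order).

    Assumes features are on comparable scales (e.g. standardized expression
    values), because the ranking compares raw coefficient magnitudes.
    """
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_select:
        coefs = linear_coefficients(X, y, remaining)
        weakest = min(range(len(remaining)), key=lambda i: abs(coefs[i]))
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated


def make_synthetic_data(
    n_samples: int = 1000, n_informative: int = 10, n_noise: int = 10, seed: int = 0
) -> Tuple[Matrix, List[float]]:
    """Hypothetical stand-in for an expression matrix: only the first
    n_informative columns influence the 0/1 (case/control) label."""
    rng = random.Random(seed)
    X: Matrix = []
    y: List[float] = []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_informative + n_noise)]
        y.append(1.0 if sum(row[:n_informative]) > 0 else 0.0)
        X.append(row)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    selected, dropped = recursive_feature_elimination(X, y, n_select=10)
    print("selected:", sorted(selected))
    print("eliminated (first to last):", dropped)
```

Each round refits on the surviving genes and removes only the single weakest one, so a gene's rank reflects its contribution given the other genes still in the model rather than its univariate association with the label. Swapping `linear_coefficients` for another model's importance scores keeps the same loop; scikit-learn's `sklearn.feature_selection.RFE` packages this pattern for any estimator that exposes `coef_` or `feature_importances_`.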
id pubmed-9490444
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Front Genet
date 2022-09-02
rights Copyright © 2022 Shi, Pan, Zhang, Wang, Zhuang, Lin, Shi, Zheng and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics