
A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery

BACKGROUND: The placenta, as a unique exchange organ between mother and fetus, is essential for successful human pregnancy and fetal health. Preeclampsia (PE) caused by placental dysfunction contributes to both maternal and infant morbidity and mortality. Accurate identification of PE patients plays...


Bibliographic Details
Main authors: Wang, Hao; Zhang, Zhaoyue; Li, Haicheng; Li, Jinzhao; Li, Hanshuang; Liu, Mingzhu; Liang, Pengfei; Xi, Qilemuge; Xing, Yongqiang; Yang, Lei; Zuo, Yongchun
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972636/
https://www.ncbi.nlm.nih.gov/pubmed/36849879
http://dx.doi.org/10.1186/s13578-023-00991-y
_version_ 1784898363967143936
author Wang, Hao
Zhang, Zhaoyue
Li, Haicheng
Li, Jinzhao
Li, Hanshuang
Liu, Mingzhu
Liang, Pengfei
Xi, Qilemuge
Xing, Yongqiang
Yang, Lei
Zuo, Yongchun
collection PubMed
description BACKGROUND: The placenta, as a unique exchange organ between mother and fetus, is essential for successful human pregnancy and fetal health. Preeclampsia (PE) caused by placental dysfunction contributes to both maternal and infant morbidity and mortality. Accurate identification of PE patients plays a vital role in the formulation of treatment plans. However, traditional clinical methods for diagnosing PE have a high misdiagnosis rate.
RESULTS: Here, we designed a computational biology method that uses single-cell transcriptome (scRNA-seq) data from healthy pregnancy (38 wk) and early-onset PE (28–32 wk) to identify pathological cell subpopulations and predict PE risk. Using machine learning methods and feature selection techniques, we observed that the Tuning ReliefF (TURF) score combined with XGBoost (TURF_XGB) achieved optimal performance, with 92.61% accuracy and 92.46% recall for classifying nine cell subpopulations of healthy placentas. The 110 marker genes screened by TURF_XGB could map the biological landscape of placental heterogeneity, which demonstrated the superiority of TURF feature mining. Moreover, we processed the PE dataset with LASSO to obtain 497 biomarkers. Integration analysis of these two gene sets revealed that dendritic cells were closely associated with early-onset PE, and that C1QB and C1QC might drive preeclampsia by mediating inflammation. In addition, an ensemble model-based risk stratification card was developed to classify preeclampsia patients, with an area under the receiver operating characteristic curve (AUC) of up to 0.99. For broader accessibility, we designed an online web server (http://bioinfor.imu.edu.cn/placenta).
CONCLUSION: Single-cell transcriptome-based preeclampsia risk assessment using an ensemble machine learning framework is a valuable asset for clinical decision-making. C1QB and C1QC may be involved in the development and progression of early-onset PE by affecting the complement and coagulation cascades pathway that mediates inflammation, which has important implications for better understanding the pathogenesis of PE.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13578-023-00991-y.
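The abstract reports accuracy, recall, and AUC without saying how they were computed. As a rough illustration only, the sketch below shows one common way to calculate these metrics in plain Python. The function names and toy data are hypothetical, and macro-averaged recall is an assumption: the record does not state which averaging the authors used.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class recall (assumed averaging, not from the source)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    recalls = []
    for cls in set(y_true):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    cell_true = ["a", "a", "b", "b", "c"]
    cell_pred = ["a", "b", "b", "b", "c"]
    print(accuracy(cell_true, cell_pred), macro_recall(cell_true, cell_pred))
    print(roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a small illustration; a rank-based formulation would be preferable for large single-cell datasets.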
format Online
Article
Text
id pubmed-9972636
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9972636 2023-03-01 Cell Biosci Research BioMed Central 2023-02-28 /pmc/articles/PMC9972636/ /pubmed/36849879 http://dx.doi.org/10.1186/s13578-023-00991-y Text en
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery
topic Research