Optimization of Cervical Cancer Screening: A Stacking-Integrated Machine Learning Algorithm Based on Demographic, Behavioral, and Clinical Factors

Bibliographic Details
Main Authors: Sun, Lin; Yang, Lingping; Liu, Xiyao; Tang, Lan; Zeng, Qi; Gao, Yuwen; Chen, Qian; Liu, Zhaohai; Peng, Bin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8886038/
https://www.ncbi.nlm.nih.gov/pubmed/35242711
http://dx.doi.org/10.3389/fonc.2022.821453
_version_ 1784660562699878400
author Sun, Lin
Yang, Lingping
Liu, Xiyao
Tang, Lan
Zeng, Qi
Gao, Yuwen
Chen, Qian
Liu, Zhaohai
Peng, Bin
author_sort Sun, Lin
collection PubMed
description PURPOSE: Accurately identifying women at high risk of developing cervical cancer can help optimize cervical screening strategies and make better use of medical resources. However, the predictive models currently in use require clinical physiological and biochemical indicators, which narrows their scope of application. Stacking-integrated machine learning (SIML) is an advanced machine learning technique that combines multiple learning algorithms to improve predictive performance. This study aimed to develop a stacking-integrated model that can identify women at high risk of developing cervical cancer based on their demographic, behavioral, and historical clinical factors.
METHODS: Data from 858 women screened for cervical cancer at a Venezuelan hospital were used to develop the SIML algorithm. The screening data were randomly split into training data (80%), used to develop the algorithm, and testing data (20%), used to validate its accuracy. A random forest (RF) model and univariate logistic regression were used to identify features predictive of developing cervical cancer. Twelve well-known ML algorithms were selected, and their performance in predicting cervical cancer was compared. A correlation coefficient matrix was used to cluster the models based on their performance. The SIML was then developed using the best-performing techniques. The sensitivity, specificity, and area under the curve (AUC) of all models were calculated.
RESULTS: The RF model identified 18 features predictive of developing cervical cancer. The use of hormonal contraceptives was identified as the most important risk factor, followed by the number of pregnancies, years of smoking, and the number of sexual partners. The SIML algorithm had the best overall performance compared with the other methods, reaching an AUC, sensitivity, and specificity of 0.877, 81.8%, and 81.9%, respectively.
CONCLUSION: This study shows that SIML can be used to accurately identify women at high risk of developing cervical cancer. The model could be used to personalize the screening program by optimizing the screening interval and care plan for high- and low-risk patients based on their demographics, behavioral patterns, and clinical data.
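The abstract reports sensitivity, specificity, and AUC for every model. As a minimal sketch of how those three metrics are defined, the following uses only the Python standard library. The labels and scores are toy values invented for illustration (not the study's data), the function names are hypothetical, and the 0.5 decision threshold is an assumption rather than something the record states.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = high risk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 high-risk and 5 low-risk cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

# Assumed threshold of 0.5 turns scores into predicted classes.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"AUC={auc(y_true, scores):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported side by side.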
format Online
Article
Text
id pubmed-8886038
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8886038 2022-03-02 Optimization of Cervical Cancer Screening: A Stacking-Integrated Machine Learning Algorithm Based on Demographic, Behavioral, and Clinical Factors Sun, Lin Yang, Lingping Liu, Xiyao Tang, Lan Zeng, Qi Gao, Yuwen Chen, Qian Liu, Zhaohai Peng, Bin Front Oncol Oncology Frontiers Media S.A. 2022-02-15 /pmc/articles/PMC8886038/ /pubmed/35242711 http://dx.doi.org/10.3389/fonc.2022.821453 Text en Copyright © 2022 Sun, Yang, Liu, Tang, Zeng, Gao, Chen, Liu and Peng https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Optimization of Cervical Cancer Screening: A Stacking-Integrated Machine Learning Algorithm Based on Demographic, Behavioral, and Clinical Factors
topic Oncology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8886038/
https://www.ncbi.nlm.nih.gov/pubmed/35242711
http://dx.doi.org/10.3389/fonc.2022.821453