Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach

Endometriosis is a condition characterized by implants of endometrial tissues into extrauterine sites, mostly within the pelvic peritoneum. The prevalence of endometriosis is under-diagnosed and is estimated to account for 5–10% of all women of reproductive age. The goal of this study was to develop...

Bibliographic Details
Main authors: Blass, Ido; Sahar, Tali; Shraibman, Adi; Ofer, Dan; Rappoport, Nadav; Linial, Michal
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9317820/
https://www.ncbi.nlm.nih.gov/pubmed/35887611
http://dx.doi.org/10.3390/jpm12071114
_version_ 1784755149885931520
author Blass, Ido
Sahar, Tali
Shraibman, Adi
Ofer, Dan
Rappoport, Nadav
Linial, Michal
author_facet Blass, Ido
Sahar, Tali
Shraibman, Adi
Ofer, Dan
Rappoport, Nadav
Linial, Michal
author_sort Blass, Ido
collection PubMed
description Endometriosis is a condition characterized by implants of endometrial tissues into extrauterine sites, mostly within the pelvic peritoneum. The prevalence of endometriosis is under-diagnosed and is estimated to account for 5–10% of all women of reproductive age. The goal of this study was to develop a model for endometriosis based on the UK-biobank (UKB) and re-assess the contribution of known risk factors to endometriosis. We partitioned the data into those diagnosed with endometriosis (5924; ICD-10: N80) and a control group (142,723). We included over 1000 variables from the UKB covering personal information about female health, lifestyle, self-reported data, genetic variants, and medical history prior to endometriosis diagnosis. We applied machine learning algorithms to train an endometriosis prediction model. The optimal prediction was achieved with the gradient boosting algorithms of CatBoost for the data-combined model with an area under the ROC curve (ROC-AUC) of 0.81. The same results were obtained for women from a mixed ethnicity population of the UKB (7112; ICD-10: N80). We discovered that, prior to being diagnosed with endometriosis, affected women had significantly more ICD-10 diagnoses than the average unaffected woman. We used SHAP, an explainable AI tool, to estimate the marginal impact of a feature, given all other features. The informative features ranked by SHAP values included irritable bowel syndrome (IBS) and the length of the menstrual cycle. We conclude that the rich population-based retrospective data from the UKB are valuable for developing unified machine learning endometriosis models despite the limitations of missing data, noisy medical input, and participant age. The informative features of the model may improve clinical utility for endometriosis diagnosis.
format Online
Article
Text
id pubmed-9317820
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-93178202022-07-27 Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach Blass, Ido Sahar, Tali Shraibman, Adi Ofer, Dan Rappoport, Nadav Linial, Michal J Pers Med Article Endometriosis is a condition characterized by implants of endometrial tissues into extrauterine sites, mostly within the pelvic peritoneum. The prevalence of endometriosis is under-diagnosed and is estimated to account for 5–10% of all women of reproductive age. The goal of this study was to develop a model for endometriosis based on the UK-biobank (UKB) and re-assess the contribution of known risk factors to endometriosis. We partitioned the data into those diagnosed with endometriosis (5924; ICD-10: N80) and a control group (142,723). We included over 1000 variables from the UKB covering personal information about female health, lifestyle, self-reported data, genetic variants, and medical history prior to endometriosis diagnosis. We applied machine learning algorithms to train an endometriosis prediction model. The optimal prediction was achieved with the gradient boosting algorithms of CatBoost for the data-combined model with an area under the ROC curve (ROC-AUC) of 0.81. The same results were obtained for women from a mixed ethnicity population of the UKB (7112; ICD-10: N80). We discovered that, prior to being diagnosed with endometriosis, affected women had significantly more ICD-10 diagnoses than the average unaffected woman. We used SHAP, an explainable AI tool, to estimate the marginal impact of a feature, given all other features. The informative features ranked by SHAP values included irritable bowel syndrome (IBS) and the length of the menstrual cycle. We conclude that the rich population-based retrospective data from the UKB are valuable for developing unified machine learning endometriosis models despite the limitations of missing data, noisy medical input, and participant age. 
The informative features of the model may improve clinical utility for endometriosis diagnosis. MDPI 2022-07-07 /pmc/articles/PMC9317820/ /pubmed/35887611 http://dx.doi.org/10.3390/jpm12071114 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Blass, Ido
Sahar, Tali
Shraibman, Adi
Ofer, Dan
Rappoport, Nadav
Linial, Michal
Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach
title Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach
title_full Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach
title_fullStr Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach
title_full_unstemmed Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach
title_short Revisiting the Risk Factors for Endometriosis: A Machine Learning Approach
title_sort revisiting the risk factors for endometriosis: a machine learning approach
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9317820/
https://www.ncbi.nlm.nih.gov/pubmed/35887611
http://dx.doi.org/10.3390/jpm12071114
work_keys_str_mv AT blassido revisitingtheriskfactorsforendometriosisamachinelearningapproach
AT sahartali revisitingtheriskfactorsforendometriosisamachinelearningapproach
AT shraibmanadi revisitingtheriskfactorsforendometriosisamachinelearningapproach
AT oferdan revisitingtheriskfactorsforendometriosisamachinelearningapproach
AT rappoportnadav revisitingtheriskfactorsforendometriosisamachinelearningapproach
AT linialmichal revisitingtheriskfactorsforendometriosisamachinelearningapproach