Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning

Cancer is a broad term that refers to a wide range of diseases that can affect any part of the human body. To minimize the number of cancer deaths and to prepare an appropriate health policy on cancer spread mitigation, scientifically supported knowledge of cancer causes is critical. As a result, in...

Bibliographic Details
Main Authors: Endalie, Demeke; Abebe, Wondmagegn Taye
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355424/
https://www.ncbi.nlm.nih.gov/pubmed/37467222
http://dx.doi.org/10.1371/journal.pdig.0000308
_version_ 1785075138153152512
author Endalie, Demeke
Abebe, Wondmagegn Taye
author_facet Endalie, Demeke
Abebe, Wondmagegn Taye
author_sort Endalie, Demeke
collection PubMed
description Cancer is a broad term that refers to a wide range of diseases that can affect any part of the human body. To minimize the number of cancer deaths and to prepare an appropriate health policy on cancer spread mitigation, scientifically supported knowledge of cancer causes is critical. As a result, in this study, we analyzed lung cancer risk factors that lead to a highly severe cancer case using a decision tree-based ranking algorithm. This feature relevance ranking algorithm computes the weight of each feature of the dataset by using split points to improve detection accuracy, and each risk factor is weighted based on the number of observations that occur for it on the decision tree. Coughing of blood, air pollution, and obesity are the most severe lung cancer risk factors out of nine, with a weight of 39%, 21%, and 14%, respectively. We also proposed a machine learning model that uses Extreme Gradient Boosting (XGBoost) to detect lung cancer severity levels in lung cancer patients. We used a dataset of 1000 lung cancer patients and 465 individuals free from lung cancer from Tikur Ambesa (Black Lion) Hospital in Addis Ababa, Ethiopia, to assess the performance of the proposed model. The proposed cancer severity level detection model achieved 98.9%, 99%, and 98.9% accuracy, precision, and recall, respectively, for the testing dataset. The findings can assist governments and non-governmental organizations in making lung cancer-related policy decisions.
format Online
Article
Text
id pubmed-10355424
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10355424 2023-07-20 Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning Endalie, Demeke Abebe, Wondmagegn Taye PLOS Digit Health Research Article Cancer is a broad term that refers to a wide range of diseases that can affect any part of the human body. To minimize the number of cancer deaths and to prepare an appropriate health policy on cancer spread mitigation, scientifically supported knowledge of cancer causes is critical. As a result, in this study, we analyzed lung cancer risk factors that lead to a highly severe cancer case using a decision tree-based ranking algorithm. This feature relevance ranking algorithm computes the weight of each feature of the dataset by using split points to improve detection accuracy, and each risk factor is weighted based on the number of observations that occur for it on the decision tree. Coughing of blood, air pollution, and obesity are the most severe lung cancer risk factors out of nine, with a weight of 39%, 21%, and 14%, respectively. We also proposed a machine learning model that uses Extreme Gradient Boosting (XGBoost) to detect lung cancer severity levels in lung cancer patients. We used a dataset of 1000 lung cancer patients and 465 individuals free from lung cancer from Tikur Ambesa (Black Lion) Hospital in Addis Ababa, Ethiopia, to assess the performance of the proposed model. The proposed cancer severity level detection model achieved 98.9%, 99%, and 98.9% accuracy, precision, and recall, respectively, for the testing dataset. The findings can assist governments and non-governmental organizations in making lung cancer-related policy decisions.
Public Library of Science 2023-07-19 /pmc/articles/PMC10355424/ /pubmed/37467222 http://dx.doi.org/10.1371/journal.pdig.0000308 Text en © 2023 Endalie, Abebe https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Endalie, Demeke
Abebe, Wondmagegn Taye
Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning
title Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning
title_full Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning
title_fullStr Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning
title_full_unstemmed Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning
title_short Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning
title_sort analysis of lung cancer risk factors from medical records in ethiopia using machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355424/
https://www.ncbi.nlm.nih.gov/pubmed/37467222
http://dx.doi.org/10.1371/journal.pdig.0000308
work_keys_str_mv AT endaliedemeke analysisoflungcancerriskfactorsfrommedicalrecordsinethiopiausingmachinelearning
AT abebewondmagegntaye analysisoflungcancerriskfactorsfrommedicalrecordsinethiopiausingmachinelearning