Cargando…

Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)

Machine Learning has been on the rise and healthcare is no exception to that. In healthcare, mental health is gaining more and more space. The diagnosis of mental disorders is based upon standardized patient interviews with defined set of questions and scales which is a time consuming and costly pro...

Descripción completa

Detalles Bibliográficos
Autores principales:	Sharma, Amita, Verbeke, Willem J. M. I.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Frontiers Media S.A. 2020
Materias:	Big Data
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931945/ https://www.ncbi.nlm.nih.gov/pubmed/33693389 http://dx.doi.org/10.3389/fdata.2020.00015

_version_	1783660388799217664
author	Sharma, Amita Verbeke, Willem J. M. I.
author_facet	Sharma, Amita Verbeke, Willem J. M. I.
author_sort	Sharma, Amita
collection	PubMed
description	Machine Learning has been on the rise and healthcare is no exception to that. In healthcare, mental health is gaining more and more space. The diagnosis of mental disorders is based upon standardized patient interviews with defined set of questions and scales which is a time consuming and costly process. Our objective was to apply the machine learning model and to evaluate to see if there is predictive power of biomarkers data to enhance the diagnosis of depression cases. In this research paper, we aimed to explore the detection of depression cases among the sample of 11,081 Dutch citizen dataset. Most of the earlier studies have balanced datasets wherein the proportion of healthy cases and unhealthy cases are equal but in our study, the dataset contains only 570 cases of self-reported depression out of 11,081 cases hence it is a class imbalance classification problem. The machine learning model built on imbalance dataset gives predictions biased toward majority class hence the model will always predict the case as no depression case even if it is a case of depression. We used different resampling strategies to address the class imbalance problem. We created multiple samples by under sampling, over sampling, over-under sampling and ROSE sampling techniques to balance the dataset and then, we applied machine learning algorithm “Extreme Gradient Boosting” (XGBoost) on each sample to classify the mental illness cases from healthy cases. The balanced accuracy, precision, recall and F1 score obtained from over-sampling and over-under sampling were more than 0.90.
format	Online Article Text
id	pubmed-7931945
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-79319452021-03-09 Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081) Sharma, Amita Verbeke, Willem J. M. I. Front Big Data Big Data Machine Learning has been on the rise and healthcare is no exception to that. In healthcare, mental health is gaining more and more space. The diagnosis of mental disorders is based upon standardized patient interviews with defined set of questions and scales which is a time consuming and costly process. Our objective was to apply the machine learning model and to evaluate to see if there is predictive power of biomarkers data to enhance the diagnosis of depression cases. In this research paper, we aimed to explore the detection of depression cases among the sample of 11,081 Dutch citizen dataset. Most of the earlier studies have balanced datasets wherein the proportion of healthy cases and unhealthy cases are equal but in our study, the dataset contains only 570 cases of self-reported depression out of 11,081 cases hence it is a class imbalance classification problem. The machine learning model built on imbalance dataset gives predictions biased toward majority class hence the model will always predict the case as no depression case even if it is a case of depression. We used different resampling strategies to address the class imbalance problem. We created multiple samples by under sampling, over sampling, over-under sampling and ROSE sampling techniques to balance the dataset and then, we applied machine learning algorithm “Extreme Gradient Boosting” (XGBoost) on each sample to classify the mental illness cases from healthy cases. The balanced accuracy, precision, recall and F1 score obtained from over-sampling and over-under sampling were more than 0.90. Frontiers Media S.A. 2020-04-30 /pmc/articles/PMC7931945/ /pubmed/33693389 http://dx.doi.org/10.3389/fdata.2020.00015 Text en Copyright © 2020 Sharma and Verbeke. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Big Data Sharma, Amita Verbeke, Willem J. M. I. Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)
title	Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)
title_full	Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)
title_fullStr	Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)
title_full_unstemmed	Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)
title_short	Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)
title_sort	improving diagnosis of depression with xgboost machine learning model and a large biomarkers dutch dataset (n = 11,081)
topic	Big Data
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931945/ https://www.ncbi.nlm.nih.gov/pubmed/33693389 http://dx.doi.org/10.3389/fdata.2020.00015
work_keys_str_mv	AT sharmaamita improvingdiagnosisofdepressionwithxgboostmachinelearningmodelandalargebiomarkersdutchdatasetn11081 AT verbekewillemjmi improvingdiagnosisofdepressionwithxgboostmachinelearningmodelandalargebiomarkersdutchdatasetn11081

Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081)

Ejemplares similares