
Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data

We apply tree-based classification algorithms, namely classification trees (via the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which is man...


Bibliographic Details
Main Authors: Turska, Elzbieta; Jurga, Szymon; Piskorski, Jaroslaw
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8468933/
https://www.ncbi.nlm.nih.gov/pubmed/34573835
http://dx.doi.org/10.3390/e23091210
author Turska, Elzbieta
Jurga, Szymon
Piskorski, Jaroslaw
collection PubMed
description We apply tree-based classification algorithms, namely classification trees (via the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and the heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. The paper concludes that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success.
format Online
Article
Text
id pubmed-8468933
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8468933 2021-09-27 Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data Turska, Elzbieta Jurga, Szymon Piskorski, Jaroslaw Entropy (Basel) Article We apply tree-based classification algorithms, namely classification trees (via the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and the heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. The paper concludes that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success. MDPI 2021-09-14 /pmc/articles/PMC8468933/ /pubmed/34573835 http://dx.doi.org/10.3390/e23091210 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
topic Article