Efficiency of the Adjusted Binary Classification (ABC) Approach in Osteometric Sex Estimation: A Comparative Study of Different Linear Machine Learning Algorithms and Training Sample Sizes

SIMPLE SUMMARY: This study adopts a dynamic methodology to explore challenges to the practical application of the adjusted binary classification (ABC) approach, which are related to the unmodifiable characteristics of data used in its development, such as intrasexual variation (sexual dimorphism) of...

Bibliographic Details
Main authors: Attia, MennattAllah Hassan, Kholief, Marwa A., Zaghloul, Nancy M., Kružić, Ivana, Anđelinović, Šimun, Bašić, Željana, Jerković, Ivan
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9220275/
https://www.ncbi.nlm.nih.gov/pubmed/35741437
http://dx.doi.org/10.3390/biology11060917
_version_ 1784732333231833088
author Attia, MennattAllah Hassan
Kholief, Marwa A.
Zaghloul, Nancy M.
Kružić, Ivana
Anđelinović, Šimun
Bašić, Željana
Jerković, Ivan
author_sort Attia, MennattAllah Hassan
collection PubMed
description SIMPLE SUMMARY: This study adopts a dynamic methodology to explore challenges to the practical application of the adjusted binary classification (ABC) approach, which are related to the unmodifiable characteristics of data used in its development, such as intrasexual variation (sexual dimorphism) of variables and methodological factors such as the selected classification algorithm and sample size. The adequacy of a training dataset’s size was judged relative to the classification performance in an independent test set. Finding an optimal classifier was also addressed in this study, wherein the results demonstrate that both statistical modeling and machine learning techniques perform almost equally in the univariate models; however, differences are evident in the multivariate model due to the different number of variables included via the feature selection process, as well as the effect of inadequate training sample size relative to the test set. This approach is particularly useful when quick classification/prediction is required for making real-time forensic decisions. ABSTRACT: The adjusted binary classification (ABC) approach was proposed to assure that the binary classification model reaches a particular accuracy level. The present study evaluated the ABC for osteometric sex classification using multiple machine learning (ML) techniques: linear discriminant analysis (LDA), boosted generalized linear model (GLMB), support vector machine (SVM), and logistic regression (LR). We used 13 femoral measurements of 300 individuals from a modern Turkish population sample and split data into two sets: training (n = 240) and testing (n = 60). Then, the five best-performing measurements were selected for training univariate models, while pools of these variables were used for the multivariable models. ML classifier type did not affect the performance of unadjusted models. The accuracy of univariate models was 82–87%, while that of multivariate models was 89–90%. 
After applying ABC to the crossvalidation set, the accuracy and the positive and negative predictive values for uni- and multivariate models were ≥95%. Sex could be estimated for 28–75% of individuals using univariate models but with an obvious sexing bias, likely caused by different degrees of sexual dimorphism and between-group overlap. However, using multivariate models, we minimized the bias and properly classified 81–87% of individuals. A similar performance was also noted in the testing sample (except for FEB), with accuracies of 96–100%, and a proportion of classified individuals between 30% and 82% in univariate models, and between 90% and 91% in multivariate models. When considering different training sample sizes, we demonstrated that LR was the most sensitive with limited sample sizes (n < 150), while GLMB was the most stable classifier.
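The abstract states that ABC is applied to a cross-validation set so that accuracy reaches at least 95%, at the cost of leaving some individuals unclassified. A minimal sketch of that idea follows. It assumes the adjustment works by raising a cutoff on the posterior probability of the predicted class; the abstract does not spell out the mechanism, so the function names `abc_cutoff` and `classify`, the 0/1 class coding, and the toy posteriors are all hypothetical and are not the authors' data or implementation.

```python
from typing import List, Optional


def abc_cutoff(posteriors: List[float], labels: List[int],
               target: float = 0.95) -> Optional[float]:
    """Smallest confidence cutoff at which retained cases reach `target` accuracy.

    posteriors: P(class 1) for each cross-validated case (assumed input).
    labels: true class of each case, coded 0 or 1.
    Returns None if no cutoff reaches the target.
    """
    # Confidence is the posterior probability of the predicted class.
    confidence = [max(p, 1 - p) for p in posteriors]
    correct = [(p >= 0.5) == (y == 1) for p, y in zip(posteriors, labels)]
    for cutoff in sorted(set(confidence)):
        kept = [ok for c, ok in zip(confidence, correct) if c >= cutoff]
        if kept and sum(kept) / len(kept) >= target:
            return cutoff
    return None


def classify(posterior: float, cutoff: float) -> Optional[int]:
    """Return 1 or 0, or None when the case falls below the cutoff."""
    if max(posterior, 1 - posterior) < cutoff:
        return None  # left unclassified
    return 1 if posterior >= 0.5 else 0


# Toy cross-validation output (made-up numbers, for illustration only).
posteriors = [0.98, 0.91, 0.85, 0.70, 0.60, 0.45, 0.25, 0.12, 0.05, 0.55]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cutoff = abc_cutoff(posteriors, labels, target=0.95)
decisions = [classify(p, cutoff) for p in posteriors]
proportion_classified = sum(d is not None for d in decisions) / len(decisions)
print(cutoff, proportion_classified)
```

In this sketch the trade-off reported in the abstract is visible directly: a stricter cutoff raises accuracy among the classified cases while lowering the proportion of individuals for whom sex can be estimated.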
format Online
Article
Text
id pubmed-9220275
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9220275 2022-06-24 Biology (Basel) Article MDPI 2022-06-15 /pmc/articles/PMC9220275/ /pubmed/35741437 http://dx.doi.org/10.3390/biology11060917 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Attia, MennattAllah Hassan
Kholief, Marwa A.
Zaghloul, Nancy M.
Kružić, Ivana
Anđelinović, Šimun
Bašić, Željana
Jerković, Ivan
Efficiency of the Adjusted Binary Classification (ABC) Approach in Osteometric Sex Estimation: A Comparative Study of Different Linear Machine Learning Algorithms and Training Sample Sizes
title Efficiency of the Adjusted Binary Classification (ABC) Approach in Osteometric Sex Estimation: A Comparative Study of Different Linear Machine Learning Algorithms and Training Sample Sizes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9220275/
https://www.ncbi.nlm.nih.gov/pubmed/35741437
http://dx.doi.org/10.3390/biology11060917