On the information hidden in a classifier distribution
Classification tasks are a common challenge to every field of science. To correctly interpret the results provided by a classifier, we need to know the performance indices of the classifier including its sensitivity, specificity, the most appropriate cut-off value (for continuous classifiers), etc....
Main authors: | Habibzadeh, Farrokh; Habibzadeh, Parham; Yadollahie, Mahboobeh; Roozbehi, Hooman |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7807039/ https://www.ncbi.nlm.nih.gov/pubmed/33441644 http://dx.doi.org/10.1038/s41598-020-79548-9 |
_version_ | 1783636659579912192 |
---|---|
author | Habibzadeh, Farrokh; Habibzadeh, Parham; Yadollahie, Mahboobeh; Roozbehi, Hooman |
author_sort | Habibzadeh, Farrokh |
collection | PubMed |
description | Classification tasks are a common challenge to every field of science. To correctly interpret the results provided by a classifier, we need to know the performance indices of the classifier including its sensitivity, specificity, the most appropriate cut-off value (for continuous classifiers), etc. Typically, several studies should be conducted to find all these indices. Herein, we show that they already exist, hidden in the distribution of the variable used to classify, and can readily be harvested. An educated guess about the distribution of the variable used to classify in each class would help us to decompose the frequency distribution of the variable in population into its components—the probability density function of the variable in each class. Based on the harvested parameters, we can then calculate the performance indices of the classifier. As a case study, we applied the technique to the relative frequency distribution of prostate-specific antigen, a biomarker commonly used in medicine for the diagnosis of prostate cancer. We used nonlinear curve fitting to decompose the variable relative frequency distribution into the probability density functions of the non-diseased and diseased people. The functions were then used to determine the performance indices of the classifier. Sensitivity, specificity, the most appropriate cut-off value, and likelihood ratios were calculated. The reference range of the biomarker and the prevalence of prostate cancer for various age groups were also calculated. The indices obtained were in good agreement with the values reported in previous studies. All these were done without being aware of the real health status of the individuals studied. The method is even applicable for conditions with no definite definitions (e.g., hypertension). We believe the method has a wide range of applications in many scientific fields. |
format | Online Article Text |
id | pubmed-7807039 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7807039 2021-01-14 On the information hidden in a classifier distribution Habibzadeh, Farrokh; Habibzadeh, Parham; Yadollahie, Mahboobeh; Roozbehi, Hooman Sci Rep Article Nature Publishing Group UK 2021-01-13 /pmc/articles/PMC7807039/ /pubmed/33441644 http://dx.doi.org/10.1038/s41598-020-79548-9 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Habibzadeh, Farrokh Habibzadeh, Parham Yadollahie, Mahboobeh Roozbehi, Hooman On the information hidden in a classifier distribution |
title | On the information hidden in a classifier distribution |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7807039/ https://www.ncbi.nlm.nih.gov/pubmed/33441644 http://dx.doi.org/10.1038/s41598-020-79548-9 |
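
The description in this record says that, once the probability density functions of the non-diseased and diseased classes have been harvested, the performance indices of the classifier can be calculated from them. The following is a minimal sketch of that last step only, not the authors' implementation. It assumes each class is normally distributed on some suitable scale, uses made-up parameters in place of fitted ones, and picks the cut-off by maximizing Youden's J, a common criterion that may differ from the one used in the article. The function names `performance_indices` and `youden_cutoff` are hypothetical.

```python
from statistics import NormalDist


def performance_indices(non_diseased: NormalDist, diseased: NormalDist,
                        cutoff: float) -> dict:
    """Indices for the rule 'value >= cutoff is test-positive'."""
    sensitivity = 1.0 - diseased.cdf(cutoff)   # diseased at or above the cut-off
    specificity = non_diseased.cdf(cutoff)     # non-diseased below the cut-off
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
    }


def youden_cutoff(non_diseased: NormalDist, diseased: NormalDist,
                  lo: float, hi: float, steps: int = 2000) -> float:
    """Grid-search the cut-off maximizing Youden's J = Se + Sp - 1."""
    best_c, best_j = lo, float("-inf")
    for i in range(steps + 1):
        c = lo + (hi - lo) * i / steps
        # Se + Sp - 1 simplifies to F_non_diseased(c) - F_diseased(c)
        j = non_diseased.cdf(c) - diseased.cdf(c)
        if j > best_j:
            best_c, best_j = c, j
    return best_c


# Hypothetical class distributions (NOT values from the article).
non_diseased = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=2.0, sigma=1.0)

cutoff = youden_cutoff(non_diseased, diseased, lo=-4.0, hi=6.0)
indices = performance_indices(non_diseased, diseased, cutoff)

# Upper limit of a one-sided 95% reference range for the non-diseased class.
reference_upper = non_diseased.inv_cdf(0.95)

print(f"cut-off: {cutoff:.2f}")
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
print(f"95% reference upper limit: {reference_upper:.2f}")
```

With equal spreads in the two classes, the Youden cut-off falls midway between the two means, so sensitivity and specificity come out equal in this toy setup. Real fitted distributions would generally give an asymmetric trade-off.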