
The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

BACKGROUND: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a...


Bibliographic Details
Main authors: Chicco, Davide, Jurman, Giuseppe
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312/
https://www.ncbi.nlm.nih.gov/pubmed/31898477
http://dx.doi.org/10.1186/s12864-019-6413-7
author Chicco, Davide
Jurman, Giuseppe
collection PubMed
description BACKGROUND: To evaluate binary classifications and their confusion matrices, researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite this being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics in binary classification tasks. However, these statistical measures can show dangerously overoptimistic, inflated results, especially on imbalanced datasets. RESULTS: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate that produces a high score only if the prediction obtained good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally to both the size of positive elements and the size of negative elements in the dataset. CONCLUSIONS: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining its mathematical properties, and then the advantages of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
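The contrast described in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of accuracy, F1 score, and MCC; the function name `confusion_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration, not taken from the article's use cases.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, f1, mcc) for one binary confusion matrix."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall; it never looks at tn.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # MCC uses all four cells. Returning 0 when a row or column of the
    # matrix is empty is a convention assumed here for this sketch.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return accuracy, f1, mcc


# Hypothetical imbalanced dataset: 91 positives and 9 negatives.
# The classifier labels almost everything positive.
acc, f1, mcc = confusion_metrics(tp=90, fn=1, tn=1, fp=8)
print(f"accuracy={acc:.2f}  F1={f1:.2f}  MCC={mcc:.2f}")
```

With these invented counts, accuracy is 0.91 and F1 is about 0.95, while MCC is only about 0.20, because just one of the nine negative elements is classified correctly.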
format Online
Article
Text
id pubmed-6941312
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6941312 2020-01-06 The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation Chicco, Davide Jurman, Giuseppe BMC Genomics Research Article
BioMed Central 2020-01-02 /pmc/articles/PMC6941312/ /pubmed/31898477 http://dx.doi.org/10.1186/s12864-019-6413-7 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312/
https://www.ncbi.nlm.nih.gov/pubmed/31898477
http://dx.doi.org/10.1186/s12864-019-6413-7