
Plotting receiver operating characteristic and precision–recall curves from presence and background data

1. The receiver operating characteristic (ROC) and precision–recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting the ROC/PR curves requires a traditional test set with both presence and absence data (namely PA approach), but species absence da...


Bibliographic Details
Main authors: Li, Wenkai; Guo, Qinghua
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8328458/
https://www.ncbi.nlm.nih.gov/pubmed/34367569
http://dx.doi.org/10.1002/ece3.7826
_version_ 1783732321344552960
author Li, Wenkai
Guo, Qinghua
author_facet Li, Wenkai
Guo, Qinghua
author_sort Li, Wenkai
collection PubMed
description 1. The receiver operating characteristic (ROC) and precision–recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually unavailable in practice. Plotting ROC/PR curves from presence‐only data while treating background data as pseudo‐absence data (the PO approach) may give misleading results. 2. In this study, we propose a new approach, the PB approach, that calibrates ROC/PR curves from presence and background data using user‐provided information on a constant c. Here, c is the probability that a species occurrence is detected (labeled); an estimate of c can also be derived from the PB‐based ROC/PR plots, provided that a model with good discrimination ability is available. We used five virtual species and a real aerial photograph to test the effectiveness of the proposed PB‐based ROC/PR plots. Different models (or classifiers) were trained on presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches. 3. Experimental results show that the curves and areas under the curves from the PB approach are more similar to those from the PA approach than are those from the PO approach. The PB‐based ROC/PR plots also provided highly accurate estimates of c in our experiments. 4. We conclude that the proposed PB‐based ROC/PR plots can be a valuable complement to existing model assessment methods, and that they offer an additional way to estimate the constant c (or species prevalence) from presence and background data.
format Online
Article
Text
id pubmed-8328458
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-8328458 2021-08-06 Plotting receiver operating characteristic and precision–recall curves from presence and background data Li, Wenkai Guo, Qinghua Ecol Evol Original Research John Wiley and Sons Inc. 2021-07-01 /pmc/articles/PMC8328458/ /pubmed/34367569 http://dx.doi.org/10.1002/ece3.7826 Text en © 2021 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Research
Li, Wenkai
Guo, Qinghua
Plotting receiver operating characteristic and precision–recall curves from presence and background data
title Plotting receiver operating characteristic and precision–recall curves from presence and background data
title_full Plotting receiver operating characteristic and precision–recall curves from presence and background data
title_fullStr Plotting receiver operating characteristic and precision–recall curves from presence and background data
title_full_unstemmed Plotting receiver operating characteristic and precision–recall curves from presence and background data
title_short Plotting receiver operating characteristic and precision–recall curves from presence and background data
title_sort plotting receiver operating characteristic and precision–recall curves from presence and background data
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8328458/
https://www.ncbi.nlm.nih.gov/pubmed/34367569
http://dx.doi.org/10.1002/ece3.7826
work_keys_str_mv AT liwenkai plottingreceiveroperatingcharacteristicandprecisionrecallcurvesfrompresenceandbackgrounddata
AT guoqinghua plottingreceiveroperatingcharacteristicandprecisionrecallcurvesfrompresenceandbackgrounddata