Effects of the Training Dataset Characteristics on the Performance of Nine Species Distribution Models: Application to Diabrotica virgifera virgifera

Bibliographic Details
Main authors: Dupin, Maxime, Reynaud, Philippe, Jarošík, Vojtěch, Baker, Richard, Brunel, Sarah, Eyre, Dominic, Pergl, Jan, Makowski, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3118793/
https://www.ncbi.nlm.nih.gov/pubmed/21701579
http://dx.doi.org/10.1371/journal.pone.0020957
author Dupin, Maxime
Reynaud, Philippe
Jarošík, Vojtěch
Baker, Richard
Brunel, Sarah
Eyre, Dominic
Pergl, Jan
Makowski, David
collection PubMed
description Many distribution models developed to predict the presence/absence of invasive alien species need to be fitted to a training dataset before practical use. The training dataset is characterized by the number of recorded presences/absences and by their geographical locations. The aim of this paper is to study the effect of the training dataset characteristics on model performance and to compare the relative importance of three factors influencing model predictive capability: size of the training dataset, stage of the biological invasion, and choice of input variables. Nine models were assessed for their ability to predict the distribution of the western corn rootworm, Diabrotica virgifera virgifera, a major pest of corn in North America that has recently invaded Europe. Twenty-six training datasets of various sizes (from 10 to 428 presence records) corresponding to two different stages of invasion (1955 and 1980) and three sets of input bioclimatic variables (19 variables, six variables selected using information on insect biology, and three linear combinations of the 19 variables derived from Principal Component Analysis) were considered. The models were fitted to each training dataset in turn, and their performance was assessed using independent data from North America and Europe. The models were ranked according to the area under the Receiver Operating Characteristic curve and the likelihood ratio. Model performance was highly sensitive to the geographical area used for calibration; most of the models performed poorly when fitted to a restricted area corresponding to an early stage of the invasion. Our results also showed that Principal Component Analysis was useful in reducing the number of input variables for the models that performed poorly with 19 input variables. DOMAIN, Environmental Distance, MAXENT, and Envelope Score were the most accurate models, but all the models tested in this study led to a substantial rate of misclassification.
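The description says the models were ranked by the area under the ROC curve (AUC) and the likelihood ratio, and that all of them showed a substantial misclassification rate. The following is a minimal Python sketch of how such measures can be computed from presence/absence validation data. It assumes the likelihood ratio is the positive likelihood ratio, LR+ = sensitivity / (1 - specificity), evaluated at a single fixed probability threshold; the record does not state which ratio or threshold the authors used. The function names, the 0.5 threshold, and the example scores are hypothetical and for illustration only.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability that a randomly chosen presence
    site (label 1) scores higher than a randomly chosen absence site
    (label 0); tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), predicting presence when score >= threshold.
    The threshold is an assumption; the record does not give one."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def positive_likelihood_ratio(tp: int, fn: int, fp: int, tn: int) -> float:
    """Assumed LR+ = sensitivity / (1 - specificity); infinite when the
    model produces no false positives."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity == 1.0:
        return float("inf")
    return sensitivity / (1.0 - specificity)


def misclassification_rate(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of validation sites whose predicted class is wrong."""
    return (fn + fp) / (tp + fn + fp + tn)


if __name__ == "__main__":
    # Hypothetical suitability scores at four presence and four absence sites.
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    counts = confusion(scores, labels, threshold=0.5)
    print(roc_auc(scores, labels))             # 0.9375
    print(positive_likelihood_ratio(*counts))  # 3.0
    print(misclassification_rate(*counts))     # 0.25
```

AUC is threshold-free and measures only how well scores rank presences above absences, whereas LR+ and the misclassification rate depend on the chosen threshold; this is one reason a model can rank well on AUC yet still misclassify a substantial share of sites.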
format Online
Article
Text
id pubmed-3118793
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3118793 2011-06-23 PLoS One Research Article Public Library of Science 2011-06-20 /pmc/articles/PMC3118793/ /pubmed/21701579 http://dx.doi.org/10.1371/journal.pone.0020957 Text en Dupin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Effects of the Training Dataset Characteristics on the Performance of Nine Species Distribution Models: Application to Diabrotica virgifera virgifera
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3118793/
https://www.ncbi.nlm.nih.gov/pubmed/21701579
http://dx.doi.org/10.1371/journal.pone.0020957