Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species

Appropriate inspection protocols and mitigation strategies are a critical component of effective biosecurity measures, enabling implementation of sound management decisions. Statistical models to analyze biosecurity surveillance data are integral to this decision-making process. Our research focuses...

Bibliographic Details
Main Authors: Kachigunda, Barbara; Mengersen, Kerrie; Perera, Devindri I.; Coupland, Grey T.; van der Merwe, Johann; McKirdy, Simon
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9362945/
https://www.ncbi.nlm.nih.gov/pubmed/35943971
http://dx.doi.org/10.1371/journal.pone.0272413
author Kachigunda, Barbara
Mengersen, Kerrie
Perera, Devindri I.
Coupland, Grey T.
van der Merwe, Johann
McKirdy, Simon
author_sort Kachigunda, Barbara
collection PubMed
description Appropriate inspection protocols and mitigation strategies are a critical component of effective biosecurity measures, enabling implementation of sound management decisions. Statistical models to analyze biosecurity surveillance data are integral to this decision-making process. Our research focuses on analyzing border interception biosecurity data collected from a Class A Nature Reserve, Barrow Island, in Western Australia and the associated covariates describing both spatial and temporal interception patterns. A clustering analysis approach was adopted using a generalization of the popular k-means algorithm appropriate for mixed-type data. The analysis approach compared the efficiency of clustering using only the numerical data, then subsequently including covariates to the clustering. Based on numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporation of covariates into the model suggested four distinct clusters dominated by physical location and type of detection. Clustering increases interpretability of complex models and is useful in data mining to highlight patterns to describe underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on outcomes from our research we recommend broader use of cluster models in biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
format Online
Article
Text
id pubmed-9362945
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9362945 2022-08-10 PLoS One Research Article
Public Library of Science 2022-08-09 /pmc/articles/PMC9362945/ /pubmed/35943971 http://dx.doi.org/10.1371/journal.pone.0272413 Text en © 2022 Kachigunda et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9362945/
https://www.ncbi.nlm.nih.gov/pubmed/35943971
http://dx.doi.org/10.1371/journal.pone.0272413