
Text classification to streamline online wildlife trade analyses

Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practition...


Bibliographic Details
Main authors: Stringham, Oliver C.; Moncayo, Stephanie; Hill, Katherine G. W.; Toomes, Adam; Mitchell, Lewis; Ross, Joshua V.; Cassey, Phillip
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8270201/
https://www.ncbi.nlm.nih.gov/pubmed/34242279
http://dx.doi.org/10.1371/journal.pone.0254007
author Stringham, Oliver C.
Moncayo, Stephanie
Hill, Katherine G. W.
Toomes, Adam
Mitchell, Lewis
Ross, Joshua V.
Cassey, Phillip
collection PubMed
description Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question ‘how much data is required to have an adequately performing model?’, we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. 
Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife trade related online data.
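The workflow summarized in the description (train a text classifier on labelled advertisements, score it with F1, then shrink the training set to see how performance changes) can be illustrated with a minimal sketch. This is not the authors' code: the record does not name the classifiers used, so a small multinomial Naive Bayes model stands in as an assumption, the whitespace tokenizer is a simplification, and the example advertisements, labels, and function names (`train_nb`, `predict_nb`, `f1_score`, `sensitivity`) are all hypothetical. ROC AUC is omitted to keep the sketch short.

```python
import math
import random
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()

def train_nb(docs, labels):
    """Fit a two-class multinomial Naive Bayes model from raw ad text."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict_nb(model, doc):
    """Return 1 (relevant) or 0 (irrelevant) for a single advertisement."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        # Smoothed log prior, so a class absent from a small subsample stays defined.
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def sensitivity(train, test, fractions, seed=0):
    """Retrain on random subsamples of the training data; return F1 per fraction."""
    rng = random.Random(seed)
    test_docs = [d for d, _ in test]
    y_true = [y for _, y in test]
    results = {}
    for frac in fractions:
        k = max(1, round(frac * len(train)))
        subset = rng.sample(train, k)
        model = train_nb([d for d, _ in subset], [y for _, y in subset])
        y_pred = [predict_nb(model, d) for d in test_docs]
        results[frac] = f1_score(y_true, y_pred)
    return results

# Hypothetical toy data: 1 = a live bird is offered, 0 = an irrelevant listing.
TRAIN = [
    ("hand raised cockatiel for sale", 1),
    ("young budgie pair for sale", 1),
    ("tame galah needs new home", 1),
    ("breeding pair of lorikeets for sale", 1),
    ("large bird cage with stand", 0),
    ("bird seed and feeder bundle", 0),
    ("aviary wire and nest boxes", 0),
    ("parrot toys and perches cheap", 0),
]
TEST = [("cockatiel pair for sale", 1), ("cage and perches bundle", 0)]

if __name__ == "__main__":
    for frac, score in sensitivity(TRAIN, TEST, [0.33, 0.66, 1.0]).items():
        print(f"training fraction {frac:.2f}: F1 = {score:.2f}")
```

On a real dataset the subsampling step would normally be repeated many times per fraction and averaged, since a single random draw (as here) gives a noisy estimate of how performance depends on sample size.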
format Online
Article
Text
id pubmed-8270201
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8270201 2021-07-21 Text classification to streamline online wildlife trade analyses. Stringham, Oliver C.; Moncayo, Stephanie; Hill, Katherine G. W.; Toomes, Adam; Mitchell, Lewis; Ross, Joshua V.; Cassey, Phillip. PLoS One. Research Article. Public Library of Science 2021-07-09 /pmc/articles/PMC8270201/ /pubmed/34242279 http://dx.doi.org/10.1371/journal.pone.0254007 Text en © 2021 Stringham et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Text classification to streamline online wildlife trade analyses
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8270201/
https://www.ncbi.nlm.nih.gov/pubmed/34242279
http://dx.doi.org/10.1371/journal.pone.0254007