
Accurate image-based identification of macroinvertebrate specimens using deep learning—How much training data is needed?

Image-based methods for species identification offer cost-efficient solutions for biomonitoring. This is particularly relevant for invertebrate studies, where bulk samples often represent insurmountable workloads for sorting, identifying, and counting individual specimens. On the other hand, image-b...


Bibliographic Details
Main Authors:	Høye, Toke T., Dyrmann, Mads, Kjær, Christian, Nielsen, Johnny, Bruus, Marianne, Mielec, Cecilie L., Vesterdal, Maria S., Bjerge, Kim, Madsen, Sigurd A., Jeppesen, Mads R., Melvad, Claus
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2022
Subjects:	Biodiversity
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415355/ https://www.ncbi.nlm.nih.gov/pubmed/36032940 http://dx.doi.org/10.7717/peerj.13837

_version_	1784776211496435712
author	Høye, Toke T. Dyrmann, Mads Kjær, Christian Nielsen, Johnny Bruus, Marianne Mielec, Cecilie L. Vesterdal, Maria S. Bjerge, Kim Madsen, Sigurd A. Jeppesen, Mads R. Melvad, Claus
author_sort	Høye, Toke T.
collection	PubMed
description	Image-based methods for species identification offer cost-efficient solutions for biomonitoring. This is particularly relevant for invertebrate studies, where bulk samples often represent insurmountable workloads for sorting, identifying, and counting individual specimens. On the other hand, image-based classification using deep learning tools has strict requirements for the amount of training data, which is often a limiting factor. Here, we examine how classification accuracy increases with the amount of training data using the BIODISCOVER imaging system constructed for image-based classification and biomass estimation of invertebrate specimens. We use a balanced dataset of 60 specimens of each of 16 taxa of freshwater macroinvertebrates to systematically quantify how classification performance of a convolutional neural network (CNN) increases for individual taxa and the overall community as the number of specimens used for training is increased. We show a striking 99.2% classification accuracy when the CNN (EfficientNet-B6) is trained on 50 specimens of each taxon, and also how the lower classification accuracy of models trained on less data is particularly evident for morphologically similar species placed within the same taxonomic order. Even with as few as 15 specimens used for training, classification accuracy reached 97%. Our results add to a recent body of literature showing the huge potential of image-based methods and deep learning for specimen-based research, and furthermore offer a perspective on future automatized approaches for deriving ecological data from bulk arthropod samples.
format	Online Article Text
id	pubmed-9415355
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9415355 2022-08-27 Accurate image-based identification of macroinvertebrate specimens using deep learning—How much training data is needed? Høye, Toke T. Dyrmann, Mads Kjær, Christian Nielsen, Johnny Bruus, Marianne Mielec, Cecilie L. Vesterdal, Maria S. Bjerge, Kim Madsen, Sigurd A. Jeppesen, Mads R. Melvad, Claus PeerJ Biodiversity Image-based methods for species identification offer cost-efficient solutions for biomonitoring. This is particularly relevant for invertebrate studies, where bulk samples often represent insurmountable workloads for sorting, identifying, and counting individual specimens. On the other hand, image-based classification using deep learning tools has strict requirements for the amount of training data, which is often a limiting factor. Here, we examine how classification accuracy increases with the amount of training data using the BIODISCOVER imaging system constructed for image-based classification and biomass estimation of invertebrate specimens. We use a balanced dataset of 60 specimens of each of 16 taxa of freshwater macroinvertebrates to systematically quantify how classification performance of a convolutional neural network (CNN) increases for individual taxa and the overall community as the number of specimens used for training is increased. We show a striking 99.2% classification accuracy when the CNN (EfficientNet-B6) is trained on 50 specimens of each taxon, and also how the lower classification accuracy of models trained on less data is particularly evident for morphologically similar species placed within the same taxonomic order. Even with as few as 15 specimens used for training, classification accuracy reached 97%. Our results add to a recent body of literature showing the huge potential of image-based methods and deep learning for specimen-based research, and furthermore offer a perspective on future automatized approaches for deriving ecological data from bulk arthropod samples. PeerJ Inc. 2022-08-23 /pmc/articles/PMC9415355/ /pubmed/36032940 http://dx.doi.org/10.7717/peerj.13837 Text en ©2022 Høye et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Accurate image-based identification of macroinvertebrate specimens using deep learning—How much training data is needed?
topic	Biodiversity
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415355/ https://www.ncbi.nlm.nih.gov/pubmed/36032940 http://dx.doi.org/10.7717/peerj.13837
