
Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks

Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated syst...

Bibliographic Details
Main Authors: Valan, Miroslav; Makonyi, Karoly; Maki, Atsuto; Vondráček, Dominik; Ronquist, Fredrik
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Regular Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6802574/
https://www.ncbi.nlm.nih.gov/pubmed/30825372
http://dx.doi.org/10.1093/sysbio/syz014
author Valan, Miroslav
Makonyi, Karoly
Maki, Atsuto
Vondráček, Dominik
Ronquist, Fredrik
collection PubMed
description Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated systems for this task. Previous attempts have been based on laborious and complex handcrafted extraction of image features, but in recent years it has been shown that sophisticated convolutional neural networks (CNNs) can learn to extract relevant features automatically, without human intervention. Unfortunately, reaching expert-level accuracy in CNN identifications requires substantial computational power and huge training data sets, which are often not available for taxonomic tasks. This can be addressed using feature transfer: a CNN that has been pretrained on a generic image classification task is exposed to the taxonomic images of interest, and information about its perception of those images is used in training a simpler, dedicated identification system. Here, we develop an effective method of CNN feature transfer, which achieves expert-level accuracy in taxonomic identification of insects with training sets of 100 images or fewer per category, depending on the nature of the data set. Specifically, we extract rich representations of intermediate to high-level image features from the CNN architecture VGG16 pretrained on the ImageNet data set. This information is submitted to a linear support vector machine classifier, which is trained on the target problem. We tested the performance of our approach on two types of challenging taxonomic tasks: 1) identifying insects to higher groups when they are likely to belong to subgroups that have not been seen previously and 2) identifying visually similar species that are difficult to separate even for experts.
For the first task, our approach reached [Formula: see text] 92% accuracy on one data set (884 face images of 11 families of Diptera, all specimens representing unique species), and [Formula: see text] 96% accuracy on another (2936 dorsal habitus images of 14 families of Coleoptera, over 90% of specimens belonging to unique species). For the second task, our approach outperformed a leading taxonomic expert on one data set (339 images of three species of the Coleoptera genus Oxythyrea; 97% accuracy), and both humans and traditional automated identification systems on another data set (3845 images of nine species of Plecoptera larvae; 98.6% accuracy). Reanalyzing several biological image identification tasks studied in the recent literature, we show that our approach is broadly applicable and provides significant improvements over previous methods, whether based on dedicated CNNs, CNN feature transfer, or more traditional techniques. Thus, our method, which is easy to apply, can be highly successful in developing automated taxonomic identification systems even when training data sets are small and computational budgets limited. We conclude by briefly discussing some promising CNN-based research directions in morphological systematics opened up by the success of these techniques in providing accurate diagnostic tools.
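
The feature-transfer idea summarized in the description (a frozen, pretrained feature extractor feeding a linear support vector machine that is trained on the target problem) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the frozen extractor is a fixed random projection with ReLU standing in for a pretrained CNN such as VGG16, the "images" are synthetic vectors, the task is binary rather than multi-class, and the names `W_FROZEN`, `extract_features`, `train_linear_svm`, and `make_images` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: a fixed random projection + ReLU stands in for a pretrained CNN
# (e.g. VGG16). Its weights are frozen: they are never updated during training.
N_INPUT, N_FEATURES = 64, 32
W_FROZEN = rng.normal(size=(N_INPUT, N_FEATURES))


def extract_features(images):
    """Map raw inputs to feature vectors using the frozen extractor."""
    return np.maximum(images @ W_FROZEN, 0.0)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM (labels in {-1, +1}) trained by subgradient descent
    on the L2-regularised hinge loss. Only w and b are learned."""
    n = len(y)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # samples violating the margin
        w -= lr * (lam * w - X[active].T @ y[active] / n)
        b -= lr * (-y[active].sum() / n)
    return w, b


def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)


def make_images(n_per_class, mu, noise=0.5):
    """Synthetic two-class 'images': class +1 centred on mu, class -1 on -mu."""
    y = np.repeat([1, -1], n_per_class)
    X = y[:, None] * mu + noise * rng.normal(size=(2 * n_per_class, mu.size))
    return X, y


mu = rng.normal(size=N_INPUT)
X_train, y_train = make_images(100, mu)   # 100 training examples per category
X_test, y_test = make_images(100, mu)

# Only the linear classifier is fitted; the extractor stays untouched.
w, b = train_linear_svm(extract_features(X_train), y_train)
accuracy = np.mean(predict(extract_features(X_test), w, b) == y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The design point the sketch tries to convey is that the expensive part (the feature extractor) is reused as-is, so only a small linear model has to be fitted to the limited target data. In practice one would replace the stand-in extractor with activations from an actual pretrained network and use an established SVM library.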
format Online
Article
Text
id pubmed-6802574
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6802574 2019-10-24 Syst Biol Regular Articles Oxford University Press 2019-11 2019-03-02 /pmc/articles/PMC6802574/ /pubmed/30825372 http://dx.doi.org/10.1093/sysbio/syz014 Text en © The Author(s) 2019. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6802574/
https://www.ncbi.nlm.nih.gov/pubmed/30825372
http://dx.doi.org/10.1093/sysbio/syz014