
Harnessing Large-Scale Herbarium Image Datasets Through Representation Learning

The mobilization of large-scale datasets of specimen images and metadata through herbarium digitization provides a rich environment for the application and development of machine learning techniques. However, limited access to computational resources and uneven progress in digitization, especially fo...


Bibliographic Details
Main authors: Walker, Barnaby E., Tucker, Allan, Nicolson, Nicky
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8794728/
https://www.ncbi.nlm.nih.gov/pubmed/35095977
http://dx.doi.org/10.3389/fpls.2021.806407
author Walker, Barnaby E.
Tucker, Allan
Nicolson, Nicky
collection PubMed
description The mobilization of large-scale datasets of specimen images and metadata through herbarium digitization provides a rich environment for the application and development of machine learning techniques. However, limited access to computational resources and uneven progress in digitization, especially for small herbaria, still present barriers to the wide adoption of these new technologies. Using deep learning to extract representations of herbarium specimens useful for a wide variety of applications, so-called “representation learning,” could help remove these barriers. Despite its recent popularity for camera trap and natural world images, representation learning is not yet as popular for herbarium specimen images. We investigated the potential of representation learning with specimen images by building three neural networks using a publicly available dataset of over 2 million specimen images spanning multiple continents and institutions. We compared the extracted representations and tested their performance in application tasks relevant to research carried out with herbarium specimens. We found that a triplet network, a type of neural network that learns distances between images, produced representations that transferred the best across all applications investigated. Our results demonstrate that it is possible to learn representations of specimen images useful in different applications, and we identify some further steps that we believe are necessary for representation learning to harness the rich information held in the world’s herbaria.
format Online
Article
Text
id pubmed-8794728
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8794728 2022-01-28 Harnessing Large-Scale Herbarium Image Datasets Through Representation Learning Walker, Barnaby E. Tucker, Allan Nicolson, Nicky Front Plant Sci Plant Science Frontiers Media S.A. 2022-01-13 /pmc/articles/PMC8794728/ /pubmed/35095977 http://dx.doi.org/10.3389/fpls.2021.806407 Text en Copyright © 2022 Walker, Tucker and Nicolson.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Harnessing Large-Scale Herbarium Image Datasets Through Representation Learning
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8794728/
https://www.ncbi.nlm.nih.gov/pubmed/35095977
http://dx.doi.org/10.3389/fpls.2021.806407