
Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning

PREMISE: Digitized images of herbarium specimens are highly diverse with many potential sources of visual noise and bias. The systematic removal of noise and minimization of bias must be achieved in order to generate biological insights based on the plants rather than the digitization and mounting p...


Bibliographic Details
Main authors: White, Alexander E., Dikow, Rebecca B., Baugh, Makinnon, Jenkins, Abigail, Frandsen, Paul B.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7328659/
https://www.ncbi.nlm.nih.gov/pubmed/32626607
http://dx.doi.org/10.1002/aps3.11352
_version_ 1783552771118596096
author White, Alexander E.
Dikow, Rebecca B.
Baugh, Makinnon
Jenkins, Abigail
Frandsen, Paul B.
collection PubMed
description PREMISE: Digitized images of herbarium specimens are highly diverse with many potential sources of visual noise and bias. The systematic removal of noise and minimization of bias must be achieved in order to generate biological insights based on the plants rather than the digitization and mounting practices involved. Here, we develop a workflow and data set of high‐resolution image masks to segment plant tissues in herbarium specimen images and remove background pixels using deep learning. METHODS AND RESULTS: We generated 400 curated, high‐resolution masks of ferns using a combination of automatic and manual tools for image manipulation. We used those images to train a U‐Net‐style deep learning model for image segmentation, achieving a final Sørensen–Dice coefficient of 0.96. The resulting model can automatically, efficiently, and accurately segment massive data sets of digitized herbarium specimens, particularly for ferns. CONCLUSIONS: The application of deep learning in herbarium sciences requires transparent and systematic protocols for generating training data so that these labor‐intensive resources can be generalized to other deep learning applications. Segmentation ground‐truth masks are hard‐won data, and we share these data and the model openly in the hopes of furthering model training and transfer learning opportunities for broader herbarium applications.
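The abstract reports segmentation quality as a Sørensen–Dice coefficient, which for a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that metric and of background removal with a binary mask, written in plain Python on nested lists. It is not the authors' code: the function names `dice_coefficient` and `apply_mask`, the list-of-lists mask representation, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # assumed representation: rows of 0/1 values


def dice_coefficient(pred: Mask, truth: Mask) -> float:
    """Sørensen–Dice coefficient between two binary masks of equal shape."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels that are foreground in both masks
    pred_total = 0    # foreground pixels in the predicted mask
    truth_total = 0   # foreground pixels in the ground-truth mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)
            pred_total += int(p_on)
            truth_total += int(t_on)

    if pred_total + truth_total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_total + truth_total)


def apply_mask(image: Sequence[Sequence[int]], mask: Mask,
               background: int = 0) -> List[List[int]]:
    """Keep pixels where the mask is foreground; set the rest to `background`."""
    return [
        [px if m else background for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x3 example (hypothetical values, not data from the paper).
truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 0],
        [0, 1, 1]]
score = dice_coefficient(pred, truth)

image = [[9, 8, 7],
         [6, 5, 4]]
masked = apply_mask(image, truth)
```

In this toy case the two masks share two foreground pixels out of three each, so the score is 2·2 / (3 + 3). A score of 0.96, as reported in the abstract, indicates near-complete overlap between predicted and ground-truth masks.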
format Online
Article
Text
id pubmed-7328659
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7328659 2020-07-02 Appl Plant Sci Application Articles John Wiley and Sons Inc. 2020-07-01 /pmc/articles/PMC7328659/ /pubmed/32626607 http://dx.doi.org/10.1002/aps3.11352 Text en © 2020 The Authors. Applications in Plant Sciences is published by Wiley Periodicals, LLC on behalf of the Botanical Society of America. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning
topic Application Articles