Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning
Main authors: | White, Alexander E.; Dikow, Rebecca B.; Baugh, Makinnon; Jenkins, Abigail; Frandsen, Paul B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc., 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7328659/ https://www.ncbi.nlm.nih.gov/pubmed/32626607 http://dx.doi.org/10.1002/aps3.11352 |
_version_ | 1783552771118596096 |
---|---|
author | White, Alexander E.; Dikow, Rebecca B.; Baugh, Makinnon; Jenkins, Abigail; Frandsen, Paul B. |
author_sort | White, Alexander E. |
collection | PubMed |
description | PREMISE: Digitized images of herbarium specimens are highly diverse with many potential sources of visual noise and bias. The systematic removal of noise and minimization of bias must be achieved in order to generate biological insights based on the plants rather than the digitization and mounting practices involved. Here, we develop a workflow and data set of high‐resolution image masks to segment plant tissues in herbarium specimen images and remove background pixels using deep learning. METHODS AND RESULTS: We generated 400 curated, high‐resolution masks of ferns using a combination of automatic and manual tools for image manipulation. We used those images to train a U‐Net‐style deep learning model for image segmentation, achieving a final Sørensen–Dice coefficient of 0.96. The resulting model can automatically, efficiently, and accurately segment massive data sets of digitized herbarium specimens, particularly for ferns. CONCLUSIONS: The application of deep learning in herbarium sciences requires transparent and systematic protocols for generating training data so that these labor‐intensive resources can be generalized to other deep learning applications. Segmentation ground‐truth masks are hard‐won data, and we share these data and the model openly in the hopes of furthering model training and transfer learning opportunities for broader herbarium applications. |
format | Online Article Text |
id | pubmed-7328659 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7328659 2020-07-02. Appl Plant Sci, Application Articles. John Wiley and Sons Inc., 2020-07-01. © 2020 The Authors. Applications in Plant Sciences is published by Wiley Periodicals, LLC on behalf of the Botanical Society of America. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
title | Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning |
topic | Application Articles |
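The description reports a final Sørensen–Dice coefficient of 0.96 for masks that separate plant tissue from background pixels. For a predicted mask A and a ground-truth mask B, the Dice coefficient is 2|A∩B| / (|A| + |B|), so 1.0 means perfect overlap and 0.0 means none. The following is a minimal NumPy sketch, not the authors' released code, of computing that score and of using a binary mask to blank out background pixels; the function names, the 0.5 probability threshold, the white fill value, and the toy 4×4 arrays are illustrative assumptions.

```python
import numpy as np


# Hypothetical helper for illustration; not taken from the authors' repository.
def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Sørensen–Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks empty: scored as perfect agreement (an assumed convention).
        return 1.0
    return float(2.0 * intersection / total)


# Hypothetical helper: keep plant pixels, overwrite background pixels.
def remove_background(image: np.ndarray, mask: np.ndarray, fill: int = 255) -> np.ndarray:
    """Replace every pixel outside the mask with a constant fill value (white by default)."""
    out = image.copy()
    # A boolean HxW index on an HxWx3 image selects whole RGB pixels.
    out[~mask.astype(bool)] = fill
    return out


# Toy example: a 2x2 ground-truth region and a 2x3 predicted region that contains it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True            # 4 plant pixels
probs = np.zeros((4, 4))
probs[1:3, 1:4] = 0.9             # model "probabilities" for 6 pixels
pred = probs >= 0.5               # assumed threshold for binarizing the output

print(dice_coefficient(pred, truth))  # 2*4 / (6 + 4) = 0.8

image = np.zeros((4, 4, 3), dtype=np.uint8)
cleaned = remove_background(image, truth)
```

Thresholding per-pixel probabilities before scoring is one common convention; a "soft" Dice computed directly on the probabilities is often used as a training loss instead.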