Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning


Bibliographic Details
Main authors: White, Alexander E., Dikow, Rebecca B., Baugh, Makinnon, Jenkins, Abigail, Frandsen, Paul B.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7328659/
https://www.ncbi.nlm.nih.gov/pubmed/32626607
http://dx.doi.org/10.1002/aps3.11352
Description
Summary: PREMISE: Digitized images of herbarium specimens are highly diverse with many potential sources of visual noise and bias. The systematic removal of noise and minimization of bias must be achieved in order to generate biological insights based on the plants rather than the digitization and mounting practices involved. Here, we develop a workflow and data set of high‐resolution image masks to segment plant tissues in herbarium specimen images and remove background pixels using deep learning. METHODS AND RESULTS: We generated 400 curated, high‐resolution masks of ferns using a combination of automatic and manual tools for image manipulation. We used those images to train a U‐Net‐style deep learning model for image segmentation, achieving a final Sørensen–Dice coefficient of 0.96. The resulting model can automatically, efficiently, and accurately segment massive data sets of digitized herbarium specimens, particularly for ferns. CONCLUSIONS: The application of deep learning in herbarium sciences requires transparent and systematic protocols for generating training data so that these labor‐intensive resources can be generalized to other deep learning applications. Segmentation ground‐truth masks are hard‐won data, and we share these data and the model openly in the hopes of furthering model training and transfer learning opportunities for broader herbarium applications.
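The summary reports a Sørensen–Dice coefficient and describes removing background pixels with a mask. As a rough illustration only, the sketch below computes the coefficient, 2|A ∩ B| / (|A| + |B|), for two small binary masks and then blanks the background of a toy image. The function names `dice_coefficient` and `apply_mask`, the nested-list image representation, and the toy values are assumptions made for this example; they are not taken from the article or its code.

```python
from typing import List, Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Sørensen–Dice coefficient of two binary masks (1 = plant, 0 = background)."""
    intersection = 0
    pred_total = 0
    truth_total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p:
                pred_total += 1
            if t:
                truth_total += 1
    if pred_total + truth_total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / (pred_total + truth_total)


def apply_mask(image: Sequence[Sequence[int]],
               mask: Sequence[Sequence[int]],
               background: int = 0) -> List[List[int]]:
    """Keep pixels where the mask is 1 and replace the rest with `background`."""
    return [
        [px if m else background for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x3 example (hypothetical values, not data from the article).
truth_mask = [[0, 1, 1],
              [0, 1, 0]]
pred_mask = [[0, 1, 1],
             [1, 1, 0]]
image = [[10, 20, 30],
         [40, 50, 60]]

score = dice_coefficient(pred_mask, truth_mask)
masked = apply_mask(image, pred_mask)
```

Here the predicted mask has one false-positive pixel, so the overlap is 3 pixels against mask sizes of 4 and 3, giving a score of 6/7. A real evaluation would average this over full-resolution masks from a held-out set.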