
An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning

Leaves are the most abundant and visible plant organ, both in the modern world and the fossil record. Identifying foliage to the correct plant family based on leaf architecture is a fundamental botanical skill that is also critical for isolated fossil leaves, which often, especially in the Cenozoic,...


Bibliographic Details
Main authors: Wilf, Peter, Wing, Scott L., Meyer, Herbert W., Rose, Jacob A., Saha, Rohit, Serre, Thomas, Cúneo, N. Rubén, Donovan, Michael P., Erwin, Diane M., Gandolfo, María A., González-Akre, Erika, Herrera, Fabiany, Hu, Shusheng, Iglesias, Ari, Johnson, Kirk R., Karim, Talia S., Zou, Xiaoyu
Format: Online Article Text
Language: English
Published: Pensoft Publishers 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8702526/
https://www.ncbi.nlm.nih.gov/pubmed/35068970
http://dx.doi.org/10.3897/phytokeys.187.72350
author Wilf, Peter
Wing, Scott L.
Meyer, Herbert W.
Rose, Jacob A.
Saha, Rohit
Serre, Thomas
Cúneo, N. Rubén
Donovan, Michael P.
Erwin, Diane M.
Gandolfo, María A.
González-Akre, Erika
Herrera, Fabiany
Hu, Shusheng
Iglesias, Ari
Johnson, Kirk R.
Karim, Talia S.
Zou, Xiaoyu
author_sort Wilf, Peter
collection PubMed
description Leaves are the most abundant and visible plant organ, both in the modern world and the fossil record. Identifying foliage to the correct plant family based on leaf architecture is a fundamental botanical skill that is also critical for isolated fossil leaves, which often, especially in the Cenozoic, represent extinct genera and species from extant families. Resources focused on leaf identification are remarkably scarce; however, the situation has improved due to the recent proliferation of digitized herbarium material, live-plant identification applications, and online collections of cleared and fossil leaf images. Nevertheless, the need remains for a specialized image dataset for comparative leaf architecture. We address this gap by assembling an open-access database of 30,252 images of vouchered leaf specimens vetted to family level, primarily of angiosperms, including 26,176 images of cleared and x-rayed leaves representing 354 families and 4,076 of fossil leaves from 48 families. The images maintain original resolution, have user-friendly filenames, and are vetted using APG and modern paleobotanical standards. The cleared and x-rayed leaves include the Jack A. Wolfe and Leo J. Hickey contributions to the National Cleared Leaf Collection and a collection of high-resolution scanned x-ray negatives, housed in the Division of Paleobotany, Department of Paleobiology, Smithsonian National Museum of Natural History, Washington D.C.; and the Daniel I. Axelrod Cleared Leaf Collection, housed at the University of California Museum of Paleontology, Berkeley. The fossil images include a sampling of Late Cretaceous to Eocene paleobotanical sites from the Western Hemisphere held at numerous institutions, especially from Florissant Fossil Beds National Monument (late Eocene, Colorado), as well as several other localities from the Late Cretaceous to Eocene of the Western USA and the early Paleogene of Colombia and southern Argentina. 
The dataset facilitates new research and education opportunities in paleobotany, comparative leaf architecture, systematics, and machine learning.
format Online
Article
Text
id pubmed-8702526
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-8702526 2022-01-20 PhytoKeys Research Article Pensoft Publishers 2021-12-16 /pmc/articles/PMC8702526/ /pubmed/35068970 http://dx.doi.org/10.3897/phytokeys.187.72350 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
title An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning
topic Research Article