Geo Fossils-I: A synthetic dataset of 2D fossil images for computer vision applications on geology

Bibliographic Details
Main Author: Nathanail, Athanasios
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10293944/
https://www.ncbi.nlm.nih.gov/pubmed/37383796
http://dx.doi.org/10.1016/j.dib.2023.109188
Description
Summary: Geo Fossils-I is a synthetic image dataset created to address the limited availability of geological datasets for image classification and object detection on 2D images of geological outcrops. It was created to train a custom image classification model for geological fossil identification and to inspire further work on generating synthetic geological data with Stable Diffusion models. The dataset was generated by fine-tuning a pre-trained Stable Diffusion model through a custom training process. Stable Diffusion is a text-to-image model that can create highly realistic images from textual input. An effective technique for teaching Stable Diffusion novel concepts is Dreambooth, a specialized form of fine-tuning. Dreambooth was used to generate new images of fossils or to modify existing ones according to the provided textual description. The dataset covers six fossil types found in geological outcrops, each characteristic of a particular depositional environment. It contains a total of 1200 fossil images spread equally among the six types: ammonites, belemnites, corals, crinoids, leaf fossils, and trilobites. This dataset is the first in a planned series intended to enrich the available resources of 2D outcrop images, allowing geoscientists to make progress in the automated interpretation of depositional environments.
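
As an illustration of how a balanced dataset of this shape might be prepared for training an image classifier, the following minimal Python sketch builds a label manifest for six classes of equal size and splits it per class into training and validation sets. Only the six class names and the total of 1200 images come from the summary; the file naming scheme, the 80/20 split, and the function names `build_manifest` and `stratified_split` are assumptions made for the example and do not describe the dataset's actual layout or the author's workflow.

```python
import random
from collections import Counter

# Class names and total image count are taken from the summary.
FOSSIL_TYPES = ["ammonite", "belemnite", "coral", "crinoid", "leaf_fossil", "trilobite"]
TOTAL_IMAGES = 1200
PER_CLASS = TOTAL_IMAGES // len(FOSSIL_TYPES)  # equal spread across the six types


def build_manifest():
    """Return (path, label) pairs; the path pattern is hypothetical."""
    return [
        (f"{label}/{label}_{i:03d}.png", label)
        for label in FOSSIL_TYPES
        for i in range(PER_CLASS)
    ]


def stratified_split(manifest, val_fraction=0.2, seed=0):
    """Split per class so every fossil type keeps the same train/val ratio."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = {}
    for path, label in manifest:
        by_label.setdefault(label, []).append(path)

    train, val = [], []
    for label, paths in by_label.items():
        shuffled = paths[:]
        rng.shuffle(shuffled)
        n_val = round(len(shuffled) * val_fraction)
        val.extend((p, label) for p in shuffled[:n_val])
        train.extend((p, label) for p in shuffled[n_val:])
    return train, val


manifest = build_manifest()
train, val = stratified_split(manifest)
val_counts = Counter(label for _, label in val)
print(len(train), len(val), dict(val_counts))
```

Splitting within each class, rather than shuffling the whole manifest at once, keeps the validation set as balanced as the dataset itself, which makes per-class accuracy directly comparable across fossil types.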