
EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets

Bibliographic Details
Main authors: Schwenker, Eric; Jiang, Weixin; Spreadbury, Trevor; Ferrier, Nicola; Cossairt, Oliver; Chan, Maria K.Y.
Format: Online, Article, Text
Language: English
Published: Elsevier 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682750/
https://www.ncbi.nlm.nih.gov/pubmed/38035197
http://dx.doi.org/10.1016/j.patter.2023.100843
Journal: Patterns (N Y), Elsevier, 2023-09-30
Collection: PubMed
Record ID: pubmed-10682750
Record date: 2023-11-30
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Rights: © 2023 The Author(s), Argonne National Laboratory. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Abstract: This work introduces the EXSCLAIM! toolkit for the automatic extraction, separation, and caption-based natural language annotation of images from scientific literature. EXSCLAIM! is used to show how rule-based natural language processing and image recognition can be leveraged to construct an electron microscopy dataset containing thousands of keyword-annotated nanostructure images. Moreover, it is demonstrated how a combination of statistical topic modeling and semantic word similarity comparisons can be used to increase the number and variety of keyword annotations on top of the standard annotations from EXSCLAIM! With large-scale imaging datasets constructed from scientific literature, users are well positioned to train neural networks for classification and recognition tasks specific to microscopy, tasks often otherwise inhibited by a lack of sufficient annotated training data.