EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets
This work introduces the EXSCLAIM! toolkit for the automatic extraction, separation, and caption-based natural language annotation of images from scientific literature. EXSCLAIM! is used to show how rule-based natural language processing and image recognition can be leveraged to construct an electro...
Main authors: | Schwenker, Eric; Jiang, Weixin; Spreadbury, Trevor; Ferrier, Nicola; Cossairt, Oliver; Chan, Maria K.Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682750/ https://www.ncbi.nlm.nih.gov/pubmed/38035197 http://dx.doi.org/10.1016/j.patter.2023.100843 |
_version_ | 1785151042150727680 |
---|---|
author | Schwenker, Eric Jiang, Weixin Spreadbury, Trevor Ferrier, Nicola Cossairt, Oliver Chan, Maria K.Y. |
author_facet | Schwenker, Eric Jiang, Weixin Spreadbury, Trevor Ferrier, Nicola Cossairt, Oliver Chan, Maria K.Y. |
author_sort | Schwenker, Eric |
collection | PubMed |
description | This work introduces the EXSCLAIM! toolkit for the automatic extraction, separation, and caption-based natural language annotation of images from scientific literature. EXSCLAIM! is used to show how rule-based natural language processing and image recognition can be leveraged to construct an electron microscopy dataset containing thousands of keyword-annotated nanostructure images. Moreover, it is demonstrated how a combination of statistical topic modeling and semantic word similarity comparisons can be used to increase the number and variety of keyword annotations on top of the standard annotations from EXSCLAIM! With large-scale imaging datasets constructed from scientific literature, users are well positioned to train neural networks for classification and recognition tasks specific to microscopy—tasks often otherwise inhibited by a lack of sufficient annotated training data. |
format | Online Article Text |
id | pubmed-10682750 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10682750 2023-11-30 EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets Schwenker, Eric Jiang, Weixin Spreadbury, Trevor Ferrier, Nicola Cossairt, Oliver Chan, Maria K.Y. Patterns (N Y) Article This work introduces the EXSCLAIM! toolkit for the automatic extraction, separation, and caption-based natural language annotation of images from scientific literature. EXSCLAIM! is used to show how rule-based natural language processing and image recognition can be leveraged to construct an electron microscopy dataset containing thousands of keyword-annotated nanostructure images. Moreover, it is demonstrated how a combination of statistical topic modeling and semantic word similarity comparisons can be used to increase the number and variety of keyword annotations on top of the standard annotations from EXSCLAIM! With large-scale imaging datasets constructed from scientific literature, users are well positioned to train neural networks for classification and recognition tasks specific to microscopy—tasks often otherwise inhibited by a lack of sufficient annotated training data. Elsevier 2023-09-30 /pmc/articles/PMC10682750/ /pubmed/38035197 http://dx.doi.org/10.1016/j.patter.2023.100843 Text en © 2023 The Author(s), Argonne National Laboratory https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Schwenker, Eric Jiang, Weixin Spreadbury, Trevor Ferrier, Nicola Cossairt, Oliver Chan, Maria K.Y. EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets |
title | EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets |
title_full | EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets |
title_fullStr | EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets |
title_full_unstemmed | EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets |
title_short | EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets |
title_sort | exsclaim!: harnessing materials science literature for self-labeled microscopy datasets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682750/ https://www.ncbi.nlm.nih.gov/pubmed/38035197 http://dx.doi.org/10.1016/j.patter.2023.100843 |