
Looking through glass: Knowledge discovery from materials science literature using natural language processing


Bibliographic Details
Main authors: Venugopal, Vineeth; Sahoo, Sourav; Zaki, Mohd; Agarwal, Manish; Gosvami, Nitya Nand; Krishnan, N. M. Anoop
Format: Online Article Text
Language: English
Published: Elsevier 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276010/
https://www.ncbi.nlm.nih.gov/pubmed/34286304
http://dx.doi.org/10.1016/j.patter.2021.100290
collection PubMed
description Most of the knowledge in materials science literature is in the form of unstructured data such as text and images. Here, we present a framework employing natural language processing, which automates text and image comprehension and precision knowledge extraction from inorganic glasses’ literature. The abstracts are automatically categorized using latent Dirichlet allocation (LDA) to classify and search semantically linked publications. Similarly, a comprehensive summary of images and plots is presented using the caption cluster plot (CCP), providing direct access to images buried in the papers. Finally, we combine the LDA and CCP with chemical elements to present an elemental map, a topical and image-wise distribution of elements occurring in the literature. Overall, the framework presented here can be a generic and powerful tool to extract and disseminate material-specific information on composition–structure–processing–property dataspaces, allowing insights into fundamental problems relevant to the materials science community and accelerated materials discovery.
format Online
Article
Text
id pubmed-8276010
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8276010; 2021-07-19; Patterns (N Y); Article; Elsevier; 2021-06-24; /pmc/articles/PMC8276010/; /pubmed/34286304; http://dx.doi.org/10.1016/j.patter.2021.100290; Text; en; © 2021 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Looking through glass: Knowledge discovery from materials science literature using natural language processing
topic Article