
Materials information extraction via automatically generated corpus

Information Extraction (IE) in Natural Language Processing (NLP) aims to extract structured information from unstructured text to assist a computer in understanding natural language. Machine learning-based IE methods bring more intelligence and possibilities but require an extensive and accurate lab...


Bibliographic Details
Main Authors:	Yan, Rongen; Jiang, Xue; Wang, Weiren; Dang, Depeng; Su, Yanjing
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279422/ https://www.ncbi.nlm.nih.gov/pubmed/35831367 http://dx.doi.org/10.1038/s41597-022-01492-2

_version_	1784746394434666496
author	Yan, Rongen Jiang, Xue Wang, Weiren Dang, Depeng Su, Yanjing
author_facet	Yan, Rongen Jiang, Xue Wang, Weiren Dang, Depeng Su, Yanjing
author_sort	Yan, Rongen
collection	PubMed
description	Information Extraction (IE) in Natural Language Processing (NLP) aims to extract structured information from unstructured text to assist a computer in understanding natural language. Machine learning-based IE methods bring more intelligence and possibilities but require an extensive and accurate labeled corpus. In the materials science domain, giving reliable labels is a laborious task that requires the efforts of many professionals. To reduce manual intervention and automatically generate materials corpus during IE, in this work, we propose a semi-supervised IE framework for materials via automatically generated corpus. Taking the superalloy data extraction in our previous work as an example, the proposed framework using Snorkel automatically labels the corpus containing property values. Then Ordered Neurons-Long Short-Term Memory (ON-LSTM) network is adopted to train an information extraction model on the generated corpus. The experimental results show that the F1-score of γ’ solvus temperature, density and solidus temperature of superalloys are 83.90%, 94.02%, 89.27%, respectively. Furthermore, we conduct similar experiments on other materials, the experimental results show that the proposed framework is universal in the field of materials.
format	Online Article Text
id	pubmed-9279422
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-9279422 2022-07-15 Materials information extraction via automatically generated corpus Yan, Rongen Jiang, Xue Wang, Weiren Dang, Depeng Su, Yanjing Sci Data Article Information Extraction (IE) in Natural Language Processing (NLP) aims to extract structured information from unstructured text to assist a computer in understanding natural language. Machine learning-based IE methods bring more intelligence and possibilities but require an extensive and accurate labeled corpus. In the materials science domain, giving reliable labels is a laborious task that requires the efforts of many professionals. To reduce manual intervention and automatically generate materials corpus during IE, in this work, we propose a semi-supervised IE framework for materials via automatically generated corpus. Taking the superalloy data extraction in our previous work as an example, the proposed framework using Snorkel automatically labels the corpus containing property values. Then Ordered Neurons-Long Short-Term Memory (ON-LSTM) network is adopted to train an information extraction model on the generated corpus. The experimental results show that the F1-score of γ’ solvus temperature, density and solidus temperature of superalloys are 83.90%, 94.02%, 89.27%, respectively. Furthermore, we conduct similar experiments on other materials, the experimental results show that the proposed framework is universal in the field of materials. 
Nature Publishing Group UK 2022-07-13 /pmc/articles/PMC9279422/ /pubmed/35831367 http://dx.doi.org/10.1038/s41597-022-01492-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Yan, Rongen Jiang, Xue Wang, Weiren Dang, Depeng Su, Yanjing Materials information extraction via automatically generated corpus
title	Materials information extraction via automatically generated corpus
title_full	Materials information extraction via automatically generated corpus
title_fullStr	Materials information extraction via automatically generated corpus
title_full_unstemmed	Materials information extraction via automatically generated corpus
title_short	Materials information extraction via automatically generated corpus
title_sort	materials information extraction via automatically generated corpus
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9279422/ https://www.ncbi.nlm.nih.gov/pubmed/35831367 http://dx.doi.org/10.1038/s41597-022-01492-2
work_keys_str_mv	AT yanrongen materialsinformationextractionviaautomaticallygeneratedcorpus AT jiangxue materialsinformationextractionviaautomaticallygeneratedcorpus AT wangweiren materialsinformationextractionviaautomaticallygeneratedcorpus AT dangdepeng materialsinformationextractionviaautomaticallygeneratedcorpus AT suyanjing materialsinformationextractionviaautomaticallygeneratedcorpus
