OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain
Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have opened up a new era and brought a signific...
Main authors: | Zhao, Jiuyang; Huang, Shu; Cole, Jacqueline M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091421/ https://www.ncbi.nlm.nih.gov/pubmed/36940385 http://dx.doi.org/10.1021/acs.jcim.2c01259 |
_version_ | 1785023131338932224 |
---|---|
author | Zhao, Jiuyang Huang, Shu Cole, Jacqueline M. |
author_facet | Zhao, Jiuyang Huang, Shu Cole, Jacqueline M. |
author_sort | Zhao, Jiuyang |
collection | PubMed |
description | Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have opened up a new era and brought a significant boost to state-of-the-art natural-language-processing (NLP) tasks. In this paper, we present two “materials-aware” text-based language models for optical research, OpticalBERT and OpticalPureBERT, which are trained on a large corpus of scientific literature in the optical-materials domain. These two models outperform BERT and previous state-of-the-art models in a variety of text-mining tasks about optical materials. We also release the first “materials-aware” table-based language model, OpticalTable-SQA. This is a querying facility that solicits answers to questions about optical materials using tabular information that pertains to this scientific domain. The OpticalTable-SQA model was realized by fine-tuning the Tapas-SQA model using a manually annotated OpticalTableQA data set which was curated specifically for this work. While preserving its sequential question-answering performance on general tables, the OpticalTable-SQA model significantly outperforms Tapas-SQA on optical-materials-related tables. All models and data sets are available to the optical-materials-science community. |
format | Online Article Text |
id | pubmed-10091421 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10091421 2023-04-13 OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain Zhao, Jiuyang Huang, Shu Cole, Jacqueline M. J Chem Inf Model Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have opened up a new era and brought a significant boost to state-of-the-art natural-language-processing (NLP) tasks. In this paper, we present two “materials-aware” text-based language models for optical research, OpticalBERT and OpticalPureBERT, which are trained on a large corpus of scientific literature in the optical-materials domain. These two models outperform BERT and previous state-of-the-art models in a variety of text-mining tasks about optical materials. We also release the first “materials-aware” table-based language model, OpticalTable-SQA. This is a querying facility that solicits answers to questions about optical materials using tabular information that pertains to this scientific domain. The OpticalTable-SQA model was realized by fine-tuning the Tapas-SQA model using a manually annotated OpticalTableQA data set which was curated specifically for this work. While preserving its sequential question-answering performance on general tables, the OpticalTable-SQA model significantly outperforms Tapas-SQA on optical-materials-related tables. All models and data sets are available to the optical-materials-science community. American Chemical Society 2023-03-20 /pmc/articles/PMC10091421/ /pubmed/36940385 http://dx.doi.org/10.1021/acs.jcim.2c01259 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Zhao, Jiuyang Huang, Shu Cole, Jacqueline M. OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain |
title | OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain |
title_full | OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain |
title_fullStr | OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain |
title_full_unstemmed | OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain |
title_short | OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain |
title_sort | opticalbert and opticaltable-sqa: text- and table-based language models for the optical-materials domain |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091421/ https://www.ncbi.nlm.nih.gov/pubmed/36940385 http://dx.doi.org/10.1021/acs.jcim.2c01259 |
work_keys_str_mv | AT zhaojiuyang opticalbertandopticaltablesqatextandtablebasedlanguagemodelsfortheopticalmaterialsdomain AT huangshu opticalbertandopticaltablesqatextandtablebasedlanguagemodelsfortheopticalmaterialsdomain AT colejacquelinem opticalbertandopticaltablesqatextandtablebasedlanguagemodelsfortheopticalmaterialsdomain |