A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Bibliographic Details
Main authors: Pan, Youcheng, Wang, Chenghao, Hu, Baotian, Xiang, Yang, Wang, Xiaolong, Chen, Qingcai, Chen, Junjie, Du, Jingcheng
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8701710/
https://www.ncbi.nlm.nih.gov/pubmed/34889749
http://dx.doi.org/10.2196/32698
author Pan, Youcheng
Wang, Chenghao
Hu, Baotian
Xiang, Yang
Wang, Xiaolong
Chen, Qingcai
Chen, Junjie
Du, Jingcheng
author_sort Pan, Youcheng
collection PubMed
description BACKGROUND: Electronic medical records (EMRs) are usually stored in relational databases that require SQL queries to retrieve information of interest. Effectively completing such queries can be a challenging task for medical experts due to the barriers in expertise. Existing text-to-SQL generation studies have not been fully embraced in the medical domain. OBJECTIVE: The objective of this study was to propose a neural generation model that can jointly consider the characteristics of medical text and the SQL structure to automatically transform medical texts to SQL queries for EMRs. METHODS: We proposed a medical text–to-SQL model (MedTS), which employed a pretrained Bidirectional Encoder Representations From Transformers model as the encoder and leveraged a grammar-based long short-term memory network as the decoder to predict the intermediate representation that can easily be transformed into the final SQL query. We adopted the syntax tree as the intermediate representation rather than directly regarding the SQL query as an ordinary word sequence, which is more in line with the tree-structure nature of SQL and can also effectively reduce the search space during generation. Experiments were conducted on the MIMICSQL dataset, and 5 competitor methods were compared. RESULTS: Experimental results demonstrated that MedTS achieved the accuracy of 0.784 and 0.899 on the test set in terms of logic form and execution, respectively, which significantly outperformed the existing state-of-the-art methods. Further analyses proved that the performance on each component of the generated SQL was relatively balanced and offered substantial improvements. CONCLUSIONS: The proposed MedTS was effective and robust for improving the performance of medical text–to-SQL generation, indicating strong potential to be applied in the real medical scenario.
format Online
Article
Text
id pubmed-8701710
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8701710 2022-01-10 A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation Pan, Youcheng Wang, Chenghao Hu, Baotian Xiang, Yang Wang, Xiaolong Chen, Qingcai Chen, Junjie Du, Jingcheng JMIR Med Inform Original Paper JMIR Publications 2021-12-08 /pmc/articles/PMC8701710/ /pubmed/34889749 http://dx.doi.org/10.2196/32698 Text en ©Youcheng Pan, Chenghao Wang, Baotian Hu, Yang Xiang, Xiaolong Wang, Qingcai Chen, Junjie Chen, Jingcheng Du. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 08.12.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8701710/
https://www.ncbi.nlm.nih.gov/pubmed/34889749
http://dx.doi.org/10.2196/32698
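
The abstract in this record reports two accuracy figures (0.784 for logic form, 0.899 for execution). These measure different things, and a small sketch may help show why the execution figure can be the higher one. The Python example below is an illustration only, not the authors' evaluation code: the `patients` table, its rows, the example queries, and the helper names `normalize`, `logic_form_match`, and `execution_match` are all hypothetical, and logic-form matching is approximated here by comparing case- and whitespace-normalized query strings.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Crude normalization (an assumption): lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())


def logic_form_match(pred: str, gold: str) -> bool:
    """True when the predicted query text equals the gold text after normalization."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """True when both queries return the same rows; an invalid prediction is a miss."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


# Hypothetical toy table standing in for an EMR database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (subject_id INTEGER, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, 70, "F"), (2, 45, "M"), (3, 82, "F")],
)

gold = "SELECT COUNT(*) FROM patients WHERE age > 65"
predictions = [
    "select count(*)  from patients where age > 65",  # same query, different formatting
    "SELECT COUNT(*) FROM patients WHERE age >= 66",  # different text, same result
    "SELECT COUNT(*) FROM patients WHERE age > 80",   # different text, different result
]

logic_hits = sum(logic_form_match(p, gold) for p in predictions)
exec_hits = sum(execution_match(conn, p, gold) for p in predictions)

print(f"logic form accuracy: {logic_hits}/{len(predictions)}")
print(f"execution accuracy:  {exec_hits}/{len(predictions)}")
```

On this toy data the second prediction counts toward execution accuracy but not toward logic form accuracy, which is consistent with the execution figure in the abstract being the higher of the two.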