Cargando…

Natural language generation for electronic health records

One broad goal of biomedical informatics is to generate fully-synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and promote methodological research. A variety of methods existing for generating synthetic EHRs...

Descripción completa

Detalles Bibliográficos
Autor principal: Lee, Scott H.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6345174/
https://www.ncbi.nlm.nih.gov/pubmed/30687797
http://dx.doi.org/10.1038/s41746-018-0070-0
_version_ 1783389540278337536
author Lee, Scott H.
author_facet Lee, Scott H.
author_sort Lee, Scott H.
collection PubMed
description One broad goal of biomedical informatics is to generate fully-synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and promote methodological research. A variety of methods existing for generating synthetic EHRs, but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness, or progress notes. Here, we use the encoder–decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, like age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record-sentence pairs. As a side effect of the model’s optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviation and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully-synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics.
format Online
Article
Text
id pubmed-6345174
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-63451742019-01-24 Natural language generation for electronic health records Lee, Scott H. NPJ Digit Med Article One broad goal of biomedical informatics is to generate fully-synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and promote methodological research. A variety of methods existing for generating synthetic EHRs, but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness, or progress notes. Here, we use the encoder–decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, like age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record-sentence pairs. As a side effect of the model’s optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviation and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully-synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics. Nature Publishing Group UK 2018-11-19 /pmc/articles/PMC6345174/ /pubmed/30687797 http://dx.doi.org/10.1038/s41746-018-0070-0 Text en © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Lee, Scott H.
Natural language generation for electronic health records
title Natural language generation for electronic health records
title_full Natural language generation for electronic health records
title_fullStr Natural language generation for electronic health records
title_full_unstemmed Natural language generation for electronic health records
title_short Natural language generation for electronic health records
title_sort natural language generation for electronic health records
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6345174/
https://www.ncbi.nlm.nih.gov/pubmed/30687797
http://dx.doi.org/10.1038/s41746-018-0070-0
work_keys_str_mv AT leescotth naturallanguagegenerationforelectronichealthrecords