Combining deep learning with token selection for patient phenotyping from electronic health records

Artificial intelligence provides the opportunity to reveal important information buried in large amounts of complex data. Electronic health records (eHRs) are a source of such big data that provide a multitude of health-related clinical information about patients. However, text data from eHRs, e.g.,...

Bibliographic Details
Main authors:	Yang, Zhen, Dehmer, Matthias, Yli-Harja, Olli, Emmert-Streib, Frank
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989657/ https://www.ncbi.nlm.nih.gov/pubmed/31996705 http://dx.doi.org/10.1038/s41598-020-58178-1

author	Yang, Zhen Dehmer, Matthias Yli-Harja, Olli Emmert-Streib, Frank
collection	PubMed
description	Artificial intelligence provides the opportunity to reveal important information buried in large amounts of complex data. Electronic health records (eHRs) are a source of such big data that provide a multitude of health-related clinical information about patients. However, text data from eHRs, e.g., discharge summary notes, are challenging to analyze because these notes are free-form texts and the writing formats and styles vary considerably between different records. For this reason, in this paper we study deep learning neural networks in combination with natural language processing to analyze text data from clinical discharge summaries. We provide a detailed analysis of patient phenotyping, i.e., the automatic prediction of ten patient disorders, by investigating the influence of network architectures, sample sizes and information content of tokens. Importantly, for patients suffering from Chronic Pain, the disorder that is the most difficult to classify, we find the largest performance gain for a combined word- and sentence-level input convolutional neural network (ws-CNN). As a general result, we find that the combination of data quality and data quantity of the text data plays a crucial role in using more complex network architectures that improve significantly beyond a word-level input CNN model. From our investigations of learning curves and token selection mechanisms, we conclude that such a transition requires larger sample sizes because the amount of information per sample is quite small and is carried by only a few tokens and token categories. Interestingly, we found that the token frequency in the eHRs follows a Zipf law, and we utilized this behavior to investigate the information content of tokens by defining a token selection mechanism. The latter also addresses issues of explainable AI.
format	Online Article Text
id	pubmed-6989657
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-6989657 2020-02-05 Combining deep learning with token selection for patient phenotyping from electronic health records Yang, Zhen Dehmer, Matthias Yli-Harja, Olli Emmert-Streib, Frank Sci Rep Article Nature Publishing Group UK 2020-01-29 /pmc/articles/PMC6989657/ /pubmed/31996705 http://dx.doi.org/10.1038/s41598-020-58178-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Combining deep learning with token selection for patient phenotyping from electronic health records
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989657/ https://www.ncbi.nlm.nih.gov/pubmed/31996705 http://dx.doi.org/10.1038/s41598-020-58178-1
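The abstract reports that token frequencies in the eHRs follow a Zipf law. Below is a minimal sketch of how such rank-frequency behavior could be checked, assuming a simple lowercase tokenizer and an ordinary least-squares fit of log frequency against log rank. Neither choice is taken from the paper; the function names (`tokenize`, `rank_frequency`, `zipf_exponent`) are hypothetical, and the token counts in the usage lines are synthetic, illustrative values rather than data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, then split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank_frequency(tokens: list[str]) -> list[tuple[int, int]]:
    """Return (rank, frequency) pairs; rank 1 is the most frequent token."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(tokens: list[str]) -> float:
    """Estimate s in f(r) proportional to r**(-s) via least squares on log f vs. log r."""
    pairs = rank_frequency(tokens)
    if len(pairs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is -s; Zipf's law corresponds to s close to 1.
    return -cov / var


# Synthetic counts chosen so that frequency = 60 / rank exactly (not data from the paper).
counts = {"patient": 60, "pain": 30, "chronic": 20, "history": 15, "daily": 12, "mg": 10}
tokens = [tok for tok, c in counts.items() for _ in range(c)]
print(round(zipf_exponent(tokens), 6))  # 1.0
```

An exponent near 1 corresponds to classic Zipf behavior. On real clinical notes, the fit is often restricted to a middle range of ranks, because the very frequent and very rare tokens tend to deviate from a straight line on the log-log plot.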