Generation and evaluation of artificial mental health records for Natural Language Processing

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify com...

Bibliographic Details
Main authors:	Ive, Julia; Viani, Natalia; Kam, Joyce; Yin, Lucia; Verma, Somain; Puntis, Stephen; Cardinal, Rudolf N.; Roberts, Angus; Stewart, Robert; Velupillai, Sumithra
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7224173/ https://www.ncbi.nlm.nih.gov/pubmed/32435697 http://dx.doi.org/10.1038/s41746-020-0267-x

_version_	1783533852218621952
author	Ive, Julia; Viani, Natalia; Kam, Joyce; Yin, Lucia; Verma, Somain; Puntis, Stephen; Cardinal, Rudolf N.; Roberts, Angus; Stewart, Robert; Velupillai, Sumithra
author_sort	Ive, Julia
collection	PubMed
description	A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
format	Online Article Text
id	pubmed-7224173
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-7224173 2020-05-20 NPJ Digit Med Article Nature Publishing Group UK 2020-05-14 /pmc/articles/PMC7224173/ /pubmed/32435697 http://dx.doi.org/10.1038/s41746-020-0267-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Generation and evaluation of artificial mental health records for Natural Language Processing
topic	Article
