
Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs

BACKGROUND: Developing predictive models for precision psychiatry is challenging because of unavailability of the necessary data: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative...


Bibliographic Details
Main authors: Turner, Rosanne J., Coenen, Femke, Roelofs, Femke, Hagoort, Karin, Härmä, Aki, Grünwald, Peter D., Velders, Fleur P., Scheepers, Floortje E.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206307/
https://www.ncbi.nlm.nih.gov/pubmed/35715745
http://dx.doi.org/10.1186/s12888-022-04058-z
_version_ 1784729310654889984
author Turner, Rosanne J.
Coenen, Femke
Roelofs, Femke
Hagoort, Karin
Härmä, Aki
Grünwald, Peter D.
Velders, Fleur P.
Scheepers, Floortje E.
author_facet Turner, Rosanne J.
Coenen, Femke
Roelofs, Femke
Hagoort, Karin
Härmä, Aki
Grünwald, Peter D.
Velders, Fleur P.
Scheepers, Floortje E.
author_sort Turner, Rosanne J.
collection PubMed
description BACKGROUND: Developing predictive models for precision psychiatry is challenging because the necessary data are unavailable: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this study was to construct a natural language processing (NLP) pipeline that extracts variables for building predictive models from EHRs. We specifically tailor the pipeline for extracting information on outcomes of psychiatry treatment trajectories, applicable throughout the entire spectrum of mental health disorders (“transdiagnostic”). METHODS: A qualitative study into beliefs of clinical staff on measuring treatment outcomes was conducted to construct a candidate list of variables to extract from the EHR. To investigate whether the proposed variables are suitable for measuring treatment effects, the resulting themes were compared to transdiagnostic outcome measures currently used in psychiatry research and compared to the HDRS (as a gold standard) through systematic review, resulting in an ideal set of variables. To extract these from EHR data, a semi-rule-based NLP pipeline was constructed and tailored to the candidate variables using Prodigy. Classification accuracy and F1-scores were calculated and pipeline output was compared to HDRS scores using clinical notes from patients admitted in 2019 and 2020. RESULTS: Analysis of 34 questionnaires answered by clinical staff resulted in four themes defining treatment outcomes: symptom reduction, general well-being, social functioning and personalization. Systematic review revealed 242 different transdiagnostic outcome measures, with the 36-item Short-Form Survey for quality of life (SF36) being used most consistently, showing substantial overlap with the themes from the qualitative study. 
Comparing SF36 to HDRS scores in 26 studies revealed moderate to good correlations (0.62–0.79) and good positive predictive values (0.75–0.88). The NLP pipeline developed with notes from 22,170 patients reached an accuracy of 95 to 99 percent (F1 scores: 0.38–0.86) on detecting these themes, evaluated on data from 361 patients. CONCLUSIONS: The NLP pipeline developed in this study extracts outcome measures from the EHR that cater specifically to the needs of clinical staff and align with outcome measures used to detect treatment effects in clinical trials. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12888-022-04058-z.
format Online
Article
Text
id pubmed-9206307
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9206307 2022-06-19 Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs Turner, Rosanne J. Coenen, Femke Roelofs, Femke Hagoort, Karin Härmä, Aki Grünwald, Peter D. Velders, Fleur P. Scheepers, Floortje E. BMC Psychiatry Research BACKGROUND: Developing predictive models for precision psychiatry is challenging because the necessary data are unavailable: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this study was to construct a natural language processing (NLP) pipeline that extracts variables for building predictive models from EHRs. We specifically tailor the pipeline for extracting information on outcomes of psychiatry treatment trajectories, applicable throughout the entire spectrum of mental health disorders (“transdiagnostic”). METHODS: A qualitative study into beliefs of clinical staff on measuring treatment outcomes was conducted to construct a candidate list of variables to extract from the EHR. To investigate whether the proposed variables are suitable for measuring treatment effects, the resulting themes were compared to transdiagnostic outcome measures currently used in psychiatry research and compared to the HDRS (as a gold standard) through systematic review, resulting in an ideal set of variables. To extract these from EHR data, a semi-rule-based NLP pipeline was constructed and tailored to the candidate variables using Prodigy. Classification accuracy and F1-scores were calculated and pipeline output was compared to HDRS scores using clinical notes from patients admitted in 2019 and 2020. RESULTS: Analysis of 34 questionnaires answered by clinical staff resulted in four themes defining treatment outcomes: symptom reduction, general well-being, social functioning and personalization. 
Systematic review revealed 242 different transdiagnostic outcome measures, with the 36-item Short-Form Survey for quality of life (SF36) being used most consistently, showing substantial overlap with the themes from the qualitative study. Comparing SF36 to HDRS scores in 26 studies revealed moderate to good correlations (0.62–0.79) and good positive predictive values (0.75–0.88). The NLP pipeline developed with notes from 22,170 patients reached an accuracy of 95 to 99 percent (F1 scores: 0.38–0.86) on detecting these themes, evaluated on data from 361 patients. CONCLUSIONS: The NLP pipeline developed in this study extracts outcome measures from the EHR that cater specifically to the needs of clinical staff and align with outcome measures used to detect treatment effects in clinical trials. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12888-022-04058-z. BioMed Central 2022-06-17 /pmc/articles/PMC9206307/ /pubmed/35715745 http://dx.doi.org/10.1186/s12888-022-04058-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Turner, Rosanne J.
Coenen, Femke
Roelofs, Femke
Hagoort, Karin
Härmä, Aki
Grünwald, Peter D.
Velders, Fleur P.
Scheepers, Floortje E.
Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
title Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
title_full Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
title_fullStr Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
title_full_unstemmed Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
title_short Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
title_sort information extraction from free text for aiding transdiagnostic psychiatry: constructing nlp pipelines tailored to clinicians’ needs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206307/
https://www.ncbi.nlm.nih.gov/pubmed/35715745
http://dx.doi.org/10.1186/s12888-022-04058-z
work_keys_str_mv AT turnerrosannej informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT coenenfemke informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT roelofsfemke informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT hagoortkarin informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT harmaaki informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT grunwaldpeterd informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT veldersfleurp informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds
AT scheepersfloortjee informationextractionfromfreetextforaidingtransdiagnosticpsychiatryconstructingnlppipelinestailoredtocliniciansneeds