Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications

Psychiatric electronic health records (EHRs) present a distinctive challenge in the domain of machine learning (ML) owing to their unstructured nature, with a high degree of complexity and variability. This study aimed to identify a cohort of patients with diagnoses of a psychotic disorder and posttraumatic stress dis...

Bibliographic Details
Main authors: Holderness, Eben, Atwood, Bruce, Verhagen, Marc, Shinn, Ann, Cawkwell, Philip, Pustejovsky, James, Hall, Mei-Hua
Format: Online Article Text
Language: English
Published: American Journal Experts 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10081360/
https://www.ncbi.nlm.nih.gov/pubmed/37034796
http://dx.doi.org/10.21203/rs.3.rs-2711718/v1
_version_ 1785021105902190592
author Holderness, Eben
Atwood, Bruce
Verhagen, Marc
Shinn, Ann
Cawkwell, Philip
Pustejovsky, James
Hall, Mei-Hua
author_facet Holderness, Eben
Atwood, Bruce
Verhagen, Marc
Shinn, Ann
Cawkwell, Philip
Pustejovsky, James
Hall, Mei-Hua
author_sort Holderness, Eben
collection PubMed
description Psychiatric electronic health records (EHRs) present a distinctive challenge in the domain of machine learning (ML) owing to their unstructured nature, with a high degree of complexity and variability. This study aimed to identify a cohort of patients with diagnoses of a psychotic disorder and posttraumatic stress disorder (PTSD), develop clinically informed guidelines for annotating these health records for instances of traumatic events to create a publicly available gold-standard dataset, and demonstrate that the data gathered using this annotation scheme is suitable for training an ML model to identify these indicators of trauma in unseen health records. We created a representative corpus of 101 EHRs (222,033 tokens) from a centralized database and a detailed annotation scheme for annotating information relevant to traumatic events in the clinical narratives. A team of clinical experts annotated the dataset and updated the annotation guidelines in collaboration with computational linguistics specialists. Inter-annotator agreement was high (0.688 for span tags, 0.589 for relations, and 0.874 for tag attributes). We characterize the major points relating to the annotation process of psychiatric EHRs. Additionally, high-performing baseline span labeling and relation extraction ML models were developed to demonstrate the practical viability of the gold-standard corpus for ML applications.
format Online
Article
Text
id pubmed-10081360
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Journal Experts
record_format MEDLINE/PubMed
spelling pubmed-10081360 2023-04-08 Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications Holderness, Eben Atwood, Bruce Verhagen, Marc Shinn, Ann Cawkwell, Philip Pustejovsky, James Hall, Mei-Hua Res Sq Article Psychiatric electronic health records (EHRs) present a distinctive challenge in the domain of machine learning (ML) owing to their unstructured nature, with a high degree of complexity and variability. This study aimed to identify a cohort of patients with diagnoses of a psychotic disorder and posttraumatic stress disorder (PTSD), develop clinically informed guidelines for annotating these health records for instances of traumatic events to create a publicly available gold-standard dataset, and demonstrate that the data gathered using this annotation scheme is suitable for training an ML model to identify these indicators of trauma in unseen health records. We created a representative corpus of 101 EHRs (222,033 tokens) from a centralized database and a detailed annotation scheme for annotating information relevant to traumatic events in the clinical narratives. A team of clinical experts annotated the dataset and updated the annotation guidelines in collaboration with computational linguistics specialists. Inter-annotator agreement was high (0.688 for span tags, 0.589 for relations, and 0.874 for tag attributes). We characterize the major points relating to the annotation process of psychiatric EHRs. Additionally, high-performing baseline span labeling and relation extraction ML models were developed to demonstrate the practical viability of the gold-standard corpus for ML applications.
American Journal Experts 2023-03-28 /pmc/articles/PMC10081360/ /pubmed/37034796 http://dx.doi.org/10.21203/rs.3.rs-2711718/v1 Text en This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Holderness, Eben
Atwood, Bruce
Verhagen, Marc
Shinn, Ann
Cawkwell, Philip
Pustejovsky, James
Hall, Mei-Hua
Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications
title Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications
title_full Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications
title_fullStr Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications
title_full_unstemmed Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications
title_short Annotation of Trauma-related Linguistic Features in Psychiatric Electronic Health Records for Machine Learning Applications
title_sort annotation of trauma-related linguistic features in psychiatric electronic health records for machine learning applications
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10081360/
https://www.ncbi.nlm.nih.gov/pubmed/37034796
http://dx.doi.org/10.21203/rs.3.rs-2711718/v1
work_keys_str_mv AT holdernesseben annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications
AT atwoodbruce annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications
AT verhagenmarc annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications
AT shinnann annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications
AT cawkwellphilip annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications
AT pustejovskyjames annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications
AT hallmeihua annotationoftraumarelatedlinguisticfeaturesinpsychiatricelectronichealthrecordsformachinelearningapplications