
Identifying perinatal self-harm in electronic healthcare records using natural language processing



Bibliographic details
Main authors: Ayre, Karyn; Bittar, Andre; Dutta, Rina; Verma, Somain; Kam, Joyce
Format: Online, Article, Text
Language: English
Published: Cambridge University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8771246/
http://dx.doi.org/10.1192/bjo.2021.74
collection PubMed
description AIMS: 1. To develop a natural language processing (NLP) application that can identify mentions of perinatal self-harm in electronic healthcare records (EHRs). 2. To use this application to estimate the prevalence of perinatal self-harm within a data-linkage cohort of women accessing secondary mental healthcare during the perinatal period. METHOD: Data source: the Clinical Record Interactive Search (CRIS) system, a database of de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust (SLaM). CRIS has pre-existing ethical approval via the Oxfordshire Research Ethics Committee C (ref 18/SC/0372), and this project was approved by the CRIS Oversight Committee (16-069). After developing a list of synonyms for self-harm and piloting coding rules, we manually coded a gold standard dataset of EHRs using the Extensible Human Oracle Suite of Tools (eHOST) software. We then developed an NLP application to detect perinatal self-harm, using several layers of linguistic processing built on the spaCy NLP library for Python. Mention-level performance was evaluated by comparing the application's output against the gold standard dataset for each attribute of a mention that the application was designed to identify (span, status, temporality and polarity), and was reported as precision, recall, F-score and Cohen's kappa. Because most service-users had more than one EHR during their period of perinatal service use, performance was also measured at “service-user level”, with likelihood ratios and post-test probabilities as additional metrics. Linkage with the Hospital Episode Statistics (HES) database allowed us to create a cohort of women who accessed SLaM during the perinatal period. By deploying the application on the EHRs of the women in this cohort, we estimated the prevalence of perinatal self-harm.
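The abstract does not describe the application's internal rules. As a rough illustration of what finding a mention span and labelling its polarity and temporality involves, the sketch below uses plain regular expressions in place of the layered spaCy pipeline the authors built. The synonym list, the negation and historical cue patterns, the 40-character look-back window, and the `Mention` and `find_mentions` names are all hypothetical, and the status attribute is not modelled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical cue lists for illustration only; the study's synonym list is not given in the abstract.
SELF_HARM_TERMS = ["self-harm", "self harm", "self-harmed", "overdose", "cut herself"]
NEGATION_RE = re.compile(r"\b(no|not|denies|denied|denying)\b")
HISTORICAL_RE = re.compile(r"\b(history of|previous|previously|in the past|years ago)\b")

# Longest terms first, so "self-harmed" is preferred over "self-harm" in the alternation.
TERM_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(SELF_HARM_TERMS, key=len, reverse=True)),
    re.IGNORECASE,
)


@dataclass
class Mention:
    start: int        # character offsets of the span within the sentence
    end: int
    text: str
    polarity: str     # "positive" or "negative" (negated)
    temporality: str  # "current" or "past"


def find_mentions(sentence: str, window: int = 40) -> list[Mention]:
    """Find self-harm terms and label each one from cue words in the preceding `window` characters."""
    mentions = []
    for match in TERM_RE.finditer(sentence):
        left_context = sentence[max(0, match.start() - window):match.start()].lower()
        negated = NEGATION_RE.search(left_context) is not None
        historical = HISTORICAL_RE.search(left_context) is not None
        mentions.append(
            Mention(
                start=match.start(),
                end=match.end(),
                text=match.group(0),
                polarity="negative" if negated else "positive",
                temporality="past" if historical else "current",
            )
        )
    return mentions
```

A fixed look-back window is the crudest possible cue scope: it would wrongly mark "no concerns raised, but took an overdose" as negated, and it cannot tell whether a past mention falls inside or outside the perinatal period. Problems like these are why temporality in particular calls for richer linguistic processing.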
RESULT: Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality were all >0.8. Kappa was 0.68 for status, 0.62 for temporality and 0.91 for polarity. Service-user level performance: F-score, precision and recall were all 0.69, overall F-score 0.81, positive likelihood ratio 9.4 (4.8–19), post-test probability 68.9% (95% CI 53–82). Cohort prevalence of self-harm in pregnancy was 15.3% (95% CI 14.3–16.3), and of self-harm in the postnatal year 19.7% (95% CI 18.6–20.8). Only a very small proportion of women self-harmed in both pregnancy and the postnatal year (3.9%, 95% CI 3.3–4.4). CONCLUSION: NLP can be used to identify perinatal self-harm within EHRs. Temporality was the hardest attribute to classify, in line with the wider literature, which identifies temporality as a notoriously difficult problem in NLP. As a result, the application probably overestimates prevalence to some degree. Nevertheless, given the difficulty of the task, overall performance is good. Bearing these limitations in mind, our findings suggest that self-harm is likely to be relatively common among women accessing secondary mental healthcare during the perinatal period. Funding: KA is funded by a National Institute for Health Research Doctoral Research Fellowship (NIHR-DRF-2016-09-042). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. RD is funded by a Clinician Scientist Fellowship (research project e-HOST-IT) from the Health Foundation in partnership with the Academy of Medical Sciences, which also partly funds AB. AB's work was also partly supported by Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities, as well as by the Maudsley Charity. Acknowledgements: Professor Louise M Howard, who originally suggested using NLP to identify perinatal self-harm in EHRs.
Professor Howard is the primary supervisor of KA's Fellowship.
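The evaluation metrics named in this abstract are standard, and the sketch below shows how each can be computed in plain Python. The function names, the toy inputs and the use of a Wald (normal-approximation) interval for prevalence are assumptions; the abstract does not say which confidence-interval method was used, and none of the example numbers are study data. The post-test probability follows the usual route of converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back.

```python
from __future__ import annotations

import math


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F-score (F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def micro_prf(counts: list[tuple[int, int, int]]) -> tuple[float, float, float]:
    """Micro-average: pool (tp, fp, fn) counts across categories before computing the metrics."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return prf(tp, fp, fn)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement between two label sequences (e.g. gold standard vs application), corrected for chance."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Chance agreement: product of each rater's marginal proportion, summed over labels.
    p_expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_observed - p_expected) / (1 - p_expected)


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Probability -> odds, multiply by the likelihood ratio, odds -> probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def prevalence_wald_ci(cases: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point prevalence with a Wald (normal-approximation) 95% confidence interval, clipped to [0, 1]."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)
```

With toy counts, `micro_prf([(8, 2, 2), (2, 0, 2)])` pools to 10 true positives, 2 false positives and 4 false negatives, giving an F-score of 20/26 (about 0.77), whereas averaging the two per-category F-scores (0.8 and about 0.667) would give a macro average of about 0.73. Micro-averaging therefore weights categories by how many mentions they contain.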
id pubmed-8771246
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BJPsych Open, Rapid-Fire Poster Presentations (Cambridge University Press, 2021-06-18)
rights © The Author(s) 2021. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Rapid-Fire Poster Presentations