Identifying perinatal self-harm in electronic healthcare records using natural language processing

AIMS: 1. To generate a Natural Language Processing (NLP) application that can identify mentions of perinatal self-harm among electronic healthcare records (EHRs). 2. To use this application to estimate the prevalence of perinatal self-harm within a data-linkage cohort of women accessing secondary menta...

Bibliographic Details
Main Authors:	Ayre, Karyn, Bittar, Andre, Dutta, Rina, Verma, Somain, Kam, Joyce
Format:	Online Article Text
Language:	English
Published:	Cambridge University Press 2021
Subjects:	Rapid-Fire Poster Presentations
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8771246/ http://dx.doi.org/10.1192/bjo.2021.74

_version_	1784635559506870272
author	Ayre, Karyn; Bittar, Andre; Dutta, Rina; Verma, Somain; Kam, Joyce
author_facet	Ayre, Karyn; Bittar, Andre; Dutta, Rina; Verma, Somain; Kam, Joyce
author_sort	Ayre, Karyn
collection	PubMed
description	AIMS: 1. To generate a Natural Language Processing (NLP) application that can identify mentions of perinatal self-harm among electronic healthcare records (EHRs). 2. To use this application to estimate the prevalence of perinatal self-harm within a data-linkage cohort of women accessing secondary mental healthcare during the perinatal period. METHOD: Data source: the Clinical Record Interactive Search (CRIS) system, a database of de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust (SLaM). CRIS has pre-existing ethical approval via the Oxfordshire Research Ethics Committee C (ref 18/SC/0372) and this project was approved by the CRIS Oversight Committee (16-069). After developing a list of synonyms for self-harm and piloting coding rules, a gold standard dataset of EHRs was manually coded using Extensible Human Oracle Suite of Tools (eHOST) software. An NLP application to detect perinatal self-harm was then developed using several layers of linguistic processing based on the spaCy NLP library for Python. Mention-level performance was evaluated for each attribute of a mention that the application was designed to identify (span, status, temporality and polarity), by comparing application output against the gold standard dataset. Performance was described as precision, recall, F-score and Cohen's kappa. Most service-users had more than one EHR in their period of perinatal service use. Performance was therefore also measured at “service-user level”, with additional performance metrics of likelihood ratios and post-test probabilities. Linkage with the Hospital Episode Statistics database allowed creation of a cohort of women who accessed SLaM during the perinatal period. By deploying the application on the EHRs of the women in the cohort, we were able to estimate the prevalence of perinatal self-harm. 
RESULT: Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality all >0.8. Kappa for status 0.68, temporality 0.62, polarity 0.91. Service-user level performance: F-score, precision, recall all 0.69, overall F-score 0.81, positive likelihood ratio 9.4 (4.8–19), post-test probability 68.9% (95% CI 53–82). Cohort prevalence of self-harm in pregnancy was 15.3% (95% CI 14.3–16.3); self-harm in the postnatal year was 19.7% (95% CI 18.6–20.8). Only a very small proportion of women self-harmed in both pregnancy and the postnatal year (3.9%, 95% CI 3.3–4.4). CONCLUSION: NLP can be used to identify perinatal self-harm within EHRs. The hardest attribute to classify was temporality. This is in line with the wider literature indicating temporality as a notoriously difficult problem in NLP. As a result, the application probably over-estimates prevalence to some degree. However, given the difficulty of the task, overall performance is good. Bearing in mind the limitations, our findings suggest that self-harm is likely to be relatively common in women accessing secondary mental healthcare during the perinatal period. 
Funding: KA is funded by a National Institute for Health Research Doctoral Research Fellowship (NIHR-DRF-2016-09-042). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. RD is funded by a Clinician Scientist Fellowship (research project e-HOST-IT) from the Health Foundation in partnership with the Academy of Medical Sciences, which also partly funds AB. AB's work was also partly supported by Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities, as well as the Maudsley Charity. Acknowledgements: Professor Louise M Howard, who originally suggested using NLP to identify perinatal self-harm in EHRs. Professor Howard is the primary supervisor of KA's Fellowship.
format	Online Article Text
id	pubmed-8771246
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Cambridge University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8771246 2022-01-31 Identifying perinatal self-harm in electronic healthcare records using natural language processing Ayre, Karyn Bittar, Andre Dutta, Rina Verma, Somain Kam, Joyce BJPsych Open Rapid-Fire Poster Presentations AIMS: 1. To generate a Natural Language Processing (NLP) application that can identify mentions of perinatal self-harm among electronic healthcare records (EHRs). 2. To use this application to estimate the prevalence of perinatal self-harm within a data-linkage cohort of women accessing secondary mental healthcare during the perinatal period. METHOD: Data source: the Clinical Record Interactive Search (CRIS) system, a database of de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust (SLaM). CRIS has pre-existing ethical approval via the Oxfordshire Research Ethics Committee C (ref 18/SC/0372) and this project was approved by the CRIS Oversight Committee (16-069). After developing a list of synonyms for self-harm and piloting coding rules, a gold standard dataset of EHRs was manually coded using Extensible Human Oracle Suite of Tools (eHOST) software. An NLP application to detect perinatal self-harm was then developed using several layers of linguistic processing based on the spaCy NLP library for Python. Mention-level performance was evaluated for each attribute of a mention that the application was designed to identify (span, status, temporality and polarity), by comparing application output against the gold standard dataset. Performance was described as precision, recall, F-score and Cohen's kappa. Most service-users had more than one EHR in their period of perinatal service use. Performance was therefore also measured at “service-user level”, with additional performance metrics of likelihood ratios and post-test probabilities. Linkage with the Hospital Episode Statistics database allowed creation of a cohort of women who accessed SLaM during the perinatal period. By deploying the application on the EHRs of the women in the cohort, we were able to estimate the prevalence of perinatal self-harm. 
RESULT: Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality all >0.8. Kappa for status 0.68, temporality 0.62, polarity 0.91. Service-user level performance: F-score, precision, recall all 0.69, overall F-score 0.81, positive likelihood ratio 9.4 (4.8–19), post-test probability 68.9% (95% CI 53–82). Cohort prevalence of self-harm in pregnancy was 15.3% (95% CI 14.3–16.3); self-harm in the postnatal year was 19.7% (95% CI 18.6–20.8). Only a very small proportion of women self-harmed in both pregnancy and the postnatal year (3.9%, 95% CI 3.3–4.4). CONCLUSION: NLP can be used to identify perinatal self-harm within EHRs. The hardest attribute to classify was temporality. This is in line with the wider literature indicating temporality as a notoriously difficult problem in NLP. As a result, the application probably over-estimates prevalence to some degree. However, given the difficulty of the task, overall performance is good. Bearing in mind the limitations, our findings suggest that self-harm is likely to be relatively common in women accessing secondary mental healthcare during the perinatal period. 
Funding: KA is funded by a National Institute for Health Research Doctoral Research Fellowship (NIHR-DRF-2016-09-042). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. RD is funded by a Clinician Scientist Fellowship (research project e-HOST-IT) from the Health Foundation in partnership with the Academy of Medical Sciences, which also partly funds AB. AB's work was also partly supported by Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities, as well as the Maudsley Charity. Acknowledgements: Professor Louise M Howard, who originally suggested using NLP to identify perinatal self-harm in EHRs. Professor Howard is the primary supervisor of KA's Fellowship. Cambridge University Press 2021-06-18 /pmc/articles/PMC8771246/ http://dx.doi.org/10.1192/bjo.2021.74 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Rapid-Fire Poster Presentations Ayre, Karyn Bittar, Andre Dutta, Rina Verma, Somain Kam, Joyce Identifying perinatal self-harm in electronic healthcare records using natural language processing
title	Identifying perinatal self-harm in electronic healthcare records using natural language processing
title_full	Identifying perinatal self-harm in electronic healthcare records using natural language processing
title_fullStr	Identifying perinatal self-harm in electronic healthcare records using natural language processing
title_full_unstemmed	Identifying perinatal self-harm in electronic healthcare records using natural language processing
title_short	Identifying perinatal self-harm in electronic healthcare records using natural language processing
title_sort	identifying perinatal self-harm in electronic healthcare records using natural language processing
topic	Rapid-Fire Poster Presentations
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8771246/ http://dx.doi.org/10.1192/bjo.2021.74
work_keys_str_mv	AT ayrekaryn identifyingperinatalselfharminelectronichealthcarerecordsusingnaturallanguageprocessing AT bittarandre identifyingperinatalselfharminelectronichealthcarerecordsusingnaturallanguageprocessing AT duttarina identifyingperinatalselfharminelectronichealthcarerecordsusingnaturallanguageprocessing AT vermasomain identifyingperinatalselfharminelectronichealthcarerecordsusingnaturallanguageprocessing AT kamjoyce identifyingperinatalselfharminelectronichealthcarerecordsusingnaturallanguageprocessing
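
The abstract in this record reports precision, recall, F-score, Cohen's kappa, likelihood ratios and post-test probabilities, but does not show how they are calculated. The following is a minimal sketch of the standard textbook definitions in Python, not the authors' evaluation code. The function names and all input values are hypothetical. The pre-test probability used for the service-user-level analysis is not given in the abstract, so none is assumed here.

```python
from collections import Counter


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts. For a micro-averaged score, pool the counts
    across classes before calling this function."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa: agreement between two label sequences (for example
    gold standard versus application output), corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability through odds:
    post-test odds = pre-test odds * likelihood ratio."""
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be in [0, 1)")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Hypothetical toy inputs, chosen only to exercise the functions.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
    print(cohen_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
    print(post_test_probability(pre_test_probability=0.5, likelihood_ratio=3.0))
```

Kappa can be well below raw agreement when one category dominates, which is one reason an attribute can show a high F-score alongside a more moderate kappa.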

Similar Items