Crosslingual named entity recognition for clinical de-identification applied to a COVID-19 Italian data set

The COrona VIrus Disease 19 (COVID-19) pandemic required the work of all global experts to tackle it. Despite the abundance of new studies, privacy laws prevent their dissemination for medical investigations: through clinical de-identification, the Protected Health Information (PHI) contained therei...

Bibliographic Details
Main authors: Catelli, Rosario, Gargiulo, Francesco, Casola, Valentina, De Pietro, Giuseppe, Fujita, Hamido, Esposito, Massimo
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7544600/
https://www.ncbi.nlm.nih.gov/pubmed/33052197
http://dx.doi.org/10.1016/j.asoc.2020.106779
_version_ 1783591882956210176
author Catelli, Rosario
Gargiulo, Francesco
Casola, Valentina
De Pietro, Giuseppe
Fujita, Hamido
Esposito, Massimo
collection PubMed
description The COrona VIrus Disease 19 (COVID-19) pandemic required the work of all global experts to tackle it. Despite the abundance of new studies, privacy laws prevent their dissemination for medical investigations: through clinical de-identification, the Protected Health Information (PHI) contained therein can be anonymized so that medical records can be shared and published. The automation of clinical de-identification through deep learning techniques has proven to be less effective for languages other than English due to the scarcity of data sets. Hence a new Italian de-identification data set has been created from the COVID-19 clinical records made available by the Italian Society of Radiology (SIRM). Therefore, two multi-lingual deep learning systems have been developed for this low-resource language scenario: the objective is to investigate their ability to transfer knowledge between different languages while maintaining the necessary features to correctly perform the Named Entity Recognition task for de-identification. The systems were trained using four different strategies, using both the English Informatics for Integrating Biology & the Bedside (i2b2) 2014 and the new Italian SIRM COVID-19 data sets, then evaluated on the latter. These approaches have demonstrated the effectiveness of cross-lingual transfer learning to de-identify medical records written in a low-resource language such as Italian by leveraging a high-resource language such as English.
format Online
Article
Text
id pubmed-7544600
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-7544600 2020-10-09 Appl Soft Comput Article Elsevier B.V. 2020-12 2020-10-09 /pmc/articles/PMC7544600/ /pubmed/33052197 http://dx.doi.org/10.1016/j.asoc.2020.106779 Text en © 2020 Elsevier B.V. All rights reserved.
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Crosslingual named entity recognition for clinical de-identification applied to a COVID-19 Italian data set
topic Article