
De-identifying Spanish medical texts - named entity recognition applied to radiology reports

BACKGROUND: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several...


Bibliographic Details
Main Authors:	Pérez-Díez, Irene, Pérez-Moraga, Raúl, López-Cerdán, Adolfo, Salinas-Serrano, Jose-Maria, la Iglesia-Vayá, María de
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8006627/ https://www.ncbi.nlm.nih.gov/pubmed/33781334 http://dx.doi.org/10.1186/s13326-021-00236-2

author	Pérez-Díez, Irene Pérez-Moraga, Raúl López-Cerdán, Adolfo Salinas-Serrano, Jose-Maria la Iglesia-Vayá, María de
author_sort	Pérez-Díez, Irene
collection	PubMed
description	BACKGROUND: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages. RESULTS: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. CONCLUSIONS: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records.
format	Online Article Text
id	pubmed-8006627
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8006627 2021-03-30 J Biomed Semantics Research BioMed Central 2021-03-29 /pmc/articles/PMC8006627/ /pubmed/33781334 http://dx.doi.org/10.1186/s13326-021-00236-2
Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	De-identifying Spanish medical texts - named entity recognition applied to radiology reports
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8006627/ https://www.ncbi.nlm.nih.gov/pubmed/33781334 http://dx.doi.org/10.1186/s13326-021-00236-2
