
Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome

BACKGROUND: The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental and epi...

Bibliographic Details
Main authors: Barco, Tommaso Lo; Kuchenbuch, Mathieu; Garcelon, Nicolas; Neuraz, Antoine; Nabbout, Rima
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8278630/
https://www.ncbi.nlm.nih.gov/pubmed/34256808
http://dx.doi.org/10.1186/s13023-021-01936-9
author Barco, Tommaso Lo
Kuchenbuch, Mathieu
Garcelon, Nicolas
Neuraz, Antoine
Nabbout, Rima
collection PubMed
description BACKGROUND: The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly begins in the first year of life with febrile seizures (FS). Diagnosis is often delayed until after 2 years of age, as it is difficult to differentiate DS at onset from FS. We aimed to explore whether some clinical terms (concepts) are used significantly more often in the electronic narrative medical reports of individuals with DS before the age of 2 years than in those of individuals with FS. These concepts could allow earlier detection of patients with DS, resulting in earlier referral to expert centers that can provide early diagnosis and care. METHODS: Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing (NLP), a computer technology that processes written information. Using the Unified Medical Language System Metathesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) whose diagnosis was confirmed after the age of 4 years. A phenome-wide analysis was performed using a series of logistic regressions to evaluate the statistical associations between the phenotypes of DS and FS, based on concepts found in the reports produced before 2 years of age. RESULTS: We found a significantly higher representation of concepts related to seizure phenotypes that distinguish DS from FS in the early phases, namely the greater recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure types. Some typical early-onset non-seizure concepts also emerged, relating to neurodevelopment and gait disorders.
CONCLUSIONS: Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to reduce the time to diagnosis of DS and could be transposed to other rare diseases.
format Online
Article
Text
id pubmed-8278630
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8278630 2021-07-14 Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome Barco, Tommaso Lo Kuchenbuch, Mathieu Garcelon, Nicolas Neuraz, Antoine Nabbout, Rima Orphanet J Rare Dis Research BioMed Central 2021-07-13 /pmc/articles/PMC8278630/ /pubmed/34256808 http://dx.doi.org/10.1186/s13023-021-01936-9 Text en
© The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8278630/
https://www.ncbi.nlm.nih.gov/pubmed/34256808
http://dx.doi.org/10.1186/s13023-021-01936-9