
Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study


Bibliographic Details
Main authors: Gérardin, Christel; Mageau, Arthur; Mékinian, Arsène; Tannier, Xavier; Carrat, Fabrice
Format: Online Article Text
Language: English
Published: JMIR Publications, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808583/
https://www.ncbi.nlm.nih.gov/pubmed/36534446
http://dx.doi.org/10.2196/42379
collection PubMed
description BACKGROUND: Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. OBJECTIVE: We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. METHODS: Our multistep algorithm includes a named-entity recognition step, a multilabel classification using the Medical Subject Headings (MeSH) ontology, and the computation of patient similarity. Cohorts of similar patients were then selected on a priori annotated phenotypes. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, both with annotated phenotypes and both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the patients closest to the index patient with precision-at-3 (the precision of the 3 closest patients), recall, and average precision. RESULTS: For P1-P4, precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). Phenotypes P5 and P6 could not be analyzed because of the limited number of phenotypes. CONCLUSIONS: Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.
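The abstract names the similarity and evaluation steps without giving formulas. Below is a minimal Python sketch of those last two stages, assuming each patient has already been reduced to weighted MeSH concepts by the named-entity recognition and multilabel classification steps (not shown). The abstract does not state the similarity measure; cosine similarity is swapped in here purely as an illustrative assumption, and every function name, patient id, and data value is hypothetical. The metrics follow common information-retrieval definitions, which may differ in detail (for example, the recall cutoff) from the authors' evaluation.

```python
import math
from typing import Dict, List, Set

# Hypothetical representation (not specified in the abstract): each patient maps
# MeSH descriptor names to a weight, e.g. how often the concept was detected.
PatientVector = Dict[str, float]


def cosine_similarity(a: PatientVector, b: PatientVector) -> float:
    """Cosine similarity between two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_neighbors(index_id: str, patients: Dict[str, PatientVector]) -> List[str]:
    """Every other patient, ordered from most to least similar to the index patient."""
    index_vec = patients[index_id]
    others = [pid for pid in patients if pid != index_id]
    # Descending similarity; ties broken by patient id so the order is deterministic.
    return sorted(others, key=lambda pid: (-cosine_similarity(index_vec, patients[pid]), pid))


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the k closest patients that carry the annotated phenotype."""
    return sum(1 for pid in ranked[:k] if pid in relevant) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of all phenotype-positive patients found among the k closest."""
    if not relevant:
        return 0.0
    return sum(1 for pid in ranked[:k] if pid in relevant) / len(relevant)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision@i over the ranks i holding a relevant patient,
    divided by the total number of relevant patients."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


# Toy data (invented for illustration): patient -> weighted MeSH concepts.
EXAMPLE_PATIENTS: Dict[str, PatientVector] = {
    "p0": {"Osteoporosis": 2.0, "Fractures, Bone": 1.0},
    "p1": {"Osteoporosis": 1.0, "Fractures, Bone": 1.0},
    "p2": {"Lupus Nephritis": 2.0, "Lupus Erythematosus, Systemic": 1.0},
    "p3": {"Osteoporosis": 1.0},
    "p4": {"Pneumonia": 1.0, "Fractures, Bone": 1.0},
}
# Hypothetical annotation: the other patients sharing p0's phenotype (osteoporosis).
EXAMPLE_RELEVANT = {"p1", "p3"}

if __name__ == "__main__":
    ranked = rank_neighbors("p0", EXAMPLE_PATIENTS)
    print(ranked)
    print(precision_at_k(ranked, EXAMPLE_RELEVANT, 3),
          recall_at_k(ranked, EXAMPLE_RELEVANT, 3),
          average_precision(ranked, EXAMPLE_RELEVANT))
```

On the toy data the ranking is p1, p3, p4, p2: both relevant patients occupy the top two positions, so precision-at-3 is 2/3 (the third neighbour is not relevant) while recall-at-3 and average precision are both 1.0. This illustrates why precision-at-3 and average precision can diverge for the same index patient.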
id pubmed-9808583
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal JMIR Med Inform, Original Paper, JMIR Publications, 2022-12-19
rights ©Christel Gérardin, Arthur Mageau, Arsène Mékinian, Xavier Tannier, Fabrice Carrat. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 19.12.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
topic Original Paper