Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study
BACKGROUND: Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. OBJECTIVE: We aimed to provide an automated end-to-end extraction of cohorts of similar patients from el...
Main Authors: | Gérardin, Christel; Mageau, Arthur; Mékinian, Arsène; Tannier, Xavier; Carrat, Fabrice |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2022 |
Subjects: | Original Paper |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808583/ https://www.ncbi.nlm.nih.gov/pubmed/36534446 http://dx.doi.org/10.2196/42379 |
_version_ | 1784862965800894464 |
---|---|
author | Gérardin, Christel Mageau, Arthur Mékinian, Arsène Tannier, Xavier Carrat, Fabrice |
author_facet | Gérardin, Christel Mageau, Arthur Mékinian, Arsène Tannier, Xavier Carrat, Fabrice |
author_sort | Gérardin, Christel |
collection | PubMed |
description | BACKGROUND: Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. OBJECTIVE: We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. METHODS: Our multistep algorithm includes a named-entity recognition step, a multilabel classification using the Medical Subject Headings (MeSH) ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the 3 patients closest to the index patient using precision-at-3, recall, and average precision. RESULTS: For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes. CONCLUSIONS: Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients. |
format | Online Article Text |
id | pubmed-9808583 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9808583 2023-01-04 Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study Gérardin, Christel Mageau, Arthur Mékinian, Arsène Tannier, Xavier Carrat, Fabrice JMIR Med Inform Original Paper JMIR Publications 2022-12-19 /pmc/articles/PMC9808583/ /pubmed/36534446 http://dx.doi.org/10.2196/42379 Text en ©Christel Gérardin, Arthur Mageau, Arsène Mékinian, Xavier Tannier, Fabrice Carrat. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 19.12.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Gérardin, Christel Mageau, Arthur Mékinian, Arsène Tannier, Xavier Carrat, Fabrice Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study |
title | Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study |
title_full | Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study |
title_fullStr | Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study |
title_full_unstemmed | Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study |
title_short | Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study |
title_sort | construction of cohorts of similar patients from automatic extraction of medical concepts: phenotype extraction study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808583/ https://www.ncbi.nlm.nih.gov/pubmed/36534446 http://dx.doi.org/10.2196/42379 |
work_keys_str_mv | AT gerardinchristel constructionofcohortsofsimilarpatientsfromautomaticextractionofmedicalconceptsphenotypeextractionstudy AT mageauarthur constructionofcohortsofsimilarpatientsfromautomaticextractionofmedicalconceptsphenotypeextractionstudy AT mekinianarsene constructionofcohortsofsimilarpatientsfromautomaticextractionofmedicalconceptsphenotypeextractionstudy AT tannierxavier constructionofcohortsofsimilarpatientsfromautomaticextractionofmedicalconceptsphenotypeextractionstudy AT carratfabrice constructionofcohortsofsimilarpatientsfromautomaticextractionofmedicalconceptsphenotypeextractionstudy |
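
Note on the evaluation metrics named in the description field: the abstract reports precision-at-3 and average precision for the patients ranked closest to an index patient. The sketch below illustrates how such metrics are commonly computed. It is not the authors' implementation: the record does not state which similarity measure the study used, so cosine similarity is an assumption here, and the concept vectors, patient ids, and function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length concept-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_neighbors(index_vec, candidates):
    """Return candidate ids sorted from most to least similar to the index patient."""
    return sorted(
        candidates,
        key=lambda pid: cosine(index_vec, candidates[pid]),
        reverse=True,
    )


def precision_at_k(ranked_ids, relevant, k=3):
    """Fraction of the top-k ranked patients that carry the phenotype."""
    return sum(pid in relevant for pid in ranked_ids[:k]) / k


def average_precision(ranked_ids, relevant):
    """Mean of the precision values taken at the rank of each relevant patient."""
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy data (hypothetical): each vector counts three extracted medical concepts.
index_patient = [1, 1, 0]
candidates = {
    "a": [1, 1, 0],
    "b": [1, 0, 0],
    "c": [0, 0, 1],
    "d": [1, 1, 1],
}
with_phenotype = {"a", "b"}  # patients annotated with the phenotype of interest

ranked = rank_neighbors(index_patient, candidates)
p_at_3 = precision_at_k(ranked, with_phenotype, k=3)
ap = average_precision(ranked, with_phenotype)
print(ranked, round(p_at_3, 3), round(ap, 3))
```

In this toy case two of the three nearest neighbors carry the phenotype, so precision-at-3 is 2/3, while average precision also rewards the relevant patient ranked first.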