Effect of vocabulary mapping for conditions on phenotype cohorts
OBJECTIVE: To study the effect on patient cohorts of mapping condition (diagnosis) codes from source billing vocabularies to a clinical vocabulary. MATERIALS AND METHODS: Nine International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) concept sets were extracted from e...
Main authors: | Hripcsak, George; Levine, Matthew E; Shang, Ning; Ryan, Patrick B |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6289550/ https://www.ncbi.nlm.nih.gov/pubmed/30395248 http://dx.doi.org/10.1093/jamia/ocy124 |
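
The abstract's central measurement, the total error (false positives plus false negatives) of a cohort built from mapped codes against a gold-standard cohort built from source codes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the codes `S1`, `S2`, `S3`, `S9`, `T1`, `T2`, the patient identifiers, and all function names are hypothetical, and dividing by the gold-standard cohort size is an assumption, since the record does not state the denominator.

```python
# Toy illustration of how a many-to-one vocabulary mapping can change a cohort.
# All codes, patients, and names here are hypothetical, not real ICD or SNOMED CT content.


def translate_concept_set(source_set, mapping):
    """Translate a source-code concept set into target codes via the mapping."""
    return {mapping[code] for code in source_set if code in mapping}


def map_patient_codes(patient_codes, mapping):
    """Map every patient's source codes to target codes, dropping unmapped codes."""
    return {
        pid: {mapping[code] for code in codes if code in mapping}
        for pid, codes in patient_codes.items()
    }


def build_cohort(patient_codes, concept_set):
    """Return the patients who have at least one code in the concept set."""
    return {pid for pid, codes in patient_codes.items() if codes & concept_set}


def total_error(cohort, gold):
    """False positives plus false negatives, relative to the gold cohort size.

    The denominator is an assumption made for this sketch.
    """
    false_positives = len(cohort - gold)
    false_negatives = len(gold - cohort)
    return (false_positives + false_negatives) / len(gold)


# Two source codes (S2 and S3) map to the same target code T2.
SOURCE_TO_TARGET = {"S1": "T1", "S2": "T2", "S3": "T2"}

PATIENT_SOURCE_CODES = {
    "p1": {"S1"},
    "p2": {"S2"},
    "p3": {"S3"},  # not in the source concept set, but collides with S2 after mapping
    "p4": {"S9"},  # unrelated, unmapped code
}

source_concept_set = {"S1", "S2"}

# Gold standard: query the source codes directly.
gold = build_cohort(PATIENT_SOURCE_CODES, source_concept_set)

# Mapped route: translate both the concept set and the patient data, then query.
target_concept_set = translate_concept_set(source_concept_set, SOURCE_TO_TARGET)
patient_target_codes = map_patient_codes(PATIENT_SOURCE_CODES, SOURCE_TO_TARGET)
mapped = build_cohort(patient_target_codes, target_concept_set)

error = total_error(mapped, gold)
print(sorted(gold), sorted(mapped), error)
```

In this toy case the collision between `S2` and `S3` pulls one extra patient into the mapped cohort, which is the kind of ambiguity the abstract reports for the five concept sets that lost performance.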
_version_ | 1783379976772386816 |
---|---|
author | Hripcsak, George; Levine, Matthew E; Shang, Ning; Ryan, Patrick B |
author_sort | Hripcsak, George |
collection | PubMed |
description | OBJECTIVE: To study the effect on patient cohorts of mapping condition (diagnosis) codes from source billing vocabularies to a clinical vocabulary. MATERIALS AND METHODS: Nine International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) concept sets were extracted from eMERGE network phenotypes, translated to Systematized Nomenclature of Medicine - Clinical Terms concept sets, and applied to patient data that were mapped from source ICD9-CM and ICD10-CM codes to Systematized Nomenclature of Medicine - Clinical Terms codes using Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) vocabulary mappings. The original ICD9-CM concept set and a concept set extended to ICD10-CM were used to create patient cohorts that served as gold standards. RESULTS: Four phenotype concept sets were able to be translated to Systematized Nomenclature of Medicine - Clinical Terms without ambiguities and were able to perform perfectly with respect to the gold standards. The other 5 lost performance when 2 or more ICD9-CM or ICD10-CM codes mapped to the same Systematized Nomenclature of Medicine - Clinical Terms code. The patient cohorts had a total error (false positive and false negative) of up to 0.15% compared to querying ICD9-CM source data and up to 0.26% compared to querying ICD9-CM and ICD10-CM data. Knowledge engineering was required to produce that performance; simple automated methods to generate concept sets had errors up to 10% (one outlier at 250%). DISCUSSION: The translation of data from source vocabularies to Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) resulted in very small error rates that were an order of magnitude smaller than other error sources. CONCLUSION: It appears possible to map diagnoses from disparate vocabularies to a single clinical vocabulary and carry out research using a single set of definitions, thus improving efficiency and transportability of research. |
format | Online Article Text |
id | pubmed-6289550 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6289550 2018-12-14 Effect of vocabulary mapping for conditions on phenotype cohorts Hripcsak, George; Levine, Matthew E; Shang, Ning; Ryan, Patrick B J Am Med Inform Assoc Research and Applications Oxford University Press 2018-11-03 /pmc/articles/PMC6289550/ /pubmed/30395248 http://dx.doi.org/10.1093/jamia/ocy124 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Effect of vocabulary mapping for conditions on phenotype cohorts |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6289550/ https://www.ncbi.nlm.nih.gov/pubmed/30395248 http://dx.doi.org/10.1093/jamia/ocy124 |