
Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records

Columbia Open Health Data (COHD) is a publicly accessible database of electronic health record (EHR) prevalence and co-occurrence frequencies between conditions, drugs, procedures, and demographics. COHD was derived from Columbia University Irving Medical Center’s Observational Health Data Sciences...


Bibliographic Details
Main authors: Ta, Casey N., Dumontier, Michel, Hripcsak, George, Tatonetti, Nicholas P., Weng, Chunhua
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6257042/
https://www.ncbi.nlm.nih.gov/pubmed/30480666
http://dx.doi.org/10.1038/sdata.2018.273
_version_ 1783374253674987520
author Ta, Casey N.
Dumontier, Michel
Hripcsak, George
Tatonetti, Nicholas P.
Weng, Chunhua
author_sort Ta, Casey N.
collection PubMed
description Columbia Open Health Data (COHD) is a publicly accessible database of electronic health record (EHR) prevalence and co-occurrence frequencies between conditions, drugs, procedures, and demographics. COHD was derived from Columbia University Irving Medical Center’s Observational Health Data Sciences and Informatics (OHDSI) database. The lifetime dataset, derived from all records, contains 36,578 single concepts (11,952 conditions, 12,334 drugs, and 10,816 procedures) and 32,788,901 concept pairs from 5,364,781 patients. The 5-year dataset, derived from records from 2013–2017, contains 29,964 single concepts (10,159 conditions, 10,264 drugs, and 8,270 procedures) and 15,927,195 concept pairs from 1,790,431 patients. Exclusion of rare concepts (count ≤ 10) and Poisson randomization enable data sharing by eliminating risks to patient privacy. EHR prevalences are informative of healthcare consumption rates. Analysis of co-occurrence frequencies via relative frequency analysis and observed-expected frequency ratio is informative of associations between clinical concepts, useful for biomedical research tasks such as drug repurposing and pharmacovigilance. COHD is publicly accessible through a web application-programming interface (API) and downloadable from the Figshare repository. The code is available on GitHub.
format Online
Article
Text
id pubmed-6257042
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-6257042 2018-11-28 Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records Ta, Casey N. Dumontier, Michel Hripcsak, George Tatonetti, Nicholas P. Weng, Chunhua Sci Data Data Descriptor
Nature Publishing Group 2018-11-27 /pmc/articles/PMC6257042/ /pubmed/30480666 http://dx.doi.org/10.1038/sdata.2018.273 Text en Copyright © 2018, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.
title Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6257042/
https://www.ncbi.nlm.nih.gov/pubmed/30480666
http://dx.doi.org/10.1038/sdata.2018.273
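The description in this record names three operations on concept counts: exclusion of rare concepts (count ≤ 10), relative frequency analysis, and an observed-expected frequency ratio. The sketch below illustrates one plausible reading of each. It is not the authors' code. The function names are hypothetical, the expected pair count assumes the two concepts occur independently, and Poisson randomization is left out. The only figure taken from the record is the 5-year patient total (1,790,431); the concept counts are made up for illustration.

```python
def suppress_rare(counts, threshold=10):
    """Drop concepts whose count is at or below the threshold (count <= 10)."""
    return {concept: n for concept, n in counts.items() if n > threshold}


def relative_frequency(pair_count, base_count):
    """Share of patients with the base concept who also have the paired concept."""
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    return pair_count / base_count


def observed_expected_ratio(pair_count, count_a, count_b, n_patients):
    """Observed pair count divided by the count expected under independence.

    Assumption: expected = count_a * count_b / n_patients.
    """
    if min(count_a, count_b, n_patients) <= 0:
        raise ValueError("counts must be positive")
    expected = count_a * count_b / n_patients
    return pair_count / expected


if __name__ == "__main__":
    n_patients = 1_790_431  # 5-year dataset size stated in the record
    # Hypothetical counts, for illustration only.
    counts = suppress_rare({"condition_x": 12_000, "drug_y": 8_000, "rare_z": 7})
    pair_count = 900  # hypothetical co-occurrence of condition_x and drug_y
    print(sorted(counts))
    print(relative_frequency(pair_count, counts["condition_x"]))
    print(observed_expected_ratio(
        pair_count, counts["condition_x"], counts["drug_y"], n_patients))
```

A ratio above 1 means the pair is observed more often than independence would predict. Because both measures are descriptive, they suggest an association but do not establish one.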