Building an OMOP common data model-compliant annotated corpus for COVID-19 clinical trials
Main authors: | Sun, Yingcheng; Butler, Alex; Stewart, Latoya A.; Liu, Hao; Yuan, Chi; Southard, Christopher T.; Kim, Jae Hyun; Weng, Chunhua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Inc., 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8079156/ https://www.ncbi.nlm.nih.gov/pubmed/33887457 http://dx.doi.org/10.1016/j.jbi.2021.103790 |
author | Sun, Yingcheng; Butler, Alex; Stewart, Latoya A.; Liu, Hao; Yuan, Chi; Southard, Christopher T.; Kim, Jae Hyun; Weng, Chunhua |
---|---|
collection | PubMed |
description | Clinical trials are essential for generating reliable medical evidence, but they often suffer from expensive and delayed patient recruitment because unstructured eligibility criteria descriptions prevent the automatic generation of eligibility-screening queries. In response to the COVID-19 pandemic, many trials have been created, but their information is not computable. We included 700 COVID-19 trials available at the time of the study and developed a semi-automatic approach to generate an annotated corpus of COVID-19 clinical trial eligibility criteria, called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: study cohort, eligibility criterion, named entity, and standard concept. In COVIC, 39 trials with more than one study cohort were identified, and each cohort was labelled with an identifier. 1,943 criteria describing non-clinical characteristics, such as “informed consent” and “exclusivity of participation”, were annotated. 9,767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types, and 16,443 relationships of 11 relationship types. 17,171 entities were mapped to standard medical concepts, and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and as a benchmark for machine learning-based criteria extraction. |
format | Online Article Text |
id | pubmed-8079156 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier Inc. |
record_format | MEDLINE/PubMed |
journal | J Biomed Inform (Original Research), 2021-06; record dated 2021-04-28 |
rights | © 2021 Elsevier Inc. Since January 2020, Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre, including this research content, immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database, with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
title | Building an OMOP common data model-compliant annotated corpus for COVID-19 clinical trials |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8079156/ https://www.ncbi.nlm.nih.gov/pubmed/33887457 http://dx.doi.org/10.1016/j.jbi.2021.103790 |
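The four annotation levels listed in the description (study cohort, eligibility criterion, named entity, standard concept) nest naturally, and a small data model makes the hierarchy concrete. The Python sketch below is illustrative only. The class names, field names, domain labels, and example trial are hypothetical assumptions and do not reflect COVIC's published file format. The concept ID 123 is a placeholder, not a real OMOP concept ID, and attributes and relationships are omitted for brevity. The last line computes the entity-to-concept mapping rate implied by the reported counts, assuming the 17,171 mapped entities are a subset of the 18,161 annotated ones.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical data model mirroring the four annotation levels
# (study cohort > eligibility criterion > named entity > standard concept).
# Names and fields are illustrative assumptions, not the COVIC file format.

@dataclass
class Entity:
    text: str                         # mention in the criterion text
    domain: str                       # assumed OMOP-style domain label, e.g. "Condition"
    concept_id: Optional[int] = None  # standard concept ID; None means unmapped

@dataclass
class Criterion:
    text: str
    is_inclusion: bool
    entities: list[Entity] = field(default_factory=list)

@dataclass
class Cohort:
    cohort_id: str                    # trials with several cohorts get one ID per cohort
    criteria: list[Criterion] = field(default_factory=list)

@dataclass
class Trial:
    trial_id: str
    cohorts: list[Cohort] = field(default_factory=list)

def mapping_coverage(trials: list[Trial]) -> float:
    """Return the fraction of entities that carry a standard concept ID."""
    entities = [
        e
        for t in trials
        for c in t.cohorts
        for cr in c.criteria
        for e in cr.entities
    ]
    if not entities:
        return 0.0
    return sum(e.concept_id is not None for e in entities) / len(entities)

# Toy example with made-up values; 123 is a placeholder, not a real concept ID.
trial = Trial("hypothetical-trial-1", [
    Cohort("cohort-1", [
        Criterion("Confirmed COVID-19 with pneumonia", True, [
            Entity("COVID-19", "Condition", concept_id=123),
            Entity("pneumonia", "Condition"),  # left unmapped on purpose
        ]),
    ]),
])
print(mapping_coverage([trial]))   # 0.5

# Corpus-level rate from the reported counts (17,171 of 18,161 entities).
print(f"{17171 / 18161:.1%}")      # 94.5%
```

Keeping the concept ID optional on each entity lets the same structure hold both raw annotations and normalized ones, so coverage can be measured directly over the tree.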