
Chia, a large annotated corpus of clinical trial eligibility criteria

We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types...


Bibliographic Details
Main Authors: Kury, Fabrício, Butler, Alex, Yuan, Chi, Fu, Li-heng, Sun, Yingcheng, Liu, Hao, Sim, Ida, Carini, Simona, Weng, Chunhua
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7452886/
https://www.ncbi.nlm.nih.gov/pubmed/32855408
http://dx.doi.org/10.1038/s41597-020-00620-0
author Kury, Fabrício
Butler, Alex
Yuan, Chi
Fu, Li-heng
Sun, Yingcheng
Liu, Hao
Sim, Ida
Carini, Simona
Weng, Chunhua
author_sort Kury, Fabrício
collection PubMed
description We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
format Online
Article
Text
id pubmed-7452886
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-74528862020-09-02 Chia, a large annotated corpus of clinical trial eligibility criteria Kury, Fabrício Butler, Alex Yuan, Chi Fu, Li-heng Sun, Yingcheng Liu, Hao Sim, Ida Carini, Simona Weng, Chunhua Sci Data Data Descriptor We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria. Nature Publishing Group UK 2020-08-27 /pmc/articles/PMC7452886/ /pubmed/32855408 http://dx.doi.org/10.1038/s41597-020-00620-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
title Chia, a large annotated corpus of clinical trial eligibility criteria
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7452886/
https://www.ncbi.nlm.nih.gov/pubmed/32855408
http://dx.doi.org/10.1038/s41597-020-00620-0
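
The description above states that each criterion is represented as a directed acyclic graph that can be transformed into Boolean logic to form a database query. The following is a minimal sketch of that idea, not the corpus's actual file format or the authors' tooling. The `Node` class, the node kinds (`AND`, `OR`, `NOT`, `LEAF`), the `to_boolean` function, and the example criterion are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One vertex of a criterion graph (hypothetical structure, not Chia's schema)."""
    kind: str                                      # "AND", "OR", "NOT", or "LEAF"
    label: str = ""                                # predicate text, used only by leaves
    children: list = field(default_factory=list)   # ids of child nodes


def to_boolean(graph, node_id, _path=()):
    """Render the subgraph rooted at node_id as a Boolean expression string."""
    if node_id in _path:
        # A node that appears among its own ancestors means the input is not a DAG.
        raise ValueError(f"cycle detected at {node_id!r}")
    node = graph[node_id]
    if node.kind == "LEAF":
        return node.label
    parts = [to_boolean(graph, child, _path + (node_id,)) for child in node.children]
    if node.kind == "NOT":
        if len(parts) != 1:
            raise ValueError("NOT takes exactly one child")
        return f"NOT ({parts[0]})"
    if node.kind in ("AND", "OR"):
        if not parts:
            raise ValueError(f"{node.kind} needs at least one child")
        return "(" + f" {node.kind} ".join(parts) + ")"
    raise ValueError(f"unknown node kind: {node.kind!r}")


# Invented example criterion: "type 1 or type 2 diabetes, not on insulin".
criterion = {
    "root": Node("AND", children=["diabetes", "no_insulin"]),
    "diabetes": Node("OR", children=["t1", "t2"]),
    "t1": Node("LEAF", "condition = 'type 1 diabetes'"),
    "t2": Node("LEAF", "condition = 'type 2 diabetes'"),
    "no_insulin": Node("NOT", children=["insulin"]),
    "insulin": Node("LEAF", "drug = 'insulin'"),
}

expression = to_boolean(criterion, "root")
print(expression)
```

The recursive walk keeps track of the current ancestor path so that a malformed input containing a cycle is rejected rather than looping forever. A node shared by several parents is still allowed, which is what distinguishes a DAG from a tree.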