Chia, a large annotated corpus of clinical trial eligibility criteria
We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types...
Main Authors: | Kury, Fabrício; Butler, Alex; Yuan, Chi; Fu, Li-heng; Sun, Yingcheng; Liu, Hao; Sim, Ida; Carini, Simona; Weng, Chunhua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | Data Descriptor |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7452886/ https://www.ncbi.nlm.nih.gov/pubmed/32855408 http://dx.doi.org/10.1038/s41597-020-00620-0 |
_version_ | 1783575249614274560 |
---|---|
author | Kury, Fabrício Butler, Alex Yuan, Chi Fu, Li-heng Sun, Yingcheng Liu, Hao Sim, Ida Carini, Simona Weng, Chunhua |
author_facet | Kury, Fabrício Butler, Alex Yuan, Chi Fu, Li-heng Sun, Yingcheng Liu, Hao Sim, Ida Carini, Simona Weng, Chunhua |
author_sort | Kury, Fabrício |
collection | PubMed |
description | We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria. |
format | Online Article Text |
id | pubmed-7452886 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-74528862020-09-02 Chia, a large annotated corpus of clinical trial eligibility criteria Kury, Fabrício Butler, Alex Yuan, Chi Fu, Li-heng Sun, Yingcheng Liu, Hao Sim, Ida Carini, Simona Weng, Chunhua Sci Data Data Descriptor We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria. Nature Publishing Group UK 2020-08-27 /pmc/articles/PMC7452886/ /pubmed/32855408 http://dx.doi.org/10.1038/s41597-020-00620-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. |
spellingShingle | Data Descriptor Kury, Fabrício Butler, Alex Yuan, Chi Fu, Li-heng Sun, Yingcheng Liu, Hao Sim, Ida Carini, Simona Weng, Chunhua Chia, a large annotated corpus of clinical trial eligibility criteria |
title | Chia, a large annotated corpus of clinical trial eligibility criteria |
title_full | Chia, a large annotated corpus of clinical trial eligibility criteria |
title_fullStr | Chia, a large annotated corpus of clinical trial eligibility criteria |
title_full_unstemmed | Chia, a large annotated corpus of clinical trial eligibility criteria |
title_short | Chia, a large annotated corpus of clinical trial eligibility criteria |
title_sort | chia, a large annotated corpus of clinical trial eligibility criteria |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7452886/ https://www.ncbi.nlm.nih.gov/pubmed/32855408 http://dx.doi.org/10.1038/s41597-020-00620-0 |
work_keys_str_mv | AT kuryfabricio chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT butleralex chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT yuanchi chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT fuliheng chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT sunyingcheng chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT liuhao chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT simida chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT carinisimona chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria AT wengchunhua chiaalargeannotatedcorpusofclinicaltrialeligibilitycriteria |