The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria
Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free-text, using language familiar to clinicians and researchers. In order to i...
Main Authors: | Dobbins, Nicholas J.; Mullen, Tony; Uzuner, Özlem; Yetisgen, Meliha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | Data Descriptor |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372145/ https://www.ncbi.nlm.nih.gov/pubmed/35953524 http://dx.doi.org/10.1038/s41597-022-01521-0 |
_version_ | 1784767316224901120 |
---|---|
author | Dobbins, Nicholas J.; Mullen, Tony; Uzuner, Özlem; Yetisgen, Meliha |
author_facet | Dobbins, Nicholas J.; Mullen, Tony; Uzuner, Özlem; Yetisgen, Meliha |
author_sort | Dobbins, Nicholas J. |
collection | PubMed |
description | Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of such conversion into database queries automatically. However, they must first be trained and evaluated using corpora which capture clinical trials criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work. |
format | Online Article Text |
id | pubmed-9372145 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-93721452022-08-13 The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria Dobbins, Nicholas J. Mullen, Tony Uzuner, Özlem Yetisgen, Meliha Sci Data Data Descriptor Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free-text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of such conversion into database queries automatically. However they must first be trained and evaluated using corpora which capture clinical trials criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work. Nature Publishing Group UK 2022-08-11 /pmc/articles/PMC9372145/ /pubmed/35953524 http://dx.doi.org/10.1038/s41597-022-01521-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Data Descriptor Dobbins, Nicholas J. Mullen, Tony Uzuner, Özlem Yetisgen, Meliha The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria |
title | The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria |
title_sort | leaf clinical trials corpus: a new resource for query generation from clinical trial eligibility criteria |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372145/ https://www.ncbi.nlm.nih.gov/pubmed/35953524 http://dx.doi.org/10.1038/s41597-022-01521-0 |