
Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

BACKGROUND: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text by using machine learning methods improves recruitment efficiency to reduce the cost of clinical research. However...


Bibliographic Details
Main Authors:	Zeng, Kun; Xu, Yibin; Lin, Ge; Liang, Likeng; Hao, Tianyong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323220/ https://www.ncbi.nlm.nih.gov/pubmed/34330259 http://dx.doi.org/10.1186/s12911-021-01492-z

_version_	1783731198402494464
author	Zeng, Kun; Xu, Yibin; Lin, Ge; Liang, Likeng; Hao, Tianyong
author_facet	Zeng, Kun; Xu, Yibin; Lin, Ge; Liang, Likeng; Hao, Tianyong
author_sort	Zeng, Kun
collection	PubMed
description	BACKGROUND: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. METHODS: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features become more distinguishable. Soft Voting is applied to obtain the final classification of the ensemble model. The dataset is from the standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. RESULTS: Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. CONCLUSIONS: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that classification performance was significantly improved by our ensemble model. In addition, metric learning was able to improve word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
format	Online Article Text
id	pubmed-8323220
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8323220 2021-07-30 BMC Med Inform Decis Mak Research BioMed Central 2021-07-30 /pmc/articles/PMC8323220/ /pubmed/34330259 http://dx.doi.org/10.1186/s12911-021-01492-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323220/ https://www.ncbi.nlm.nih.gov/pubmed/34330259 http://dx.doi.org/10.1186/s12911-021-01492-z
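
Illustrative note on the description field above: the abstract names Focal Loss and Soft Voting but gives no formulas or hyperparameters. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The function names, the default values `gamma=2.0` and `alpha=1.0`, and the toy probability vectors are all assumptions made for illustration.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    probs is a predicted class distribution and target is the true class
    index. gamma and alpha are hypothetical defaults, not values taken
    from the record above.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities across models and return the
    index of the class with the highest mean probability."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    mean = [sum(m[c] for m in model_probs) / n_models
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Toy example: three hypothetical base models, two classes.
toy_probs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
predicted = soft_vote(toy_probs)

# A confidently correct prediction contributes far less loss than a
# poorly classified one, which is how focal loss shifts attention
# toward hard (often minority-class) examples.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.2, 0.8], target=0)
```

In the toy example two of the three models prefer class 1, yet soft voting selects class 0 because the first model is much more confident; a majority (hard) vote would choose differently. With `gamma=0` the focal loss reduces to ordinary cross-entropy.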
