Medical concept normalization in clinical trials with drug and disease representation learning

Bibliographic Details
Main authors: Miftahutdinov, Zulfat; Kadurin, Artur; Kudrin, Roman; Tutubalina, Elena
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570806/
https://www.ncbi.nlm.nih.gov/pubmed/34213526
http://dx.doi.org/10.1093/bioinformatics/btab474
author Miftahutdinov, Zulfat
Kadurin, Artur
Kudrin, Roman
Tutubalina, Elena
collection PubMed
description MOTIVATION: Clinical trials are the essential stage of every drug development program for the treatment to become available to patients. Despite the importance of well-structured clinical trial databases and their tremendous value for drug discovery and development, such databases are very rare. Presently, large-scale information on clinical trials is stored in clinical trial registers, which are relatively structured, but the mappings to external databases of drugs and diseases are increasingly lacking. The precise production of such links would enable us to interrogate richer harmonized datasets for invaluable insights. RESULTS: We present a neural approach for medical concept normalization of diseases and drugs. Our two-stage approach is based on Bidirectional Encoder Representations from Transformers (BERT). In the training stage, we optimize the relative similarity of mentions and concept names from a terminology via triplet loss. In the inference stage, we obtain the closest concept name representation in a common embedding space to a given mention representation. We performed a set of experiments on a dataset of abstracts and a real-world dataset of trial records with interventions and conditions mapped to drug and disease terminologies. The latter includes mentions associated with one or more concepts (in-KB) or with none (out-of-KB, nil prediction). Experiments show that our approach significantly outperforms baseline and state-of-the-art architectures. Moreover, we demonstrate that our approach is effective in knowledge transfer from the scientific literature to clinical trial data. AVAILABILITY AND IMPLEMENTATION: We make code and data freely available at https://github.com/insilicomedicine/DILBERT.
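The two stages named in the abstract (triplet-loss training, then nearest-concept lookup in a shared embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value, the distance threshold used to return a nil prediction, and the function and concept names below are all assumptions made for illustration, and plain lists of floats stand in for BERT embeddings.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one (mention, correct name, wrong name) triple.

    The loss is zero once the mention is closer to the correct concept name
    than to the wrong one by at least `margin` (a hypothetical value).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def normalize_mention(mention_vec, concept_vecs, nil_threshold=None):
    """Return (concept_id, distance) for the closest concept name embedding.

    If `nil_threshold` is given and even the closest concept is farther away
    than that, return (None, distance) to signal an out-of-KB mention.
    The thresholding rule is an assumption, not taken from the abstract.
    """
    best_id, best_dist = None, math.inf
    for concept_id, vec in concept_vecs.items():
        dist = euclidean(mention_vec, vec)
        if dist < best_dist:
            best_id, best_dist = concept_id, dist
    if nil_threshold is not None and best_dist > nil_threshold:
        return None, best_dist
    return best_id, best_dist


# Toy 2-d "embeddings" with hypothetical concept identifiers.
toy_concepts = {"CONCEPT_A": [0.0, 0.0], "CONCEPT_B": [3.0, 4.0]}
in_kb = normalize_mention([0.5, 0.0], toy_concepts, nil_threshold=1.0)
out_of_kb = normalize_mention([10.0, 10.0], toy_concepts, nil_threshold=1.0)
```

In this toy setup the first mention lies near `CONCEPT_A` and is linked to it, while the second is far from every concept and yields `None`. In practice the vectors would come from a trained encoder, and the threshold would have to be tuned on held-out data.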
format Online
Article
Text
id pubmed-8570806
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8570806 2021-11-08 Medical concept normalization in clinical trials with drug and disease representation learning Miftahutdinov, Zulfat Kadurin, Artur Kudrin, Roman Tutubalina, Elena Bioinformatics Original Papers
Oxford University Press 2021-07-02 /pmc/articles/PMC8570806/ /pubmed/34213526 http://dx.doi.org/10.1093/bioinformatics/btab474 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Medical concept normalization in clinical trials with drug and disease representation learning
topic Original Papers