
Adverse Drug Event Prediction Using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Validation

BACKGROUND: Adverse drug events (ADEs) are unintended side effects of drugs that cause substantial clinical and economic burdens globally. Not all ADEs are discovered during clinical trials; therefore, postmarketing surveillance, called pharmacovigilance, is routinely conducted to find unknown ADEs....


Bibliographic Details
Main authors: Dasgupta, Soham; Jayagopal, Aishwarya; Jun Hong, Abel Lim; Mariappan, Ragunathan; Rajan, Vaibhav
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8576589/
https://www.ncbi.nlm.nih.gov/pubmed/34694230
http://dx.doi.org/10.2196/32730
author Dasgupta, Soham
Jayagopal, Aishwarya
Jun Hong, Abel Lim
Mariappan, Ragunathan
Rajan, Vaibhav
collection PubMed
description BACKGROUND: Adverse drug events (ADEs) are unintended side effects of drugs that cause substantial clinical and economic burdens globally. Not all ADEs are discovered during clinical trials; therefore, postmarketing surveillance, called pharmacovigilance, is routinely conducted to find unknown ADEs. A wealth of information, which facilitates ADE discovery, lies in the growing body of biomedical literature. Knowledge graphs (KGs) encode information from the literature, where the vertices and the edges represent clinical concepts and their relations, respectively. The scale and unstructured form of the literature necessitate the use of natural language processing (NLP) to automatically create such KGs. Previous studies have demonstrated the utility of such literature-derived KGs in ADE prediction. Through unsupervised learning of the representations (features) of clinical concepts from the KG, which are used in machine learning models, state-of-the-art results for ADE prediction were obtained on benchmark data sets.
OBJECTIVE: Due to the use of NLP to infer literature-derived KGs, there is noise in the form of false positive (erroneous) and false negative (absent) nodes and edges. Previous representation learning methods do not account for such inaccuracies in the graph. NLP algorithms can quantify the confidence in their inference of extracted concepts and relations from the literature. Our hypothesis, which motivates this work, is that by using such confidence scores during representation learning, the learned embeddings would yield better features for ADE prediction models.
METHODS: We developed methods to use these confidence scores on two well-known representation learning methods, DeepWalk and Translating Embeddings for Modeling Multi-relational Data (TransE), to develop their weighted versions: Weighted DeepWalk and Weighted TransE. These methods were used to learn representations from a large literature-derived KG, the Semantic MEDLINE Database, which contains more than 93 million clinical relations. They were compared with Embedding of Semantic Predications, which, to our knowledge, is the best reported representation learning method using the Semantic MEDLINE Database with state-of-the-art results for ADE prediction. Representations learned from different methods were used (separately) as features of drugs and diseases to build classification models for ADE prediction using benchmark data sets. The methods were compared rigorously over multiple cross-validation settings.
RESULTS: The weighted versions we designed were able to learn representations that yielded more accurate predictive models than the corresponding unweighted versions of both DeepWalk and TransE, as well as Embedding of Semantic Predications, in our experiments. There were performance improvements of up to 5.75% in the F1-score and 8.4% in the area under the receiver operating characteristic curve value, thus advancing the state of the art in ADE prediction from literature-derived KGs.
CONCLUSIONS: Our classification models can be used to aid pharmacovigilance teams in detecting potentially new ADEs. Our experiments demonstrate the importance of modeling inaccuracies in the inferred KGs for representation learning.
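The abstract does not say how the confidence scores enter Weighted DeepWalk or Weighted TransE, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that (a) random-walk transitions are sampled in proportion to edge confidence and (b) each triple's TransE margin loss is scaled by its confidence. The toy edges, concept names, confidence values, and function names are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy edges: (head, tail, confidence in [0, 1]).
# Names and scores are invented for illustration only.
EDGES = [
    ("drug_a", "headache", 0.9),
    ("drug_a", "nausea", 0.1),
    ("headache", "drug_b", 0.5),
    ("nausea", "drug_b", 0.5),
]


def build_adjacency(edges):
    """Map each node to parallel lists: (neighbours, edge confidences)."""
    adj = defaultdict(lambda: ([], []))
    for head, tail, conf in edges:
        # Assumption: treat the graph as undirected for walking.
        for src, dst in ((head, tail), (tail, head)):
            adj[src][0].append(dst)
            adj[src][1].append(conf)
    return dict(adj)


def weighted_walk(adj, start, length, rng):
    """Random walk whose next step is drawn in proportion to edge confidence."""
    walk = [start]
    while len(walk) < length:
        neighbours, weights = adj.get(walk[-1], ([], []))
        if not neighbours or sum(weights) == 0:
            break  # dead end, or no usable confidence mass
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def transe_weighted_loss(pos_dist, neg_dist, confidence, margin=1.0):
    """Standard TransE margin ranking loss, scaled by triple confidence.

    pos_dist and neg_dist are the distances ||h + r - t|| for the observed
    triple and a corrupted triple; the scaling is an assumption.
    """
    return confidence * max(0.0, margin + pos_dist - neg_dist)


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    rng = random.Random(0)
    # The walks would normally be fed to a skip-gram model to learn embeddings.
    for _ in range(3):
        print(weighted_walk(adjacency, "drug_a", 5, rng))
    print(transe_weighted_loss(pos_dist=1.0, neg_dist=0.5, confidence=0.5))
```

Under these assumptions, low-confidence edges are visited less often during walks and contribute less to the TransE objective, which is one way noisy relations could be down-weighted during representation learning.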
format Online
Article
Text
id pubmed-8576589
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8576589 2021-11-24 Adverse Drug Event Prediction Using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Validation Dasgupta, Soham Jayagopal, Aishwarya Jun Hong, Abel Lim Mariappan, Ragunathan Rajan, Vaibhav JMIR Med Inform Original Paper JMIR Publications 2021-10-25 /pmc/articles/PMC8576589/ /pubmed/34694230 http://dx.doi.org/10.2196/32730 Text en ©Soham Dasgupta, Aishwarya Jayagopal, Abel Lim Jun Hong, Ragunathan Mariappan, Vaibhav Rajan. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.10.2021.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Adverse Drug Event Prediction Using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Validation
topic Original Paper