
dialogi: Utilising NLP With Chemical and Disease Similarities to Drive the Identification of Drug-Induced Liver Injury Literature

Drug-Induced Liver Injury (DILI), despite its low occurrence rate, can cause severe side effects or even lead to death. Thus, it is one of the leading causes for terminating the development of new, and restricting the use of already-circulating, drugs. Moreover, its multifactorial nature, combined w...


Bibliographic Details
Main authors:	Katritsis, Nicholas M.; Liu, Anika; Youssef, Gehad; Rathee, Sanjay; MacMahon, Méabh; Hwang, Woochang; Wollman, Lilly; Han, Namshik
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395939/ https://www.ncbi.nlm.nih.gov/pubmed/36017500 http://dx.doi.org/10.3389/fgene.2022.894209

_version_	1784771815353090048
author	Katritsis, Nicholas M.; Liu, Anika; Youssef, Gehad; Rathee, Sanjay; MacMahon, Méabh; Hwang, Woochang; Wollman, Lilly; Han, Namshik
author_facet	Katritsis, Nicholas M.; Liu, Anika; Youssef, Gehad; Rathee, Sanjay; MacMahon, Méabh; Hwang, Woochang; Wollman, Lilly; Han, Namshik
author_sort	Katritsis, Nicholas M.
collection	PubMed
description	Drug-Induced Liver Injury (DILI), despite its low occurrence rate, can cause severe side effects or even lead to death. Thus, it is one of the leading causes for terminating the development of new drugs and restricting the use of those already in circulation. Moreover, its multifactorial nature, combined with a clinical presentation that often mimics other liver diseases, complicates the identification of DILI-related (or “positive”) literature, which remains the main medium for sourcing results from clinical practice and experimental studies. This work, a contribution to the “Literature AI for DILI Challenge” of the Critical Assessment of Massive Data Analysis (CAMDA) 2021, presents an automated pipeline for distinguishing between DILI-positive and DILI-negative publications. We used Natural Language Processing (NLP) to filter out the uninformative parts of a text and to identify and extract mentions of chemicals and diseases. We combined that information with small-molecule and disease embeddings, which capture chemical and disease similarities, to improve classification performance. The small-molecule embeddings were sourced directly from the Chemical Checker (CC). For the disease embeddings, we collected data that encode different aspects of disease similarity from the National Library of Medicine’s (NLM) Medical Subject Headings (MeSH) thesaurus and the Comparative Toxicogenomics Database (CTD). Following a procedure similar to the one used in the CC, vector representations for diseases were learnt and evaluated. Two Neural Network (NN) classifiers were developed: a baseline model that accepts texts as input and an extended model that also utilises chemical and disease embeddings. We trained, validated, and tested the classifiers through a Nested Cross-Validation (NCV) scheme with 10 outer and 5 inner folds. Under this scheme, the baseline and extended models performed virtually identically, with F1-scores of 95.04 ± 0.61% and 94.80 ± 0.41%, respectively. On an external, withheld dataset meant to assess classifier generalisability, the extended model achieved an F1-score of 91.14 ± 1.62%, outperforming the baseline, which scored 88.30 ± 2.44%. We make further comparisons between the classifiers and discuss future improvements and directions, including utilising chemical and disease embeddings for visualisation and exploratory analysis of the DILI-positive literature.
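The nested cross-validation scheme mentioned in the description (10 outer and 5 inner folds, scored by F1) can be illustrated with a minimal sketch. This is not the authors' code: the function names k_fold_indices, nested_cv_splits and f1_score are hypothetical, the folds are plain shuffled splits (the abstract does not say whether stratification was used), and only the Python standard library is assumed.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k disjoint, nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=5, seed=0):
    """Yield (train_val, test, inner_splits) for each outer fold.

    inner_splits is a list of (train, val) index lists drawn only from
    train_val, so the outer test fold is never used for model selection.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, test in enumerate(outer_folds):
        # Everything outside the current outer fold is available for tuning.
        train_val = [idx for j, fold in enumerate(outer_folds)
                     if j != i for idx in fold]
        inner_folds = k_fold_indices(train_val, inner_k, seed + i + 1)
        inner_splits = []
        for m, val in enumerate(inner_folds):
            train = [idx for j, fold in enumerate(inner_folds)
                     if j != m for idx in fold]
            inner_splits.append((train, val))
        yield train_val, test, inner_splits


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In such a scheme, hyperparameters would be chosen on the inner (train, val) splits and the selected model scored once on each outer test fold; a mean ± spread over the 10 outer folds then gives figures of the form reported above.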
format	Online Article Text
id	pubmed-9395939
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9395939 2022-08-24 Front Genet Genetics Frontiers Media S.A. 2022-08-09 /pmc/articles/PMC9395939/ /pubmed/36017500 http://dx.doi.org/10.3389/fgene.2022.894209 Text en Copyright © 2022 Katritsis, Liu, Youssef, Rathee, MacMahon, Hwang, Wollman and Han. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	dialogi: Utilising NLP With Chemical and Disease Similarities to Drive the Identification of Drug-Induced Liver Injury Literature
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9395939/ https://www.ncbi.nlm.nih.gov/pubmed/36017500 http://dx.doi.org/10.3389/fgene.2022.894209