
Improved prediction of drug-induced liver injury literature using natural language processing and machine learning methods

Drug-induced liver injury (DILI) is an adverse hepatic drug reaction that can potentially lead to life-threatening liver failure. Previously published work in the scientific literature on DILI has provided valuable insights for the understanding of hepatotoxicity as well as drug development. However...


Bibliographic Details
Main Authors:	Oh, Jung Hun; Tannenbaum, Allen; Deasy, Joseph O.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390074/ https://www.ncbi.nlm.nih.gov/pubmed/37529777 http://dx.doi.org/10.3389/fgene.2023.1161047

author	Oh, Jung Hun; Tannenbaum, Allen; Deasy, Joseph O.
collection	PubMed
description	Drug-induced liver injury (DILI) is an adverse hepatic drug reaction that can potentially lead to life-threatening liver failure. Previously published work in the scientific literature on DILI has provided valuable insights for the understanding of hepatotoxicity as well as drug development. However, the manual search of scientific literature in PubMed is laborious and time-consuming. Natural language processing (NLP) techniques along with artificial intelligence/machine learning approaches may allow for automatic processing in identifying DILI-related literature, but useful methods are yet to be demonstrated. To address this issue, we have developed an integrated NLP/machine learning classification model to identify DILI-related literature using only paper titles and abstracts. For prediction modeling, we used 14,203 publications provided by the Critical Assessment of Massive Data Analysis (CAMDA) challenge, employing word vectorization techniques in NLP in conjunction with machine learning methods. Classification modeling was performed using 2/3 of the data for training and the remainder for test in internal validation. The best performance was achieved using a linear support vector machine (SVM) model on the combined vectors derived from term frequency-inverse document frequency (TF-IDF) and Word2Vec, resulting in an accuracy of 95.0% and an F1-score of 95.0%. The final SVM model constructed from all 14,203 publications was tested on independent datasets, resulting in accuracies of 92.5%, 96.3%, and 98.3%, and F1-scores of 93.5%, 86.1%, and 75.6% for three test sets (T1-T3). Furthermore, the SVM model was tested on four external validation sets (V1-V4), resulting in accuracies of 92.0%, 96.2%, 98.3%, and 93.1%, and F1-scores of 92.4%, 82.9%, 75.0%, and 93.3%.
format	Online Article Text
id	pubmed-10390074
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-10390074 2023-08-01 Improved prediction of drug-induced liver injury literature using natural language processing and machine learning methods Oh, Jung Hun; Tannenbaum, Allen; Deasy, Joseph O. Front Genet Genetics Frontiers Media S.A. 2023-07-17 /pmc/articles/PMC10390074/ /pubmed/37529777 http://dx.doi.org/10.3389/fgene.2023.1161047 Text en Copyright © 2023 Oh, Tannenbaum and Deasy. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Improved prediction of drug-induced liver injury literature using natural language processing and machine learning methods
topic	Genetics
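
The description field of this record summarizes a pipeline of text vectorization, a linear SVM, a 2/3 training split, and accuracy/F1 evaluation. The following is a minimal sketch of that kind of workflow, not the authors' implementation. Several things in it are assumptions made for illustration: the toy corpus is hypothetical (it is not CAMDA data), only the TF-IDF half of the features is built (the Word2Vec half described in the abstract is omitted), the SVM is fitted with a simple cyclic Pegasos-style subgradient solver without an intercept, and all function names are invented here.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT the CAMDA data). POS = DILI-related, NEG = unrelated.
POS = [
    "acute liver injury after acetaminophen overdose",
    "hepatotoxicity and liver failure induced by isoniazid",
    "drug induced liver injury with elevated transaminases",
    "liver failure following amoxicillin clavulanate therapy",
    "drug induced hepatotoxicity and acute liver failure",
    "elevated transaminases after isoniazid induced liver injury",
]
NEG = [
    "deep learning for segmentation of lung tumor imaging",
    "genome wide association study of lung cancer risk",
    "randomized trial of exercise therapy for knee pain",
    "imaging study of knee cartilage in athletes",
    "deep learning for lung cancer imaging",
    "randomized trial of knee pain exercise",
]

def split(docs, label):
    """First 2/3 of each class for training, the remainder for testing."""
    cut = (2 * len(docs)) // 3
    return [(d, label) for d in docs[:cut]], [(d, label) for d in docs[cut:]]

def fit_idf(texts):
    """Smoothed inverse document frequency, learned from training text only."""
    n = len(texts)
    df = Counter(tok for t in texts for tok in set(t.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    """L2-normalized sparse TF-IDF vector; out-of-vocabulary tokens are dropped."""
    tf = Counter(tok for tok in text.split() if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Cyclic Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, label in zip(X, y):           # label is +1 or -1
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            violated = label * dot(w, x) < 1.0
            for tok in w:                    # shrink step from the L2 penalty
                w[tok] *= 1.0 - eta * lam
            if violated:                     # hinge-loss subgradient step
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * label * v
    return w

def evaluate(y_true, y_pred):
    """Return (accuracy, F1) with class 1 as the positive class."""
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
    acc = sum(1 for a, p in zip(y_true, y_pred) if a == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

pos_tr, pos_te = split(POS, 1)
neg_tr, neg_te = split(NEG, 0)
train, test = pos_tr + neg_tr, pos_te + neg_te

idf = fit_idf([text for text, _ in train])
w = train_linear_svm([tfidf(text, idf) for text, _ in train],
                     [1 if lab == 1 else -1 for _, lab in train])

y_true = [lab for _, lab in test]
y_pred = [1 if dot(w, tfidf(text, idf)) > 0 else 0 for text, _ in test]
accuracy, f1 = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} F1={f1:.2f}")
```

Accuracy and F1 are reported separately because they can diverge when DILI-positive papers are a small share of a test set: accuracy counts correct negatives, while F1 depends only on how the positive class is handled. In a fuller version, document-level Word2Vec features would be concatenated with the TF-IDF vector before training.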
