
Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation

BACKGROUND: Multiple types of biomedical associations of knowledge graphs, including COVID-19–related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, dr...


Bibliographic Details
Main Authors:	Jiang, Chao; Ngo, Victoria; Chapman, Richard; Yu, Yue; Liu, Hongfang; Jiang, Guoqian; Zong, Nansu
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9301549/ https://www.ncbi.nlm.nih.gov/pubmed/35658098 http://dx.doi.org/10.2196/38584

author	Jiang, Chao; Ngo, Victoria; Chapman, Richard; Yu, Yue; Liu, Hongfang; Jiang, Guoqian; Zong, Nansu
collection	PubMed
description	BACKGROUND: Multiple types of biomedical associations of knowledge graphs, including COVID-19–related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions as co-occurrences in the literature do not always mean there is a true biomedical association between two entities. OBJECTIVE: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has been focused on improving a model’s performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. METHODS: The proposed framework used generative-based deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator. RESULTS: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available. CONCLUSIONS: Our preliminary findings showed the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
format	Online Article Text
id	pubmed-9301549
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9301549 2022-07-22 Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation Jiang, Chao Ngo, Victoria Chapman, Richard Yu, Yue Liu, Hongfang Jiang, Guoqian Zong, Nansu J Med Internet Res Original Paper JMIR Publications 2022-07-06 /pmc/articles/PMC9301549/ /pubmed/35658098 http://dx.doi.org/10.2196/38584 Text en ©Chao Jiang, Victoria Ngo, Richard Chapman, Yue Yu, Hongfang Liu, Guoqian Jiang, Nansu Zong. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 06.07.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title	Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9301549/ https://www.ncbi.nlm.nih.gov/pubmed/35658098 http://dx.doi.org/10.2196/38584
