
DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery

Approximately 15% of human cancers are estimated to be attributable to viruses. Virus sequences can integrate into the host genome, leading to genomic instability and carcinogenesis. Here, a new deep convolutional neural network (CNN) model with an attention architecture, named DeepVISP, is developed...


Bibliographic Details
Main Authors: Xu, Haodong; Jia, Peilin; Zhao, Zhongming
Format: Online Article, Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8097320/
https://www.ncbi.nlm.nih.gov/pubmed/33977077
http://dx.doi.org/10.1002/advs.202004958
author Xu, Haodong
Jia, Peilin
Zhao, Zhongming
collection PubMed
description Approximately 15% of human cancers are estimated to be attributable to viruses. Virus sequences can integrate into the host genome, leading to genomic instability and carcinogenesis. Here, a new deep convolutional neural network (CNN) model with an attention architecture, named DeepVISP, is developed to accurately predict oncogenic virus integration sites (VISs) in the human genome. Using curated benchmark integration data for three viruses, hepatitis B virus (HBV), human papillomavirus (HPV), and Epstein‐Barr virus (EBV), DeepVISP achieves high accuracy and robust performance for all three by automatically learning informative features and essential genomic positions from DNA sequence alone. DeepVISP outperforms conventional machine learning methods, improving the area under the curve (AUC) by 8.43–34.33% across the three viruses. Moreover, DeepVISP can decode cis‐regulatory factors potentially involved in virus integration and tumorigenesis, such as HOXB7, IKZF1, and LHX6; these findings are supported by multiple lines of evidence in the literature. Clustering analysis of the informative motifs reveals that the representative k‐mers in each cluster could help guide virus recognition of host genes. A user‐friendly web server is provided for predicting putative oncogenic VISs in the human genome with DeepVISP.
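The description states that DeepVISP learns informative features and essential genomic positions directly from DNA sequence using a CNN with an attention layer. As a rough illustration only, the sketch below shows the general shape of such a forward pass in plain Python: one-hot encoding of the sequence, a convolutional layer whose filters act as motif scanners, softmax attention pooling over positions, and a sigmoid output. It is not DeepVISP's published architecture. The function names (`one_hot`, `conv1d_relu`, `attention_pool`, `predict`), the single convolutional layer, and all shapes are hypothetical, and the weights are untrained placeholders.

```python
import math

# Hypothetical, minimal sketch of a CNN + attention forward pass over DNA.
# This is NOT DeepVISP's actual model; names, layer count, and shapes are assumptions.

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of 4 values (A, C, G, T); unknown symbols such as N become all zeros."""
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        j = BASES.find(base)
        if j >= 0:
            row[j] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, filters, bias):
    """Slide each filter (width x 4 weights) along the sequence; return one ReLU score vector per position."""
    width = len(filters[0])
    n_pos = len(x) - width + 1
    if n_pos < 1:
        raise ValueError("sequence is shorter than the filter width")
    h = []
    for p in range(n_pos):
        scores = []
        for f, b in zip(filters, bias):
            s = b + sum(f[k][c] * x[p + k][c] for k in range(width) for c in range(4))
            scores.append(max(s, 0.0))  # ReLU
        h.append(scores)
    return h


def attention_pool(h, w_att):
    """Softmax over positions of the score h[p] . w_att; return the weighted average of h and the weights."""
    scores = [sum(a * b for a, b in zip(row, w_att)) for row in h]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * row[i] for w, row in zip(weights, h)) for i in range(len(h[0]))]
    return pooled, weights


def predict(seq, filters, bias, w_att, w_out, b_out):
    """Return a sigmoid 'integration site' probability and the per-position attention weights."""
    h = conv1d_relu(one_hot(seq), filters, bias)
    pooled, weights = attention_pool(h, w_att)
    logit = b_out + sum(a * b for a, b in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit)), weights
```

With random, untrained weights the returned probability carries no biological meaning; the point is the data flow. For a 50-bp input and width-6 filters there are 45 attention weights, which sum to one, so larger weights mark the positions that contribute most to the pooled representation. This mirrors, in simplified form, the idea of learning "essential genomic positions" described above, while first-layer filters can be read as motif detectors.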
format Online
Article
Text
id pubmed-8097320
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-8097320 2021-05-10 DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery Xu, Haodong Jia, Peilin Zhao, Zhongming Adv Sci (Weinh) Research Articles John Wiley and Sons Inc. 2021-03-08 /pmc/articles/PMC8097320/ /pubmed/33977077 http://dx.doi.org/10.1002/advs.202004958 Text en © 2021 The Authors. Advanced Science published by Wiley‐VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery
topic Research Articles