SurVirus: a repeat-aware virus integration caller

A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequen...

Bibliographic Details
Main authors: Rajaby, Ramesh; Zhou, Yi; Meng, Yifan; Zeng, Xi; Li, Guoliang; Wu, Peng; Sung, Wing-Kin
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8034624/
https://www.ncbi.nlm.nih.gov/pubmed/33444454
http://dx.doi.org/10.1093/nar/gkaa1237
_version_ 1783676569270616064
author Rajaby, Ramesh
Zhou, Yi
Meng, Yifan
Zeng, Xi
Li, Guoliang
Wu, Peng
Sung, Wing-Kin
author_facet Rajaby, Ramesh
Zhou, Yi
Meng, Yifan
Zeng, Xi
Li, Guoliang
Wu, Peng
Sung, Wing-Kin
author_sort Rajaby, Ramesh
collection PubMed
description A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts.
format Online
Article
Text
id pubmed-8034624
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-80346242021-04-14 SurVirus: a repeat-aware virus integration caller Rajaby, Ramesh Zhou, Yi Meng, Yifan Zeng, Xi Li, Guoliang Wu, Peng Sung, Wing-Kin Nucleic Acids Res Methods Online A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts. Oxford University Press 2021-01-14 /pmc/articles/PMC8034624/ /pubmed/33444454 http://dx.doi.org/10.1093/nar/gkaa1237 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods Online
Rajaby, Ramesh
Zhou, Yi
Meng, Yifan
Zeng, Xi
Li, Guoliang
Wu, Peng
Sung, Wing-Kin
SurVirus: a repeat-aware virus integration caller
title SurVirus: a repeat-aware virus integration caller
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8034624/
https://www.ncbi.nlm.nih.gov/pubmed/33444454
http://dx.doi.org/10.1093/nar/gkaa1237