SurVirus: a repeat-aware virus integration caller
A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequen...
Main authors: | Rajaby, Ramesh; Zhou, Yi; Meng, Yifan; Zeng, Xi; Li, Guoliang; Wu, Peng; Sung, Wing-Kin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8034624/ https://www.ncbi.nlm.nih.gov/pubmed/33444454 http://dx.doi.org/10.1093/nar/gkaa1237 |
_version_ | 1783676569270616064 |
---|---|
author | Rajaby, Ramesh Zhou, Yi Meng, Yifan Zeng, Xi Li, Guoliang Wu, Peng Sung, Wing-Kin |
author_sort | Rajaby, Ramesh |
collection | PubMed |
description | A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts. |
format | Online Article Text |
id | pubmed-8034624 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8034624 2021-04-14 SurVirus: a repeat-aware virus integration caller Rajaby, Ramesh Zhou, Yi Meng, Yifan Zeng, Xi Li, Guoliang Wu, Peng Sung, Wing-Kin Nucleic Acids Res Methods Online A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts. Oxford University Press 2021-01-14 /pmc/articles/PMC8034624/ /pubmed/33444454 http://dx.doi.org/10.1093/nar/gkaa1237 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | SurVirus: a repeat-aware virus integration caller |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8034624/ https://www.ncbi.nlm.nih.gov/pubmed/33444454 http://dx.doi.org/10.1093/nar/gkaa1237 |