Viral quasispecies reconstruction via tensor factorization with successive read removal
Main authors: | Ahn, Soyeon; Ke, Ziqi; Vikalo, Haris |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022648/ https://www.ncbi.nlm.nih.gov/pubmed/29949976 http://dx.doi.org/10.1093/bioinformatics/bty291 |
_version_ | 1783335723128061952 |
---|---|
author | Ahn, Soyeon; Ke, Ziqi; Vikalo, Haris |
collection | PubMed |
description | MOTIVATION: As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains: a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths render the problem of reconstructing the strains and estimating their spectrum challenging. Inference of viral quasispecies is difficult due to generally non-uniform frequencies of the strains, and is further exacerbated when the genetic distances between the strains are small. RESULTS: This paper presents TenSQR, an algorithm that utilizes a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of its components. Fundamentally, TenSQR performs clustering with successive data removal to infer strains in a quasispecies in order from the most to the least abundant one; every time a strain is inferred, sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enables discovery of rare strains in a population and facilitates detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities of 1–10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains. AVAILABILITY AND IMPLEMENTATION: TenSQR is available at https://github.com/SoYeonA/TenSQR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6022648 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6022648 2018-07-10 Viral quasispecies reconstruction via tensor factorization with successive read removal Ahn, Soyeon Ke, Ziqi Vikalo, Haris Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022648/ /pubmed/29949976 http://dx.doi.org/10.1093/bioinformatics/bty291 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Viral quasispecies reconstruction via tensor factorization with successive read removal |
topic | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022648/ https://www.ncbi.nlm.nih.gov/pubmed/29949976 http://dx.doi.org/10.1093/bioinformatics/bty291 |
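The abstract describes TenSQR's core loop: infer the most abundant strain, remove the reads it explains, and repeat until the rarer strains emerge. The Python sketch below illustrates only that successive read-removal idea, under loudly stated assumptions. Reads are taken to be pre-aligned strings over a fixed set of variant sites, with "." marking an uncovered site. TenSQR's tensor-factorization clustering step is replaced here by a plain per-site majority-vote consensus. The names `successive_reconstruction`, `max_mismatch` and `min_reads` are hypothetical and are not part of the TenSQR code at https://github.com/SoYeonA/TenSQR.

```python
from collections import Counter

# Hypothetical encoding (not TenSQR's): each read is a string over aligned
# variant sites, with "." marking a site the read does not cover.
UNCOVERED = "."


def consensus(reads, length):
    """Per-site majority vote; a simple stand-in for TenSQR's tensor factorization."""
    sites = []
    for j in range(length):
        counts = Counter(r[j] for r in reads if r[j] != UNCOVERED)
        sites.append(counts.most_common(1)[0][0] if counts else UNCOVERED)
    return "".join(sites)


def mismatches(read, haplotype):
    """Count disagreements at sites covered by both the read and the haplotype."""
    return sum(
        1
        for a, b in zip(read, haplotype)
        if a != UNCOVERED and b != UNCOVERED and a != b
    )


def successive_reconstruction(reads, max_mismatch=0, min_reads=2):
    """Infer strains from most to least abundant, removing each strain's reads.

    Returns (haplotype, frequency) pairs, where frequency is the fraction of
    all input reads explained by that haplotype.
    """
    if not reads:
        return []
    total = len(reads)
    length = len(reads[0])
    remaining = list(reads)
    strains = []
    while len(remaining) >= min_reads:
        haplotype = consensus(remaining, length)
        explained = [r for r in remaining if mismatches(r, haplotype) <= max_mismatch]
        if len(explained) < min_reads:
            break  # too little support left to call another strain
        strains.append((haplotype, len(explained) / total))
        # Successive read removal: drop the reads explained by this strain
        # so the next consensus is dominated by the next most abundant one.
        remaining = [r for r in remaining if mismatches(r, haplotype) > max_mismatch]
    return strains


# Toy data: a major strain ACGTAC, a minor strain ACTTGC, and one read
# carrying a simulated sequencing error at site 4.
reads = [
    "ACGT..", "..GTAC", "ACGTAC", ".CGTA.", "ACGTAC", "...TAC",  # major strain
    "ACTT..", "..TTGC", "ACTTGC",                                # minor strain
    "ACGTCC",                                                    # erroneous read
]
strains = successive_reconstruction(reads)
for haplotype, freq in strains:
    print(haplotype, round(freq, 2))
```

In the first round the minor strain is outvoted at the two sites where it differs, so only the major strain is called. Once the major strain's reads are removed, the minor strain dominates the second consensus. The erroneous read matches neither strain and is left over when fewer than `min_reads` reads remain, which is why the estimated frequencies in this toy run sum to less than 1.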