Viral quasispecies reconstruction via tensor factorization with successive read removal

Bibliographic Details
Main authors: Ahn, Soyeon, Ke, Ziqi, Vikalo, Haris
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022648/
https://www.ncbi.nlm.nih.gov/pubmed/29949976
http://dx.doi.org/10.1093/bioinformatics/bty291
_version_ 1783335723128061952
author Ahn, Soyeon
Ke, Ziqi
Vikalo, Haris
author_sort Ahn, Soyeon
collection PubMed
description MOTIVATION: As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains, a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths make it challenging to reconstruct the strains and estimate their spectrum. Inference of viral quasispecies is difficult because strain frequencies are generally non-uniform, and it becomes harder still when the genetic distances between the strains are small. RESULTS: This paper presents TenSQR, an algorithm that uses a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of their components. Fundamentally, TenSQR performs clustering with successive data removal to infer the strains in a quasispecies in order from the most to the least abundant; every time a strain is inferred, the sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enable discovery of rare strains in a population and facilitate detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities of 1–10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains. AVAILABILITY AND IMPLEMENTATION: TenSQR is available at https://github.com/SoYeonA/TenSQR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
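The successive reconstruction and read removal loop described in the abstract can be illustrated with a minimal sketch. This is not the TenSQR implementation: the tensor factorization clustering step is replaced here by a plain per-position plurality vote, and the function names, the (start, sequence) read representation, and the max_mismatch parameter are all hypothetical choices made for illustration. It assumes reads are already aligned to a common coordinate system and fit within the genome length.

```python
from collections import Counter


def plurality_consensus(reads, length):
    """Call each position by plurality vote over the reads that cover it."""
    consensus = []
    for pos in range(length):
        counts = Counter(
            seq[pos - start]
            for start, seq in reads
            if start <= pos < start + len(seq)
        )
        # "N" marks a position that no remaining read covers.
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


def mismatches(read, strain):
    """Count bases of a (start, sequence) read that disagree with the strain."""
    start, seq = read
    return sum(
        1 for offset, base in enumerate(seq)
        if strain[start + offset] not in (base, "N")
    )


def successive_reconstruction(reads, length, max_mismatch=0):
    """Infer strains from most to least abundant, removing reads each round.

    Stand-in for the clustering step: the dominant strain among the
    remaining reads is approximated by a plurality-vote consensus.
    """
    remaining = list(reads)
    strains = []
    while remaining:
        strain = plurality_consensus(remaining, length)
        assigned = [r for r in remaining if mismatches(r, strain) <= max_mismatch]
        if not assigned:
            break  # no read matches the consensus; stop instead of looping forever
        strains.append((strain, len(assigned)))
        # Remove the reads attributed to this strain before the next round.
        remaining = [r for r in remaining if mismatches(r, strain) > max_mismatch]
    total = sum(count for _, count in strains)
    return [(strain, count / total) for strain, count in strains]


# Toy dataset (made up for illustration): three strains at 60%, 30% and 10%.
reads = (
    [(0, "ACGTACGT")] * 6
    + [(0, "ACGTTCGA")] * 3
    + [(0, "AGGTACGA")] * 1
)
for strain, frequency in successive_reconstruction(reads, length=8):
    print(strain, frequency)
```

On this toy input the rare third strain only becomes visible after the reads of the two more abundant strains have been removed, which is the intuition behind the successive removal. A plurality vote is far weaker than the clustering the paper describes: it needs the dominant strain to win at every position, and reads that cover no distinguishing position are attributed to whichever strain is inferred first.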
format Online
Article
Text
id pubmed-6022648
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022648 2018-07-10 Viral quasispecies reconstruction via tensor factorization with successive read removal Ahn, Soyeon Ke, Ziqi Vikalo, Haris Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022648/ /pubmed/29949976 http://dx.doi.org/10.1093/bioinformatics/bty291 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Viral quasispecies reconstruction via tensor factorization with successive read removal
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings