
HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization

MOTIVATION: Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses’ evolution and their interactions with the host/other microbes....


Bibliographic Details
Main authors: Cai, Dehan; Shang, Jiayu; Sun, Yanni
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9750122/
https://www.ncbi.nlm.nih.gov/pubmed/36308467
http://dx.doi.org/10.1093/bioinformatics/btac708
author Cai, Dehan
Shang, Jiayu
Sun, Yanni
author_facet Cai, Dehan
Shang, Jiayu
Sun, Yanni
author_sort Cai, Dehan
collection PubMed
description MOTIVATION: Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses’ evolution and their interactions with the host/other microbes. High-throughput sequencing data has become the major source for characterizing viral populations. However, the inherent limitation on read length by next-generation sequencing makes complete haplotype reconstruction difficult. RESULTS: In this work, we present a new tool named HaploDMF that can construct complete haplotypes using third-generation sequencing (TGS) data. HaploDMF utilizes a deep matrix factorization model with an adapted loss function to learn latent features from aligned reads automatically. The latent features are then used to cluster reads of the same haplotype. Unlike existing tools whose performance can be affected by the overlap size between reads, HaploDMF is able to achieve highly robust performance on data with different coverage, haplotype number and error rates. In particular, it can generate more complete haplotypes even when the sequencing coverage drops in the middle. We benchmark HaploDMF against the state-of-the-art tools on simulated and real sequencing TGS data on different viruses. The results show that HaploDMF competes favorably against all others. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of HaploDMF are available at https://github.com/dhcai21/HaploDMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9750122
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9750122 2022-12-15 HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization Cai, Dehan Shang, Jiayu Sun, Yanni Bioinformatics Original Paper MOTIVATION: Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses’ evolution and their interactions with the host/other microbes. High-throughput sequencing data has become the major source for characterizing viral populations. However, the inherent limitation on read length by next-generation sequencing makes complete haplotype reconstruction difficult. RESULTS: In this work, we present a new tool named HaploDMF that can construct complete haplotypes using third-generation sequencing (TGS) data. HaploDMF utilizes a deep matrix factorization model with an adapted loss function to learn latent features from aligned reads automatically. The latent features are then used to cluster reads of the same haplotype. Unlike existing tools whose performance can be affected by the overlap size between reads, HaploDMF is able to achieve highly robust performance on data with different coverage, haplotype number and error rates. In particular, it can generate more complete haplotypes even when the sequencing coverage drops in the middle. We benchmark HaploDMF against the state-of-the-art tools on simulated and real sequencing TGS data on different viruses. The results show that HaploDMF competes favorably against all others. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of HaploDMF are available at https://github.com/dhcai21/HaploDMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-10-29 /pmc/articles/PMC9750122/ /pubmed/36308467 http://dx.doi.org/10.1093/bioinformatics/btac708 Text en © The Author(s) 2022.
Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Cai, Dehan
Shang, Jiayu
Sun, Yanni
HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization
title HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization
title_full HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization
title_fullStr HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization
title_full_unstemmed HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization
title_short HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization
title_sort haplodmf: viral haplotype reconstruction from long reads via deep matrix factorization
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9750122/
https://www.ncbi.nlm.nih.gov/pubmed/36308467
http://dx.doi.org/10.1093/bioinformatics/btac708