
Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement

BACKGROUND: After an infection, human cells may contain viral genomes in the form of episomes or integrated DNA. Comparing the genomic sequences of different strains of a virus in human cells can often provide useful insights into its behaviour, activity and pathology, and may help develop methods f...


Bibliographic Details
Main authors: Lee, Sau-Dan; Wu, Man; Lo, Kwok-Wai; Yip, Kevin Y.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169298/
https://www.ncbi.nlm.nih.gov/pubmed/35668367
http://dx.doi.org/10.1186/s12864-022-08649-8
author Lee, Sau-Dan
Wu, Man
Lo, Kwok-Wai
Yip, Kevin Y.
collection PubMed
description BACKGROUND: After an infection, human cells may contain viral genomes in the form of episomes or integrated DNA. Comparing the genomic sequences of different strains of a virus in human cells can often provide useful insights into its behaviour, activity and pathology, and may help develop methods for disease prevention and treatment. To support such comparative analyses, the viral genomes need to be accurately reconstructed from a large number of samples. Previous efforts either rely on customized experimental protocols or require high similarity between the sequenced genomes and a reference, both of which limit the general applicability of these approaches. In this study, we propose a pipeline, named ASPIRE, for accurately reconstructing viral genomes from short-read data of human samples, which are increasingly available from genome projects and personal genomics. ASPIRE contains a basic part that involves de novo assembly, tiling and gap filling, and additional components for iterative refinement, sequence corrections and wrapping.
RESULTS: Evaluated by the alignment quality of sequencing reads to the reconstructed genomes, these additional components improve the assembly quality in general, and in some particular samples quite substantially, especially when the sequenced genome is significantly different from the reference. We use ASPIRE to reconstruct the genomes of Epstein-Barr virus (EBV) from the whole-genome sequencing data of 61 nasopharyngeal carcinoma (NPC) samples and provide these sequences as a resource for EBV research.
CONCLUSIONS: ASPIRE improves the quality of the reconstructed EBV genomes in published studies and outperforms TRACESPipe in some samples considered.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08649-8.
format Online
Article
Text
id pubmed-9169298
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9169298 2022-06-07
Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement
Lee, Sau-Dan; Wu, Man; Lo, Kwok-Wai; Yip, Kevin Y.
BMC Genomics
Software
BioMed Central 2022-06-06
/pmc/articles/PMC9169298/ /pubmed/35668367 http://dx.doi.org/10.1186/s12864-022-08649-8
Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement
topic Software