Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement
BACKGROUND: After an infection, human cells may contain viral genomes in the form of episomes or integrated DNA. Comparing the genomic sequences of different strains of a virus in human cells can often provide useful insights into its behaviour, activity and pathology, and may help develop methods f...
Main authors: | Lee, Sau-Dan; Wu, Man; Lo, Kwok-Wai; Yip, Kevin Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169298/ https://www.ncbi.nlm.nih.gov/pubmed/35668367 http://dx.doi.org/10.1186/s12864-022-08649-8 |
_version_ | 1784721176575082496 |
---|---|
author | Lee, Sau-Dan Wu, Man Lo, Kwok-Wai Yip, Kevin Y. |
author_facet | Lee, Sau-Dan Wu, Man Lo, Kwok-Wai Yip, Kevin Y. |
author_sort | Lee, Sau-Dan |
collection | PubMed |
description | BACKGROUND: After an infection, human cells may contain viral genomes in the form of episomes or integrated DNA. Comparing the genomic sequences of different strains of a virus in human cells can often provide useful insights into its behaviour, activity and pathology, and may help develop methods for disease prevention and treatment. To support such comparative analyses, the viral genomes need to be accurately reconstructed from a large number of samples. Previous efforts either rely on customized experimental protocols or require high similarity between the sequenced genomes and a reference, both of which limit the general applicability of these approaches. In this study, we propose a pipeline, named ASPIRE, for reconstructing viral genomes accurately from short reads data of human samples, which are increasingly available from genome projects and personal genomics. ASPIRE contains a basic part that involves de novo assembly, tiling and gap filling, and additional components for iterative refinement, sequence corrections and wrapping. RESULTS: Evaluated by the alignment quality of sequencing reads to the reconstructed genomes, these additional components improve the assembly quality in general, and in some particular samples quite substantially, especially when the sequenced genome is significantly different from the reference. We use ASPIRE to reconstruct the genomes of Epstein Barr Virus (EBV) from the whole-genome sequencing data of 61 nasopharyngeal carcinoma (NPC) samples and provide these sequences as a resource for EBV research. CONCLUSIONS: ASPIRE improves the quality of the reconstructed EBV genomes in published studies and outperforms TRACESPipe in some samples considered. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-022-08649-8). |
format | Online Article Text |
id | pubmed-9169298 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-91692982022-06-07 Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement Lee, Sau-Dan Wu, Man Lo, Kwok-Wai Yip, Kevin Y. BMC Genomics Software BACKGROUND: After an infection, human cells may contain viral genomes in the form of episomes or integrated DNA. Comparing the genomic sequences of different strains of a virus in human cells can often provide useful insights into its behaviour, activity and pathology, and may help develop methods for disease prevention and treatment. To support such comparative analyses, the viral genomes need to be accurately reconstructed from a large number of samples. Previous efforts either rely on customized experimental protocols or require high similarity between the sequenced genomes and a reference, both of which limit the general applicability of these approaches. In this study, we propose a pipeline, named ASPIRE, for reconstructing viral genomes accurately from short reads data of human samples, which are increasingly available from genome projects and personal genomics. ASPIRE contains a basic part that involves de novo assembly, tiling and gap filling, and additional components for iterative refinement, sequence corrections and wrapping. RESULTS: Evaluated by the alignment quality of sequencing reads to the reconstructed genomes, these additional components improve the assembly quality in general, and in some particular samples quite substantially, especially when the sequenced genome is significantly different from the reference. We use ASPIRE to reconstruct the genomes of Epstein Barr Virus (EBV) from the whole-genome sequencing data of 61 nasopharyngeal carcinoma (NPC) samples and provide these sequences as a resource for EBV research. CONCLUSIONS: ASPIRE improves the quality of the reconstructed EBV genomes in published studies and outperforms TRACESPipe in some samples considered. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-022-08649-8). BioMed Central 2022-06-06 /pmc/articles/PMC9169298/ /pubmed/35668367 http://dx.doi.org/10.1186/s12864-022-08649-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Lee, Sau-Dan Wu, Man Lo, Kwok-Wai Yip, Kevin Y. Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement |
title | Accurate reconstruction of viral genomes in human cells from short reads using iterative refinement |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169298/ https://www.ncbi.nlm.nih.gov/pubmed/35668367 http://dx.doi.org/10.1186/s12864-022-08649-8 |