
Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data

The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of...


Bibliographic Details
Main Authors: Daly, Gordon M., Leggett, Richard M., Rowe, William, Stubbs, Samuel, Wilkinson, Maxim, Ramirez-Gonzalez, Ricardo H., Caccamo, Mario, Bernal, William, Heeney, Jonathan L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4476701/
https://www.ncbi.nlm.nih.gov/pubmed/26098299
http://dx.doi.org/10.1371/journal.pone.0129059
author Daly, Gordon M.
Leggett, Richard M.
Rowe, William
Stubbs, Samuel
Wilkinson, Maxim
Ramirez-Gonzalez, Ricardo H.
Caccamo, Mario
Bernal, William
Heeney, Jonathan L.
collection PubMed
description The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency-based filter and a low-complexity filter has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and with simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require downstream analysis as putative viral nucleic acids.
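The description names three pre-assembly steps (host subtraction, a k-mer frequency based filter and a low complexity filter) without saying how they are implemented. The sketch below is only an illustration of the general idea, not the authors' pipeline: it discards reads whose k-mers are mostly found in a host sequence and reads with low Shannon entropy. All function names, the toy sequences, the value of k and both thresholds are assumptions made for this example; a real workflow would rely on a short-read mapper and dedicated filtering tools.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_host_index(host_seqs, k):
    """Collect the set of all k-mers seen in the host sequences."""
    index = set()
    for s in host_seqs:
        index.update(kmers(s, k))
    return index


def host_kmer_fraction(read, host_index, k):
    """Fraction of the read's k-mers that also occur in the host index."""
    total = len(read) - k + 1
    if total <= 0:
        return 0.0
    hits = sum(1 for km in kmers(read, k) if km in host_index)
    return hits / total


def shannon_entropy(seq):
    """Per-base Shannon entropy in bits; 0 for a homopolymer, 2 at most for DNA."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def filter_reads(reads, host_index, k=5, max_host_fraction=0.5, min_entropy=1.5):
    """Keep reads that are neither low-complexity nor host-like.

    The thresholds are arbitrary illustration values, not published settings.
    """
    kept = []
    for read in reads:
        if shannon_entropy(read) < min_entropy:
            continue  # low-complexity read
        if host_kmer_fraction(read, host_index, k) > max_host_fraction:
            continue  # read looks like host sequence
        kept.append(read)
    return kept


# Toy data (hypothetical sequences, chosen only to exercise each filter).
HOST = "ACGTACGTTAGCTAGGCTAACGATCGATCGGATCCTAGG"
HOST_READ = "TAGCTAGGCTAACGATCG"      # substring of HOST
LOW_COMPLEXITY_READ = "AAAAAAAAAAAAAAAAAA"
NOVEL_READ = "TTGACCATGGTCAAGTTCCA"    # shares no 5-mer with HOST

host_index = build_host_index([HOST], k=5)
survivors = filter_reads([HOST_READ, LOW_COMPLEXITY_READ, NOVEL_READ], host_index, k=5)
print(survivors)
```

Only the reads that survive both filters would be passed on to assembly, which is how such filtering can reduce the number of contigs that need downstream inspection.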
format Online
Article
Text
id pubmed-4476701
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4476701
2015-06-25
PLoS One
Research Article
Public Library of Science
2015-06-22
/pmc/articles/PMC4476701/
/pubmed/26098299
http://dx.doi.org/10.1371/journal.pone.0129059
Text
en
© 2015 Daly et al
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data
topic Research Article