
Choice of assembly software has a critical impact on virome characterisation

BACKGROUND: The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology to referen...


Bibliographic Details
Main authors: Sutton, Thomas D. S., Clooney, Adam G., Ryan, Feargal J., Ross, R. Paul, Hill, Colin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350398/
https://www.ncbi.nlm.nih.gov/pubmed/30691529
http://dx.doi.org/10.1186/s40168-019-0626-5
_version_ 1783390447506292736
author Sutton, Thomas D. S.
Clooney, Adam G.
Ryan, Feargal J.
Ross, R. Paul
Hill, Colin
author_facet Sutton, Thomas D. S.
Clooney, Adam G.
Ryan, Feargal J.
Ross, R. Paul
Hill, Colin
author_sort Sutton, Thomas D. S.
collection PubMed
description BACKGROUND: The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology to reference databases. As a result, investigation of the viral metagenome (virome) relies heavily on de novo assembly of short sequencing reads to recover compositional and functional information. Metagenomic assembly is particularly challenging for virome data, often resulting in fragmented assemblies and poor recovery of viral community members. Despite the essential role of assembly in virome analysis and difficulties posed by these data, current assembly comparisons have been limited to subsections of virome studies or bacterial datasets. DESIGN: This study presents the most comprehensive virome assembly comparison to date, featuring 16 metagenomic assembly approaches which have featured in human virome studies. Assemblers were assessed using four independent virome datasets, namely, simulated reads, two mock communities, viromes spiked with a known phage and human gut viromes. RESULTS: Assembly performance varied significantly across all test datasets, with SPAdes (meta) performing consistently well. Performance of MIRA and VICUNA varied, highlighting the importance of using a range of datasets when comparing assembly programs. It was also found that while some assemblers addressed the challenges of virome data better than others, all assemblers had limitations. Low read coverage and genomic repeats resulted in assemblies with poor genome recovery, high degrees of fragmentation and low-accuracy contigs across all assemblers. These limitations must be considered when setting thresholds for downstream analysis and when drawing conclusions from virome data. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-019-0626-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6350398
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6350398 2019-02-04 Choice of assembly software has a critical impact on virome characterisation Sutton, Thomas D. S. Clooney, Adam G. Ryan, Feargal J. Ross, R. Paul Hill, Colin Microbiome Research BioMed Central 2019-01-28 /pmc/articles/PMC6350398/ /pubmed/30691529 http://dx.doi.org/10.1186/s40168-019-0626-5 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Sutton, Thomas D. S.
Clooney, Adam G.
Ryan, Feargal J.
Ross, R. Paul
Hill, Colin
Choice of assembly software has a critical impact on virome characterisation
title Choice of assembly software has a critical impact on virome characterisation
title_full Choice of assembly software has a critical impact on virome characterisation
title_fullStr Choice of assembly software has a critical impact on virome characterisation
title_full_unstemmed Choice of assembly software has a critical impact on virome characterisation
title_short Choice of assembly software has a critical impact on virome characterisation
title_sort choice of assembly software has a critical impact on virome characterisation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350398/
https://www.ncbi.nlm.nih.gov/pubmed/30691529
http://dx.doi.org/10.1186/s40168-019-0626-5