
VGEA: an RNA viral assembly toolkit

Next-generation sequencing (NGS)-based studies have vastly increased our understanding of viral diversity. Viral sequence data obtained from NGS experiments are a rich source of information: they can be used to study viral epidemiology, evolution, and transmission patterns, and can also inform dru...

Bibliographic Details
Main Authors: Oluniyi, Paul E., Ajogbasile, Fehintola, Oguzie, Judith, Uwanibe, Jessica, Kayode, Adeyemi, Happi, Anise, Ugwu, Alphonsus, Olumade, Testimony, Ogunsanya, Olusola, Eromon, Philomena Ehiaghe, Folarin, Onikepe, Frost, Simon D.W., Heeney, Jonathan, Happi, Christian T.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8428259/
https://www.ncbi.nlm.nih.gov/pubmed/34567846
http://dx.doi.org/10.7717/peerj.12129
author Oluniyi, Paul E.
Ajogbasile, Fehintola
Oguzie, Judith
Uwanibe, Jessica
Kayode, Adeyemi
Happi, Anise
Ugwu, Alphonsus
Olumade, Testimony
Ogunsanya, Olusola
Eromon, Philomena Ehiaghe
Folarin, Onikepe
Frost, Simon D.W.
Heeney, Jonathan
Happi, Christian T.
collection PubMed
description Next-generation sequencing (NGS)-based studies have vastly increased our understanding of viral diversity. Viral sequence data obtained from NGS experiments are a rich source of information: they can be used to study viral epidemiology, evolution, and transmission patterns, and can also inform drug and vaccine design. Viral genomes, however, pose a great challenge to bioinformatics because of their high mutation rates and the quasispecies they form within a single infected host. This creates a need for advanced bioinformatics tools that assemble consensus genomes representative of the viral population circulating in individual patients. Many tools have been developed to preprocess sequencing reads, carry out de novo or reference-assisted assembly of viral genomes, and assess the quality of the genomes obtained. Most of these tools, however, exist as standalone workflows and usually require huge computational resources. Here we present VGEA (Viral Genomes Easily Analyzed), a Snakemake workflow for analyzing RNA viral genomes. VGEA enables users to map sequencing reads to the human genome to remove human contaminants, split BAM files into forward and reverse reads, carry out de novo assembly of the forward and reverse reads to generate contigs, pre-process reads for quality and contamination, map reads to a reference tailored to the sample using corrected contigs supplemented by the user's choice of reference sequences, and evaluate and compare genome assemblies. We designed the project with the aim of creating a flexible, easy-to-use, all-in-one pipeline from existing stand-alone bioinformatics tools for viral genome analysis that can be deployed on a personal computer.
VGEA was built on the Snakemake workflow management system and uses an existing tool for each step: fastp (Chen et al., 2018) for read trimming and read-level quality control; BWA (Li & Durbin, 2009) for mapping sequencing reads to the human reference genome; SAMtools (Li et al., 2009) for extracting unmapped reads and for splitting BAM files into FASTQ files; IVA (Hunt et al., 2015) for de novo assembly to generate contigs; shiver (Wymant et al., 2018) to pre-process reads for quality and contamination and then map them to a reference tailored to the sample, using corrected contigs supplemented with the user's choice of existing reference sequences; SeqKit (Shen et al., 2016) for cleaning the shiver assembly before it is passed to QUAST; QUAST (Gurevich et al., 2013) to assess the quality of genome assemblies; and MultiQC (Ewels et al., 2016) to aggregate the results from fastp, BWA and QUAST. Our pipeline was successfully tested and validated with SARS-CoV-2 (n = 20), HIV-1 (n = 20) and Lassa virus (n = 20) datasets, all of which have been made publicly available. VGEA is freely available on GitHub at https://github.com/pauloluniyi/VGEA under the GNU General Public License.
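The tool chain named in the description can be read as a dependency graph, which is the kind of structure a workflow manager such as Snakemake resolves before running anything. The following is a minimal Python sketch of that idea, not the VGEA Snakefile: the stage names and the dependency edges are assumptions inferred from the order in which the abstract lists the tools, and the real pipeline may wire its rules differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage names; each maps to the set of stages it depends on.
# The edges are assumed from the abstract's description, not taken from
# the actual VGEA workflow definition.
STAGES = {
    "fastp_trim": set(),                                  # read trimming and QC
    "bwa_map_human": {"fastp_trim"},                      # map reads to the human genome
    "samtools_extract_unmapped": {"bwa_map_human"},       # keep non-human reads
    "samtools_split_fastq": {"samtools_extract_unmapped"},  # forward/reverse FASTQ
    "iva_assemble": {"samtools_split_fastq"},             # de novo contigs
    "shiver_map": {"iva_assemble", "samtools_split_fastq"},  # tailored-reference mapping
    "seqkit_clean": {"shiver_map"},                       # tidy the assembly for QUAST
    "quast_evaluate": {"seqkit_clean"},                   # assembly quality metrics
    "multiqc_report": {"fastp_trim", "bwa_map_human", "quast_evaluate"},
}


def execution_order(stages):
    """Return one valid run order in which every stage follows its dependencies."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
for step, name in enumerate(order, start=1):
    print(f"{step}. {name}")
```

Declaring only what each stage needs, rather than hard-coding a run order, is what lets a workflow manager skip stages whose outputs already exist and run independent stages in parallel.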
format Online
Article
Text
id pubmed-8428259
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8428259 2021-09-24 PeerJ Bioinformatics PeerJ Inc. 2021-09-06 /pmc/articles/PMC8428259/ /pubmed/34567846 http://dx.doi.org/10.7717/peerj.12129 Text en ©2021 Oluniyi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title VGEA: an RNA viral assembly toolkit
topic Bioinformatics