Evaluation of Alignment Algorithms for Discovery and Identification of Pathogens Using RNA-Seq

Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-...

Bibliographic Details
Main authors: Borozan, Ivan; Watt, Stuart N.; Ferretti, Vincent
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813700/
https://www.ncbi.nlm.nih.gov/pubmed/24204709
http://dx.doi.org/10.1371/journal.pone.0076935
_version_ 1782289144357060608
author Borozan, Ivan
Watt, Stuart N.
Ferretti, Vincent
collection PubMed
description Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-known alignment algorithms (BLAST, BLAT, BWA, BWA-SW, BWA-MEM, BFAST, Bowtie2, Novoalign, GSNAP, SHRiMP2 and STAR) can be used for characterizing mutated and non-mutated viral sequences, including those that exhibit RNA splicing, in transcriptome samples. To evaluate aligners objectively, we developed a realistic RNA-Seq simulation and evaluation framework (RiSER) and propose a new combined score to rank aligners for viral characterization in terms of their precision, sensitivity and alignment accuracy. We used RiSER to simulate both human and viral read sequences and suggest the best set of aligners for viral sequence characterization in human transcriptome samples. Our results show that significant and substantial differences exist between aligners and that a digital-subtraction-based viral identification framework can and should use different aligners for different parts of the process. We determine the extent to which mutated viral sequences can be effectively characterized and show that more sensitive aligners such as BLAST, BFAST, SHRiMP2, BWA-SW and GSNAP can accurately characterize substantially divergent viral sequences with up to 15% overall sequence mutation rate. We believe that the results presented here will be useful to researchers choosing aligners for viral sequence characterization using next-generation sequencing data.
format Online
Article
Text
id pubmed-3813700
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3813700 2013-11-07 Evaluation of Alignment Algorithms for Discovery and Identification of Pathogens Using RNA-Seq Borozan, Ivan; Watt, Stuart N.; Ferretti, Vincent PLoS One Research Article
Public Library of Science 2013-10-30 /pmc/articles/PMC3813700/ /pubmed/24204709 http://dx.doi.org/10.1371/journal.pone.0076935 Text en © 2013 Borozan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Evaluation of Alignment Algorithms for Discovery and Identification of Pathogens Using RNA-Seq
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813700/
https://www.ncbi.nlm.nih.gov/pubmed/24204709
http://dx.doi.org/10.1371/journal.pone.0076935