Magic-BLAST, an accurate RNA-seq aligner for long and short reads

BACKGROUND: Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas f...

Bibliographic Details
Main authors: Boratyn, Grzegorz M.; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Busby, Ben; Madden, Thomas L.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6659269/
https://www.ncbi.nlm.nih.gov/pubmed/31345161
http://dx.doi.org/10.1186/s12859-019-2996-x
author Boratyn, Grzegorz M.
Thierry-Mieg, Jean
Thierry-Mieg, Danielle
Busby, Ben
Madden, Thomas L.
collection PubMed
description BACKGROUND: Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. RESULTS: Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the performance of Magic-BLAST to accurately map short or long sequences and its ability to discover introns on real RNA-seq data sets from PacBio, Roche and Illumina runs, and on six benchmarks, and compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome. CONCLUSIONS: We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2996-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6659269
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6659269 2019-08-01 Magic-BLAST, an accurate RNA-seq aligner for long and short reads Boratyn, Grzegorz M. Thierry-Mieg, Jean Thierry-Mieg, Danielle Busby, Ben Madden, Thomas L. BMC Bioinformatics Software BioMed Central 2019-07-25 /pmc/articles/PMC6659269/ /pubmed/31345161 http://dx.doi.org/10.1186/s12859-019-2996-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Magic-BLAST, an accurate RNA-seq aligner for long and short reads
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6659269/
https://www.ncbi.nlm.nih.gov/pubmed/31345161
http://dx.doi.org/10.1186/s12859-019-2996-x