SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes

Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily...

Bibliographic Details
Main authors: Pruesse, Elmar, Peplies, Jörg, Glöckner, Frank Oliver
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3389763/
https://www.ncbi.nlm.nih.gov/pubmed/22556368
http://dx.doi.org/10.1093/bioinformatics/bts252
author Pruesse, Elmar
Peplies, Jörg
Glöckner, Frank Oliver
collection PubMed
description Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA), for which millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. Results: In this study, we present the SILVA Incremental Aligner (SINA), which is used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high-throughput performance demands. SINA was evaluated in comparison with the commonly used high-throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences, respectively. SINA achieved higher accuracy than PyNAST and mothur in all benchmarks performed. Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, a user manual and a tutorial. SINA is made available under a personal use license. Contact: epruesse@mpi-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online.
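The k-mer searching mentioned in the description can be illustrated with a minimal sketch. This is not SINA's implementation: it only shows the general idea of ranking reference sequences by the number of distinct k-mers they share with a query, so that the closest references can be passed to a later alignment step. The function names `kmers` and `rank_references`, the default `k=8` and the toy sequences are assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(query, references, k=8, top_n=3):
    """Rank reference sequences by the number of k-mers shared with query.

    references maps a sequence name to its unaligned sequence.
    Returns up to top_n (name, shared_kmer_count) pairs, best first;
    ties are broken alphabetically by name.
    """
    query_kmers = kmers(query, k)
    scores = {
        name: len(query_kmers & kmers(seq, k))
        for name, seq in references.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy data (hypothetical sequences, far shorter than real rRNA genes).
refs = {
    "a": "ACGUACGUAC",
    "b": "GGGGCCCCAA",
    "c": "ACGUAAAAAA",
}
best = rank_references("ACGUACGUAC", refs, k=4, top_n=2)
```

In a reference-based aligner, the top-ranked sequences would then serve as the template against which the query is aligned. Real tools typically use an index over the reference k-mers rather than recomputing the sets for every query.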
format Online
Article
Text
id pubmed-3389763
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3389763 2012-07-05 SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes Pruesse, Elmar Peplies, Jörg Glöckner, Frank Oliver Bioinformatics Original Papers
Oxford University Press 2012-07-15 2012-05-03 /pmc/articles/PMC3389763/ /pubmed/22556368 http://dx.doi.org/10.1093/bioinformatics/bts252 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes
topic Original Papers