
A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies

As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the ability to generate a biologically meaningful multiple sequence alignment. The most commonly used aligners have varying alignment quality and speed, tend to dep...


Bibliographic Details
Main author: Schloss, Patrick D.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788221/
https://www.ncbi.nlm.nih.gov/pubmed/20011594
http://dx.doi.org/10.1371/journal.pone.0008230
author Schloss, Patrick D.
collection PubMed
description As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the ability to generate a biologically meaningful multiple sequence alignment. The most commonly used aligners have varying alignment quality and speed, tend to depend on a specific reference alignment, or lack a complete description of the underlying algorithm. The purpose of this study was to create and validate an aligner with the goal of quickly generating a high-quality alignment and having the flexibility to use any reference alignment. Using the simple nearest alignment space termination algorithm, the resulting aligner operates in linear time, requires a small memory footprint, and generates a high-quality alignment. In addition, the alignments generated for variable regions were of as high a quality as the alignment of full-length sequences. As implemented, the method was able to align 18 full-length 16S rRNA gene sequences and 58 V2 region sequences per second to the 50,000-column SILVA reference alignment. Most importantly, the resulting alignments were of a quality equal to SILVA-generated alignments. The aligner described in this study will enable scientists to rapidly generate robust multiple sequence alignments that are implicitly based upon the predicted secondary structure of the 16S rRNA molecule. Furthermore, because the implementation is not connected to a specific database, it is easy to generalize the method to reference alignments for any DNA sequence.
format Text
id pubmed-2788221
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2788221 2009-12-14 A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies Schloss, Patrick D. PLoS One Research Article Public Library of Science 2009-12-14 /pmc/articles/PMC2788221/ /pubmed/20011594 http://dx.doi.org/10.1371/journal.pone.0008230 Text en Schloss.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies
topic Research Article