A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis

Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to imp...

Bibliographic Details
Main Authors: Dinu, Liviu P., Ionescu, Radu Tudor, Tomescu, Alexandru I.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4136772/
https://www.ncbi.nlm.nih.gov/pubmed/25133391
http://dx.doi.org/10.1371/journal.pone.0104006
author Dinu, Liviu P.
Ionescu, Radu Tudor
Tomescu, Alexandru I.
collection PubMed
description Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. A set of experiments is conducted to determine the precision and the recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, and is supported on UNIX and MS Windows.
format Online
Article
Text
id pubmed-4136772
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4136772 2014-08-20 A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis Dinu, Liviu P. Ionescu, Radu Tudor Tomescu, Alexandru I. PLoS One Research Article
Public Library of Science 2014-08-18 /pmc/articles/PMC4136772/ /pubmed/25133391 http://dx.doi.org/10.1371/journal.pone.0104006 Text en © 2014 Dinu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis
topic Research Article