A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis
Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to imp...
Main authors: | Dinu, Liviu P.; Ionescu, Radu Tudor; Tomescu, Alexandru I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4136772/ https://www.ncbi.nlm.nih.gov/pubmed/25133391 http://dx.doi.org/10.1371/journal.pone.0104006 |
_version_ | 1782331021193117696 |
---|---|
author | Dinu, Liviu P. Ionescu, Radu Tudor Tomescu, Alexandru I. |
author_facet | Dinu, Liviu P. Ionescu, Radu Tudor Tomescu, Alexandru I. |
author_sort | Dinu, Liviu P. |
collection | PubMed |
description | Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome under Local Rank Distance (LRD). The rank-based aligner proposed in this work favors correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is made faster by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. One set of experiments is conducted to determine the precision and the recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java and is supported on UNIX and MS Windows. |
format | Online Article Text |
id | pubmed-4136772 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4136772 2014-08-20 A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis Dinu, Liviu P. Ionescu, Radu Tudor Tomescu, Alexandru I. PLoS One Research Article Public Library of Science 2014-08-18 /pmc/articles/PMC4136772/ /pubmed/25133391 http://dx.doi.org/10.1371/journal.pone.0104006 Text en © 2014 Dinu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Dinu, Liviu P. Ionescu, Radu Tudor Tomescu, Alexandru I. A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis |
title | A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis |
title_full | A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis |
title_fullStr | A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis |
title_full_unstemmed | A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis |
title_short | A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis |
title_sort | rank-based sequence aligner with applications in phylogenetic analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4136772/ https://www.ncbi.nlm.nih.gov/pubmed/25133391 http://dx.doi.org/10.1371/journal.pone.0104006 |
work_keys_str_mv | AT dinuliviup arankbasedsequencealignerwithapplicationsinphylogeneticanalysis AT ionescuradutudor arankbasedsequencealignerwithapplicationsinphylogeneticanalysis AT tomescualexandrui arankbasedsequencealignerwithapplicationsinphylogeneticanalysis AT dinuliviup rankbasedsequencealignerwithapplicationsinphylogeneticanalysis AT ionescuradutudor rankbasedsequencealignerwithapplicationsinphylogeneticanalysis AT tomescualexandrui rankbasedsequencealignerwithapplicationsinphylogeneticanalysis |
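
The description in this record mentions two speed-ups: storing k-mer positions in a hash table for each read, and restricting the search to reference positions that are likely to be a good positional match. The Python sketch below illustrates that general idea only. It is not the authors' C++/Java implementation and does not compute Local Rank Distance. The function names `kmer_positions` and `candidate_offsets`, the exact-match voting scheme, and the example value of `k` are all assumptions made for illustration.

```python
from collections import defaultdict


def kmer_positions(read, k):
    """Map each k-mer of `read` to the list of 0-based offsets where it starts.

    Hypothetical helper: the record only says that k-mer positions are kept
    in a hash table per read; the exact layout is an assumption.
    """
    index = defaultdict(list)
    for i in range(len(read) - k + 1):
        index[read[i:i + k]].append(i)
    return dict(index)


def candidate_offsets(read, reference, k):
    """Rank reference start positions by the number of exactly shared k-mers.

    Each k-mer found at reference position j and read offset i votes for the
    read starting at reference position j - i. This voting rule is an
    assumption standing in for "positions likely to be a good match".
    """
    index = kmer_positions(read, k)
    votes = defaultdict(int)
    for j in range(len(reference) - k + 1):
        for i in index.get(reference[j:j + k], ()):
            votes[j - i] += 1
    # Most votes first; ties broken by smaller start position.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(kmer_positions("ACAC", 2))
    print(candidate_offsets("ACGT", "TTACGTTT", 2))
```

In a pipeline of this shape, a more expensive distance such as LRD would then be evaluated only at the top-ranked start positions instead of at every position of the reference.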