
uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts

BACKGROUND: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such...


Bibliographic Details
Main authors: Jiang, Minghui; Anderson, James; Gillespie, Joel; Mayne, Martin
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375906/
https://www.ncbi.nlm.nih.gov/pubmed/18405375
http://dx.doi.org/10.1186/1471-2105-9-192
author Jiang, Minghui
Anderson, James
Gillespie, Joel
Mayne, Martin
collection PubMed
description BACKGROUND: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts. RESULTS: We present a sequence analysis tool (named uShuffle) for generating uniform random permutations of biological sequences (such as DNA, RNA, and protein sequences) that preserve the exact k-let counts. The uShuffle tool implements the latest variant of the Euler algorithm and uses Wilson's algorithm in the crucial step of arborescence generation. It is carefully engineered and extremely efficient. The uShuffle tool achieves maximum flexibility by allowing arbitrary alphabet size and let size. It can be used as a command-line program, a web application, or a utility library. Source code in C, Java, and C#, and integration instructions for Perl and Python, are provided. CONCLUSION: The uShuffle tool surpasses existing implementations of the Euler algorithm in both performance and flexibility. It is a useful tool for the bioinformatics community.
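To make the idea in the description concrete, here is a minimal Python sketch of a k-let-preserving shuffle in the spirit of the Euler algorithm with Wilson's algorithm for the arborescence step. It is not the uShuffle source code. The function names `klet_counts` and `shuffle_preserving_klets`, the graph representation, and the handling of very short inputs are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def klet_counts(seq, k):
    """Count every length-k substring (k-let) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_preserving_klets(seq, k, rng=None):
    """Return a random rearrangement of seq with the same k-let counts.

    Hypothetical helper: a sketch of the Euler-trail approach, with the
    arborescence drawn by a Wilson-style loop-erased random walk.
    """
    if rng is None:
        rng = random.Random()
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(seq)
    if k == 1:
        # Only single-letter counts matter: a plain permutation suffices.
        letters = list(seq)
        rng.shuffle(letters)
        return "".join(letters)
    if n <= k:
        # Assumption: sequences with at most one k-let are returned unchanged.
        return seq

    # Vertices are (k-1)-mers; each k-let is an edge between consecutive ones.
    out_edges = defaultdict(list)
    for i in range(n - k + 1):
        out_edges[seq[i:i + k - 1]].append(seq[i + 1:i + k])
    start, root = seq[:k - 1], seq[n - k + 1:]

    # Wilson-style step: random walks toward the root; the last exit taken
    # from each vertex defines an arborescence rooted at the final vertex.
    in_tree = {root}
    last_exit = {}
    for u in list(out_edges):
        v = u
        while v not in in_tree:
            last_exit[v] = rng.choice(out_edges[v])
            v = last_exit[v]
        v = u
        while v not in in_tree:
            in_tree.add(v)
            v = last_exit[v]

    # Shuffle each edge list, keeping the arborescence edge as the last exit.
    for u, succs in out_edges.items():
        if u != root:
            succs.remove(last_exit[u])
        rng.shuffle(succs)
        if u != root:
            succs.append(last_exit[u])

    # Follow the edges in their new order to spell out the shuffled sequence.
    position = {u: 0 for u in out_edges}
    result = [start]
    v = start
    for _ in range(n - k + 1):
        w = out_edges[v][position[v]]
        position[v] += 1
        result.append(w[-1])
        v = w
    return "".join(result)


if __name__ == "__main__":
    original = "ACGTACGGTCAACGTTAGC"
    shuffled = shuffle_preserving_klets(original, 2, random.Random(0))
    print(shuffled, klet_counts(shuffled, 2) == klet_counts(original, 2))
```

Under these assumptions, comparing `klet_counts` of the input and output for the same `k` is a quick way to check that the shuffle kept the k-let counts. Passing a seeded `random.Random` instance makes a run reproducible.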
format Text
id pubmed-2375906
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2375906 2008-05-12 uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts Jiang, Minghui Anderson, James Gillespie, Joel Mayne, Martin BMC Bioinformatics Software BACKGROUND: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts. RESULTS: We present a sequence analysis tool (named uShuffle) for generating uniform random permutations of biological sequences (such as DNA, RNA, and protein sequences) that preserve the exact k-let counts. The uShuffle tool implements the latest variant of the Euler algorithm and uses Wilson's algorithm in the crucial step of arborescence generation. It is carefully engineered and extremely efficient. The uShuffle tool achieves maximum flexibility by allowing arbitrary alphabet size and let size. It can be used as a command-line program, a web application, or a utility library. Source code in C, Java, and C#, and integration instructions for Perl and Python, are provided. CONCLUSION: The uShuffle tool surpasses existing implementations of the Euler algorithm in both performance and flexibility. It is a useful tool for the bioinformatics community. BioMed Central 2008-04-11 /pmc/articles/PMC2375906/ /pubmed/18405375 http://dx.doi.org/10.1186/1471-2105-9-192 Text en Copyright © 2008 Jiang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375906/
https://www.ncbi.nlm.nih.gov/pubmed/18405375
http://dx.doi.org/10.1186/1471-2105-9-192