Cargando…

Optimal computation of all tandem repeats in a weighted sequence

BACKGROUND: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of...

Descripción completa

Detalles Bibliográficos
Autores principales: Barton, Carl, Iliopoulos, Costas S, Pissis, Solon P
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152798/
https://www.ncbi.nlm.nih.gov/pubmed/25221616
http://dx.doi.org/10.1186/s13015-014-0021-5
Descripción
Sumario:BACKGROUND: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. RESULTS: Crochemore’s repetitions algorithm, also referred to as Crochemore’s partitioning algorithm, was introduced in 1981, and was the first optimal [Formula: see text]-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal [Formula: see text] time, thus improving on the best known [Formula: see text]-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.