Cargando…

Closest string with outliers

BACKGROUND: Given n strings s(1), …, s(n) each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many – but not necessarily all – input strings...

Descripción completa

Detalles Bibliográficos
Autores principales: Boucher, Christina, Ma, Bin
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2011
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044313/
https://www.ncbi.nlm.nih.gov/pubmed/21342588
http://dx.doi.org/10.1186/1471-2105-12-S1-S55
_version_ 1782198718309597184
author Boucher, Christina
Ma, Bin
author_facet Boucher, Christina
Ma, Bin
author_sort Boucher, Christina
collection PubMed
description BACKGROUND: Given n strings s(1), …, s(n) each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many – but not necessarily all – input strings is an important task that plays a role in many applications in bioinformatics. RESULTS: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the CLOSEST STRING WITH OUTLIERS (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d to at least n – k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n – k, ℓ, and d. CONCLUSIONS: Our refined model abstractly models finding common patterns in several but not all input strings. We initialize the study of the computability of this model and show that it is sensitive to different parameterizations. Lastly, we conclude by suggesting several open problems which warrant further investigation.
format Text
id pubmed-3044313
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30443132011-02-25 Closest string with outliers Boucher, Christina Ma, Bin BMC Bioinformatics Research BACKGROUND: Given n strings s(1), …, s(n) each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many – but not necessarily all – input strings is an important task that plays a role in many applications in bioinformatics. RESULTS: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the CLOSEST STRING WITH OUTLIERS (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d to at least n – k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n – k, ℓ, and d. CONCLUSIONS: Our refined model abstractly models finding common patterns in several but not all input strings. We initialize the study of the computability of this model and show that it is sensitive to different parameterizations. Lastly, we conclude by suggesting several open problems which warrant further investigation. BioMed Central 2011-02-15 /pmc/articles/PMC3044313/ /pubmed/21342588 http://dx.doi.org/10.1186/1471-2105-12-S1-S55 Text en Copyright ©2011 Boucher and Ma; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Boucher, Christina
Ma, Bin
Closest string with outliers
title Closest string with outliers
title_full Closest string with outliers
title_fullStr Closest string with outliers
title_full_unstemmed Closest string with outliers
title_short Closest string with outliers
title_sort closest string with outliers
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044313/
https://www.ncbi.nlm.nih.gov/pubmed/21342588
http://dx.doi.org/10.1186/1471-2105-12-S1-S55
work_keys_str_mv AT boucherchristina closeststringwithoutliers
AT mabin closeststringwithoutliers