
Heuristics for multiobjective multiple sequence alignment

BACKGROUND: Multiple sequence alignment arises in many bioinformatics tasks. However, the alignments produced by current software packages depend heavily on parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only...

Bibliographic Details
Main authors: Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959375/
https://www.ncbi.nlm.nih.gov/pubmed/27454115
http://dx.doi.org/10.1186/s12938-016-0184-z
_version_ 1782444393898180608
author Abbasi, Maryam
Paquete, Luís
Pereira, Francisco B.
collection PubMed
description BACKGROUND: Multiple sequence alignment arises in many bioinformatics tasks. However, the alignments produced by current software packages depend heavily on parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she can analyse and choose the most plausible alignment. METHODS: We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and the neighborhood definitions on overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. RESULTS AND CONCLUSIONS: The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method with respect to reference alignments in terms of two measures: the ratio of correctly aligned pairs and the ratio of correctly aligned columns. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
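
The two objectives and the hypervolume indicator named in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not include their local search heuristics. The toy match/mismatch scores, the convention of counting each residue-gap pair as one indel, the reference point, and all function names are assumptions made for illustration only.

```python
from itertools import combinations

# Toy substitution scores (an assumption; real tools use matrices such as BLOSUM).
MATCH, MISMATCH = 2, -1


def score_vector(alignment):
    """Return (substitution_score, indel_count) summed over all row pairs."""
    subs, indels = 0, 0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue  # gap-gap positions contribute to neither objective
            if a == "-" or b == "-":
                indels += 1  # assumed convention: one indel per residue-gap pair
            else:
                subs += MATCH if a == b else MISMATCH
    return subs, indels


def nondominated(points):
    """Keep vectors that no other vector beats (higher score, fewer indels)."""
    unique = set(points)
    front = [
        (s, g)
        for s, g in unique
        if not any(
            s2 >= s and g2 <= g and (s2 > s or g2 < g) for s2, g2 in unique
        )
    ]
    return sorted(front)


def hypervolume_2d(front, ref):
    """Area dominated by a nondominated front, bounded by the point `ref`.

    `ref` = (lowest score, highest indel count) and must be no better than
    any point of the front in either objective.
    """
    s_ref, g_ref = ref
    area, g_prev = 0, g_ref
    # Sweep from the highest score down; indel counts decrease along the way.
    for s, g in sorted(front, reverse=True):
        area += (s - s_ref) * (g_prev - g)
        g_prev = g
    return area


# Three candidate alignments of the toy sequences ACT and AGT.
candidates = [["ACT", "AGT"], ["AC-T", "A-GT"], ["ACT--", "A--GT"]]
vectors = [score_vector(a) for a in candidates]
front = nondominated(vectors)
quality = hypervolume_2d(front, ref=(0, 5))
```

A larger hypervolume means the returned set of score vectors covers more of the objective space relative to the chosen reference point, which is why it can serve as a single quality measure for a whole set of alignments.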
format Online
Article
Text
id pubmed-4959375
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4959375 2016-08-01 Heuristics for multiobjective multiple sequence alignment Abbasi, Maryam Paquete, Luís Pereira, Francisco B. Biomed Eng Online Research BioMed Central 2016-07-15 /pmc/articles/PMC4959375/ /pubmed/27454115 http://dx.doi.org/10.1186/s12938-016-0184-z Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Heuristics for multiobjective multiple sequence alignment
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959375/
https://www.ncbi.nlm.nih.gov/pubmed/27454115
http://dx.doi.org/10.1186/s12938-016-0184-z