MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions

BACKGROUND: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. In spite of considerable research and efforts that have been recently deployed for improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accura...

Bibliographic Details
Main Authors: Al-Shatnawi, Mufleh; Ahmad, M. Omair; S. Swamy, M. N.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657235/
https://www.ncbi.nlm.nih.gov/pubmed/26597571
http://dx.doi.org/10.1186/s12859-015-0826-3
_version_ 1782402359443324928
author Al-Shatnawi, Mufleh
Ahmad, M. Omair
S. Swamy, M. N.
author_facet Al-Shatnawi, Mufleh
Ahmad, M. Omair
S. Swamy, M. N.
author_sort Al-Shatnawi, Mufleh
collection PubMed
description BACKGROUND: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. Despite considerable recent research effort devoted to improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem. RESULTS: We propose a novel and efficient algorithm, called MSAIndelFR, for multiple sequence alignment using the information on the predicted locations of IndelFRs and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that introducing into the proposed algorithm a new variable gap penalty function, based on the predicted locations of the IndelFRs and the computed average log-loss values, substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of four popular benchmarks: BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). CONCLUSIONS: We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms (Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign) in terms of both the sum-of-pairs and total column metrics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0826-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4657235
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-46572352015-11-25 MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions Al-Shatnawi, Mufleh Ahmad, M. Omair S. Swamy, M. N. BMC Bioinformatics Research Article BACKGROUND: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. Despite considerable recent research effort devoted to improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem. RESULTS: We propose a novel and efficient algorithm, called MSAIndelFR, for multiple sequence alignment using the information on the predicted locations of IndelFRs and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that introducing into the proposed algorithm a new variable gap penalty function, based on the predicted locations of the IndelFRs and the computed average log-loss values, substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of four popular benchmarks: BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). CONCLUSIONS: We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms (Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign) in terms of both the sum-of-pairs and total column metrics.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0826-3) contains supplementary material, which is available to authorized users. BioMed Central 2015-11-23 /pmc/articles/PMC4657235/ /pubmed/26597571 http://dx.doi.org/10.1186/s12859-015-0826-3 Text en © Al-Shatnawi et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Al-Shatnawi, Mufleh
Ahmad, M. Omair
S. Swamy, M. N.
MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
title MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
title_full MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
title_fullStr MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
title_full_unstemmed MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
title_short MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions
title_sort msaindelfr: a scheme for multiple protein sequence alignment using information on indel flanking regions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657235/
https://www.ncbi.nlm.nih.gov/pubmed/26597571
http://dx.doi.org/10.1186/s12859-015-0826-3
work_keys_str_mv AT alshatnawimufleh msaindelfraschemeformultipleproteinsequencealignmentusinginformationonindelflankingregions
AT ahmadmomair msaindelfraschemeformultipleproteinsequencealignmentusinginformationonindelflankingregions
AT sswamymn msaindelfraschemeformultipleproteinsequencealignmentusinginformationonindelflankingregions