
Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix

BACKGROUND: Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. A common task in molecular biology is to infer function...


Bibliographic Details
Main Authors: Ndhlovu, Andrew; Hazelhurst, Scott; Durand, Pierre M.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535666/
https://www.ncbi.nlm.nih.gov/pubmed/26269100
http://dx.doi.org/10.1186/s12859-015-0688-8
_version_ 1782385635228647424
author Ndhlovu, Andrew
Hazelhurst, Scott
Durand, Pierre M.
collection PubMed
description BACKGROUND: Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences have diverged extensively. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples a newer and improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database, EvoDB. RESULTS: The evolutionary rate-based approach was coupled with a conventional BLOSUM substitution matrix. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. The dynamic scoring function is based on a coupled additive approach that scores aligned sites according to the level of conservation inferred from the ω values. Evaluation of the accuracy of this new implementation, BLOSUM-FIRE, using MAFFT alignments as reference alignments, showed that it is more accurate than its predecessor, FIRE. Comparison of the alignment quality with widely used algorithms (MUSCLE, T-COFFEE, and CLUSTAL Omega) revealed that the BLOSUM-FIRE algorithm performs as well as conventional algorithms. Its main strength is that it provides greater potential for aligning divergent sequences and addresses the problem of low specificity inherent in the original FIRE algorithm. The utility of this algorithm is demonstrated using the hepatitis B virus X (HBx) protein, a protein of unknown function, as a test case. CONCLUSION: This study describes the utility of an evolutionary rate-based approach coupled to the BLOSUM62 amino acid substitution matrix in inferring protein domain function. We demonstrate that such an approach is robust and performs as well as an array of conventional algorithms.
format Online
Article
Text
id pubmed-4535666
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4535666 2015-08-14 Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix Ndhlovu, Andrew Hazelhurst, Scott Durand, Pierre M. BMC Bioinformatics Research Article BioMed Central 2015-08-14 /pmc/articles/PMC4535666/ /pubmed/26269100 http://dx.doi.org/10.1186/s12859-015-0688-8 Text en © Ndhlovu et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535666/
https://www.ncbi.nlm.nih.gov/pubmed/26269100
http://dx.doi.org/10.1186/s12859-015-0688-8