
Parameterizing sequence alignment with an explicit evolutionary model

BACKGROUND: Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (pro...


Bibliographic Details
Main Authors:	Rivas, Elena, Eddy, Sean R.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676179/ https://www.ncbi.nlm.nih.gov/pubmed/26652060 http://dx.doi.org/10.1186/s12859-015-0832-5

_version_	1782405128735686656
author	Rivas, Elena Eddy, Sean R.
author_facet	Rivas, Elena Eddy, Sean R.
author_sort	Rivas, Elena
collection	PubMed
description	BACKGROUND: Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not based on any explicitly time-dependent evolutionary model. Using one fixed score system (BLOSUM62 with some gap open/extend costs, for example) corresponds to making an unrealistic assumption that all sequence relationships have diverged by the same time. Adoption of explicit time-dependent evolutionary models for scoring insertions and deletions in sequence alignments has been hindered by algorithmic complexity and technical difficulty. RESULTS: We identify and implement several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment. Assuming an affine gap cost imposes important restrictions on the realism of the evolutionary models compatible with it, as single insertion events with geometrically distributed lengths do not result in geometrically distributed insert lengths at finite times. Nevertheless, we identify one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment. We test different aspects of the performance of these “optimized branch length” models, including alignment accuracy and homology coverage (discrimination of residues in a homologous region from nonhomologous flanking residues). We test on benchmarks of both global homologies (full length sequence homologs) and local homologies (homologous subsequences embedded in nonhomologous sequence). 
CONCLUSIONS: Contrary to our expectations, we find that for global homologies a single long branch parameterization suffices both for distant and close homologous relationships. In contrast, we do see an advantage in using explicit evolutionary models for local homologies. Optimal branch parameterization reduces a known artifact called “homologous overextension”, in which local alignments erroneously extend through flanking nonhomologous residues. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0832-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4676179
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4676179 2015-12-12 Parameterizing sequence alignment with an explicit evolutionary model Rivas, Elena Eddy, Sean R. BMC Bioinformatics Research Article BioMed Central 2015-12-10 /pmc/articles/PMC4676179/ /pubmed/26652060 http://dx.doi.org/10.1186/s12859-015-0832-5 Text en © Rivas and Eddy. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Rivas, Elena Eddy, Sean R. Parameterizing sequence alignment with an explicit evolutionary model
title	Parameterizing sequence alignment with an explicit evolutionary model
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676179/ https://www.ncbi.nlm.nih.gov/pubmed/26652060 http://dx.doi.org/10.1186/s12859-015-0832-5
