
Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs

BACKGROUND: Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial to simulate realistic data sets. Such simulation procedures are needed t...


Bibliographic Details
Main Authors: Dutheil, Julien; Boussau, Bastien
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559849/
https://www.ncbi.nlm.nih.gov/pubmed/18808672
http://dx.doi.org/10.1186/1471-2148-8-255
author Dutheil, Julien
Boussau, Bastien
collection PubMed
description BACKGROUND: Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial for simulating realistic data sets. Such simulation procedures are needed to estimate the null distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that the process of sequence evolution has varied substantially among lineages and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in scope: only a few particular models are available for likelihood optimization, and data sets cannot easily be generated from the resulting parameter estimates.
RESULTS: We present here a general implementation of non-homogeneous models of substitution. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first, Bio++ Maximum Likelihood (BppML), estimates the parameters of any non-homogeneous model, and the second, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences under these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required.
CONCLUSION: We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous model of sequence evolution, including heterotachous ones, while remaining computationally efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to a previously published ribosomal RNA data set.
format Text
id pubmed-2559849
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2559849 2008-10-03 Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs Dutheil, Julien Boussau, Bastien BMC Evol Biol Software BioMed Central 2008-09-22 /pmc/articles/PMC2559849/ /pubmed/18808672 http://dx.doi.org/10.1186/1471-2148-8-255 Text en Copyright ©2008 Dutheil and Boussau; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs
topic Software