
PyEvolve: a toolkit for statistical modelling of molecular evolution

Bibliographic Details
Main authors:	Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J; Isaev, Alexander; Huttley, Gavin A
Format:	Text
Language:	English
Published:	BioMed Central, 2004
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC317364/ https://www.ncbi.nlm.nih.gov/pubmed/14706121 http://dx.doi.org/10.1186/1471-2105-5-1
Journal:	BMC Bioinformatics
Publication date:	2004-01-05
Source:	PubMed (National Center for Biotechnology Information)
Rights:	Copyright © 2004 Butterfield et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Full description

BACKGROUND: Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting information from the full distribution of sequence variation. Phylogeny-based maximum likelihood calculations can be applied to an extensive range of problems. Available software packages that can perform likelihood calculations either lack flexibility and scalability or employ error-prone approaches to model parameterisation.

RESULTS: Here we describe the implementation of PyEvolve, a toolkit for applying existing statistical methods for molecular evolution and developing new ones. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide substitution model that includes a parameter for mutation of methylated CpGs; defining it required 8 lines of standard Python code. Benchmarking was performed using either a dinucleotide or a codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or to a 10-species subset. Parallel execution gave up to five-fold performance gains over serial execution. Compared with leading alternative software, PyEvolve exhibited significantly better real-world performance for parameter-rich models with a large data set, reducing the time required for optimisation from ~10 days to ~6 hours.

CONCLUSION: PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution or for developing new methods in the field. The toolkit can be used interactively or by writing and executing scripts. It uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter-rich likelihood functions solvable within hours on multi-CPU hardware. PyEvolve can be readily adapted to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from .
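The abstract only summarises the CpG-aware dinucleotide model, so the following is a hypothetical illustration in plain Python. It is not PyEvolve's API and not the authors' exact parameterisation. It assumes 16 dinucleotide states, instantaneous changes at only one position at a time, a transition/transversion ratio `kappa`, equal base frequencies, and a multiplier `cpg` (a name chosen here) applied to the CG-to-TG and CG-to-CA transitions associated with deamination of methylated cytosine. The function name `dinucleotide_rate_matrix` is likewise illustrative.

```python
from itertools import product

NUCLEOTIDES = "TCAG"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def is_cpg_transition(src: str, dst: str) -> bool:
    """True for CG->TG or CG->CA (C->T on either strand of a methylated CpG)."""
    return src == "CG" and dst in ("TG", "CA")


def dinucleotide_rate_matrix(kappa: float, cpg: float):
    """Build a hypothetical 16x16 dinucleotide rate matrix Q as nested dicts.

    Assumptions (illustrative only): equal base frequencies, no rate
    normalisation, and only single-position changes have non-zero rates.
    """
    states = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    q = {s: {t: 0.0 for t in states} for s in states}
    for src in states:
        for dst in states:
            diffs = [i for i in range(2) if src[i] != dst[i]]
            if len(diffs) != 1:
                continue  # zero rate for identity and double changes
            i = diffs[0]
            rate = kappa if is_transition(src[i], dst[i]) else 1.0
            if is_cpg_transition(src, dst):
                rate *= cpg  # extra multiplier for methylated-CpG transitions
            q[src][dst] = rate
        # Diagonal makes each row sum to zero, as required for a rate matrix.
        q[src][src] = -sum(q[src][t] for t in states if t != src)
    return states, q


if __name__ == "__main__":
    states, q = dinucleotide_rate_matrix(kappa=2.0, cpg=10.0)
    print(len(states))      # 16
    print(q["CG"]["TG"])    # 20.0 (transition boosted by cpg)
    print(q["CA"]["TA"])    # 2.0 (ordinary transition)
    print(q["CG"]["CG"])    # -44.0
```

With `kappa=2.0` and `cpg=10.0`, the two CpG transitions out of CG each get rate 20.0, the other four single-position changes out of CG get 1.0, and the diagonal entry is -44.0. Transition probabilities over a branch of length t would then follow from the matrix exponential exp(Qt), which is the quantity a phylogenetic likelihood calculation evaluates on each branch.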
