
SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution

BACKGROUND: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have been extensively useful in sequence analysis and for the computer simulation of artificial sequence data sets. RESULTS: We have developed a new method of simulating pro...


Bibliographic Details
Main authors: Pang, Andy, Smith, Andrew D, Nuin, Paulo AS, Tillier, Elisabeth RM
Format: Text
Language: English
Published: BioMed Central 2005
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1261159/
https://www.ncbi.nlm.nih.gov/pubmed/16188037
http://dx.doi.org/10.1186/1471-2105-6-236
author Pang, Andy
Smith, Andrew D
Nuin, Paulo AS
Tillier, Elisabeth RM
collection PubMed
description BACKGROUND: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have been extensively useful in sequence analysis and for the computer simulation of artificial sequence data sets. RESULTS: We have developed a new method of simulating protein sequence evolution, including insertion and deletion (indel) events in addition to amino-acid substitutions. The simulation generates both the simulated sequence family and a true sequence alignment that captures the evolutionary relationships between amino acids from different sequences. Our statistical model for indel evolution is based on the empirical indel distribution determined by Qian and Goldstein. We have parameterized this distribution so that it applies to sequences diverged by varying evolutionary times and generalized it to provide flexibility in simulation conditions. Our method uses a Monte-Carlo simulation strategy, and has been implemented in a C++ program named Simprot. CONCLUSION: Simprot will be useful for testing methods of analysis of protein sequence families particularly alignment methods, phylogenetic tree building, detection of recombination and horizontal gene transfer, and homology detection, where knowing the true course of sequence evolution is essential.
format Text
id pubmed-1261159
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1261159 2005-10-27 SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution. Pang, Andy; Smith, Andrew D; Nuin, Paulo AS; Tillier, Elisabeth RM. BMC Bioinformatics. Software. BioMed Central 2005-09-27 /pmc/articles/PMC1261159/ /pubmed/16188037 http://dx.doi.org/10.1186/1471-2105-6-236 Text en Copyright © 2005 Pang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1261159/
https://www.ncbi.nlm.nih.gov/pubmed/16188037
http://dx.doi.org/10.1186/1471-2105-6-236