Cargando…

Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0

Sequence simulation is an important tool in validating biological hypotheses as well as testing various bioinformatics and molecular evolutionary methods. Hypothesis testing relies on the representational ability of the sequence simulation method. Simple hypotheses are testable through simulation of...

Descripción completa

Detalles Bibliográficos
Autores principales: Strope, Cory L., Abel, Kevin, Scott, Stephen D., Moriyama, Etsuko N.
Formato: Texto
Lenguaje:English
Publicado: Oxford University Press 2009
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760465/
https://www.ncbi.nlm.nih.gov/pubmed/19651852
http://dx.doi.org/10.1093/molbev/msp174
_version_ 1782172742373605376
author Strope, Cory L.
Abel, Kevin
Scott, Stephen D.
Moriyama, Etsuko N.
author_facet Strope, Cory L.
Abel, Kevin
Scott, Stephen D.
Moriyama, Etsuko N.
author_sort Strope, Cory L.
collection PubMed
description Sequence simulation is an important tool in validating biological hypotheses as well as testing various bioinformatics and molecular evolutionary methods. Hypothesis testing relies on the representational ability of the sequence simulation method. Simple hypotheses are testable through simulation of random, homogeneously evolving sequence sets. However, testing complex hypotheses, for example, local similarities, requires simulation of sequence evolution under heterogeneous models. To this end, we previously introduced indel-Seq-Gen version 1.0 (iSGv1.0; indel, insertion/deletion). iSGv1.0 allowed heterogeneous protein evolution and motif conservation as well as insertion and deletion constraints in subsequences. Despite these advances, for complex hypothesis testing, neither iSGv1.0 nor other currently available sequence simulation methods is sufficient. indel-Seq-Gen version 2.0 (iSGv2.0) aims at simulating evolution of highly divergent DNA sequences and protein superfamilies. iSGv2.0 improves upon iSGv1.0 through the addition of lineage-specific evolution, motif conservation using PROSITE-like regular expressions, indel tracking, subsequence-length constraints, as well as coding and noncoding DNA evolution. Furthermore, we formalize the sequence representation used for iSGv2.0 and uncover a flaw in the modeling of indels used in current state of the art methods, which biases simulation results for hypotheses involving indels. We fix this flaw in iSGv2.0 by using a novel discrete stepping procedure. Finally, we present an example simulation of the calycin-superfamily sequences and compare the performance of iSGv2.0 with iSGv1.0 and random model of sequence evolution.
format Text
id pubmed-2760465
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-27604652009-10-13 Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0 Strope, Cory L. Abel, Kevin Scott, Stephen D. Moriyama, Etsuko N. Mol Biol Evol Research Articles Sequence simulation is an important tool in validating biological hypotheses as well as testing various bioinformatics and molecular evolutionary methods. Hypothesis testing relies on the representational ability of the sequence simulation method. Simple hypotheses are testable through simulation of random, homogeneously evolving sequence sets. However, testing complex hypotheses, for example, local similarities, requires simulation of sequence evolution under heterogeneous models. To this end, we previously introduced indel-Seq-Gen version 1.0 (iSGv1.0; indel, insertion/deletion). iSGv1.0 allowed heterogeneous protein evolution and motif conservation as well as insertion and deletion constraints in subsequences. Despite these advances, for complex hypothesis testing, neither iSGv1.0 nor other currently available sequence simulation methods is sufficient. indel-Seq-Gen version 2.0 (iSGv2.0) aims at simulating evolution of highly divergent DNA sequences and protein superfamilies. iSGv2.0 improves upon iSGv1.0 through the addition of lineage-specific evolution, motif conservation using PROSITE-like regular expressions, indel tracking, subsequence-length constraints, as well as coding and noncoding DNA evolution. Furthermore, we formalize the sequence representation used for iSGv2.0 and uncover a flaw in the modeling of indels used in current state of the art methods, which biases simulation results for hypotheses involving indels. We fix this flaw in iSGv2.0 by using a novel discrete stepping procedure. Finally, we present an example simulation of the calycin-superfamily sequences and compare the performance of iSGv2.0 with iSGv1.0 and random model of sequence evolution. Oxford University Press 2009-11 2009-08-03 /pmc/articles/PMC2760465/ /pubmed/19651852 http://dx.doi.org/10.1093/molbev/msp174 Text en © 2009 The Authors This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Articles
Strope, Cory L.
Abel, Kevin
Scott, Stephen D.
Moriyama, Etsuko N.
Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0
title Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0
title_full Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0
title_fullStr Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0
title_full_unstemmed Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0
title_short Biological Sequence Simulation for Testing Complex Evolutionary Hypotheses: indel-Seq-Gen Version 2.0
title_sort biological sequence simulation for testing complex evolutionary hypotheses: indel-seq-gen version 2.0
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760465/
https://www.ncbi.nlm.nih.gov/pubmed/19651852
http://dx.doi.org/10.1093/molbev/msp174
work_keys_str_mv AT stropecoryl biologicalsequencesimulationfortestingcomplexevolutionaryhypothesesindelseqgenversion20
AT abelkevin biologicalsequencesimulationfortestingcomplexevolutionaryhypothesesindelseqgenversion20
AT scottstephend biologicalsequencesimulationfortestingcomplexevolutionaryhypothesesindelseqgenversion20
AT moriyamaetsukon biologicalsequencesimulationfortestingcomplexevolutionaryhypothesesindelseqgenversion20