Simulation data for the estimation of numerical constants for approximating pairwise evolutionary distances between amino acid sequences

Bibliographic details
Main authors: Bigot, Thomas, Guglielmini, Julien, Criscuolo, Alexis
Format: Online article, text
Language: English
Published: Elsevier 2019
Subjects: Agricultural and Biological Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6699465/
https://www.ncbi.nlm.nih.gov/pubmed/31440543
http://dx.doi.org/10.1016/j.dib.2019.104212
Description: Estimating the number of substitution events per site that have occurred during the evolution of a pair of amino acid sequences is a common task in phylogenetics and comparative genomics; when explicit evolutionary models are taken into account, it often requires slow maximum-likelihood procedures. The data presented in this article are large sets of numbers of substitution events, together with the associated numbers of observed differences between pairs of aligned amino acid sequences, generated by simulating sequence evolution under a broad range of evolutionary models. These data are available at https://zenodo.org/record/2653704 (doi:10.5281/zenodo.2653704). In this article, they are accompanied by figures showing the strong relationship between the corresponding evolutionary and uncorrected distances, and by estimates of the numerical constants that define non-linear functions fitted to the simulated data. These numerical constants can be used to quickly estimate pairwise evolutionary distances directly from uncorrected distances between aligned amino acid sequences.
Journal: Data Brief (Elsevier), published 2019-07-08
License: © 2019 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
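The description states that numerical constants define non-linear functions mapping uncorrected distances to evolutionary distances, but this record gives neither the functional form nor the constants. The Python sketch below is therefore illustrative only. It assumes a hypothetical one-constant form d = -b ln(1 - p/b) (with b = 19/20 this is the equal-input, Jukes-Cantor-like correction for 20 amino acids), skips gapped positions when computing the uncorrected p-distance, and estimates b by least squares from (p, d) pairs. The helper names `p_distance`, `corrected_distance` and `fit_b` are hypothetical, and neither the form nor the fitting procedure should be read as the authors' method.

```python
import math


def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Uncorrected (p-)distance between two aligned amino acid sequences.

    Assumption: positions with a gap in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) positions")
    return differing / compared


def corrected_distance(p: float, b: float) -> float:
    """Hypothetical one-constant correction d = -b * ln(1 - p / b).

    With b = 19/20 this is the equal-input (Jukes-Cantor-like) correction
    for 20 amino acids; other values of b change the curvature.
    """
    if not 0.0 <= p < b:
        raise ValueError("requires 0 <= p < b")
    return -b * math.log(1.0 - p / b)


def fit_b(pairs, b_max=5.0, grid=2000, iters=100):
    """Least-squares estimate of b from (p, d) pairs.

    Coarse grid search over (max p, b_max], then golden-section refinement
    between the neighbours of the best grid point. Assumes the optimum
    lies below b_max.
    """
    lo = max(p for p, _ in pairs) + 1e-9  # b must exceed every observed p

    def sse(b):
        return sum((d - corrected_distance(p, b)) ** 2 for p, d in pairs)

    step = (b_max - lo) / grid
    candidates = [lo + i * step for i in range(1, grid + 1)]
    best = min(range(grid), key=lambda i: sse(candidates[i]))
    left = candidates[max(best - 1, 0)]
    right = candidates[min(best + 1, grid - 1)]

    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    for _ in range(iters):
        c = right - ratio * (right - left)
        e = left + ratio * (right - left)
        if sse(c) < sse(e):
            right = e  # minimum lies in [left, e]
        else:
            left = c  # minimum lies in [c, right]
    return (left + right) / 2.0


if __name__ == "__main__":
    print(p_distance("ACDEF", "ACDEY"))  # prints 0.2
    # Synthetic pairs generated from the assumed form itself (not real data).
    pairs = [(p, corrected_distance(p, 0.95))
             for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)]
    print(round(fit_b(pairs), 4))  # prints 0.95
```

Because the synthetic pairs are generated from the assumed form, the fit only checks that the search recovers b = 0.95. With the deposited Zenodo data, p would instead be the number of observed differences divided by the number of compared positions and d the number of substitution events per site; the file layout of that dataset is not described in this record, so no parser is shown.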