
ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models

Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used...


Bibliographic Details
Main Authors: Sloutsky, Roman; Naegle, Kristen M
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2019
Subjects: Computational and Systems Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797483/
https://www.ncbi.nlm.nih.gov/pubmed/31621582
http://dx.doi.org/10.7554/eLife.47676
author Sloutsky, Roman
Naegle, Kristen M
collection PubMed
description Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy.
format Online
Article
Text
id pubmed-6797483
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-6797483 2019-10-21 ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models Sloutsky, Roman Naegle, Kristen M eLife Computational and Systems Biology eLife Sciences Publications, Ltd 2019-10-17 /pmc/articles/PMC6797483/ /pubmed/31621582 http://dx.doi.org/10.7554/eLife.47676 Text en © 2019, Sloutsky and Naegle http://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
title ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models
topic Computational and Systems Biology