ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models
Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used...
Main authors: | Sloutsky, Roman; Naegle, Kristen M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | eLife Sciences Publications, Ltd, 2019 |
Subjects: | Computational and Systems Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797483/ https://www.ncbi.nlm.nih.gov/pubmed/31621582 http://dx.doi.org/10.7554/eLife.47676 |
_version_ | 1783459839393923072 |
---|---|
author | Sloutsky, Roman Naegle, Kristen M |
author_facet | Sloutsky, Roman Naegle, Kristen M |
author_sort | Sloutsky, Roman |
collection | PubMed |
description | Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy. |
format | Online Article Text |
id | pubmed-6797483 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | eLife Sciences Publications, Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-6797483 2019-10-21 ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models Sloutsky, Roman Naegle, Kristen M eLife Computational and Systems Biology Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy. eLife Sciences Publications, Ltd 2019-10-17 /pmc/articles/PMC6797483/ /pubmed/31621582 http://dx.doi.org/10.7554/eLife.47676 Text en © 2019, Sloutsky and Naegle http://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited. |
spellingShingle | Computational and Systems Biology Sloutsky, Roman Naegle, Kristen M ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title_full | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title_fullStr | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title_full_unstemmed | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title_short | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title_sort | aspen, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
topic | Computational and Systems Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797483/ https://www.ncbi.nlm.nih.gov/pubmed/31621582 http://dx.doi.org/10.7554/eLife.47676 |
work_keys_str_mv | AT sloutskyroman aspenamethodologyforreconstructingproteinevolutionwithimprovedaccuracyusingensemblemodels AT naeglekristenm aspenamethodologyforreconstructingproteinevolutionwithimprovedaccuracyusingensemblemodels |