Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data
Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary...
Main authors: | DeGiorgio, Michael; Assis, Raquel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947822/ https://www.ncbi.nlm.nih.gov/pubmed/33045078 http://dx.doi.org/10.1093/molbev/msaa267 |
_version_ | 1783663308157485056 |
---|---|
author | DeGiorgio, Michael Assis, Raquel |
author_facet | DeGiorgio, Michael Assis, Raquel |
author_sort | DeGiorgio, Michael |
collection | PubMed |
description | Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication. |
format | Online Article Text |
id | pubmed-7947822 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7947822 2021-03-16 Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data DeGiorgio, Michael Assis, Raquel Mol Biol Evol Methods Oxford University Press 2020-10-12 /pmc/articles/PMC7947822/ /pubmed/33045078 http://dx.doi.org/10.1093/molbev/msaa267 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947822/ https://www.ncbi.nlm.nih.gov/pubmed/33045078 http://dx.doi.org/10.1093/molbev/msaa267 |