
Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data

Bibliographic Details
Main Authors: DeGiorgio, Michael, Assis, Raquel
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Methods
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947822/
https://www.ncbi.nlm.nih.gov/pubmed/33045078
http://dx.doi.org/10.1093/molbev/msaa267
Description: Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show not only that the CLOUD classifier is substantially more powerful and accurate than CDROM, but also that it yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
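The description characterizes CDROM as classifying retention mechanisms by comparing expression distances in a decision tree. The Python sketch below illustrates only that general idea; it is not the CDROM or CLOUD software. It assumes Euclidean distances between relative expression profiles across tissues, four retention classes (conservation, neofunctionalization, subfunctionalization, specialization), and a single user-supplied divergence cutoff. The function names and the cutoff value are hypothetical.

```python
import math
from typing import Sequence


def relative(profile: Sequence[float]) -> list:
    """Convert absolute expression (one value per tissue) to relative expression."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("expression profile must have a positive total")
    return [v / total for v in profile]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify_retention(p1, p2, ancestor, cutoff: float) -> str:
    """Hypothetical CDROM-style decision tree over expression distances.

    p1, p2:   absolute expression of the two duplicate copies (assumed input).
    ancestor: absolute expression of the ancestral single-copy gene.
    cutoff:   divergence threshold; an assumed, user-supplied value here.
    """
    anc = relative(ancestor)
    d1 = euclidean(relative(p1), anc)
    d2 = euclidean(relative(p2), anc)
    # Combined profile of both copies: sum absolute expression, then normalize.
    d12 = euclidean(relative([a + b for a, b in zip(p1, p2)]), anc)

    if d1 <= cutoff and d2 <= cutoff:
        return "conservation"
    if d1 > cutoff and d2 <= cutoff:
        return "neofunctionalization of copy 1"
    if d1 <= cutoff and d2 > cutoff:
        return "neofunctionalization of copy 2"
    # Both copies diverged: do they together still resemble the ancestor?
    if d12 <= cutoff:
        return "subfunctionalization"
    return "specialization"


# Ancestor expressed equally in two of three tissues; each copy keeps one tissue.
print(classify_retention([10, 0, 0], [0, 10, 0], [5, 5, 0], cutoff=0.5))
# subfunctionalization
```

CLOUD, as summarized above, replaces fixed threshold rules like these with a multi-layer neural network built on a model of gene expression evolution, which also allows it to predict the underlying evolutionary parameters.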
Source: PubMed collection, National Center for Biotechnology Information (MEDLINE/PubMed record)
Journal: Molecular Biology and Evolution (Mol Biol Evol), section: Methods
Date: 2020-10-12
Rights: © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.