Predictive Models of Genetic Redundancy in Arabidopsis thaliana

Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common amo...


Bibliographic Details
Main authors: Cusack, Siobhan A, Wang, Peipei, Lotreck, Serena G, Moore, Bethany M, Meng, Fanrui, Conner, Jeffrey K, Krysan, Patrick J, Lehti-Shiu, Melissa D, Shiu, Shin-Han
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Discoveries
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321531/
https://www.ncbi.nlm.nih.gov/pubmed/33871641
http://dx.doi.org/10.1093/molbev/msab111
author Cusack, Siobhan A
Wang, Peipei
Lotreck, Serena G
Moore, Bethany M
Meng, Fanrui
Conner, Jeffrey K
Krysan, Patrick J
Lehti-Shiu, Melissa D
Shiu, Shin-Han
collection PubMed
description Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.
format Online
Article
Text
id pubmed-8321531
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8321531 2021-07-30 Predictive Models of Genetic Redundancy in Arabidopsis thaliana. Mol Biol Evol. Discoveries. Oxford University Press 2021-04-19 /pmc/articles/PMC8321531/ /pubmed/33871641 http://dx.doi.org/10.1093/molbev/msab111 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predictive Models of Genetic Redundancy in Arabidopsis thaliana
topic Discoveries