Predictive Models of Genetic Redundancy in Arabidopsis thaliana
Main authors: | Cusack, Siobhan A; Wang, Peipei; Lotreck, Serena G; Moore, Bethany M; Meng, Fanrui; Conner, Jeffrey K; Krysan, Patrick J; Lehti-Shiu, Melissa D; Shiu, Shin-Han |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Discoveries |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321531/ https://www.ncbi.nlm.nih.gov/pubmed/33871641 http://dx.doi.org/10.1093/molbev/msab111 |
_version_ | 1783730872590008320 |
---|---|
author | Cusack, Siobhan A; Wang, Peipei; Lotreck, Serena G; Moore, Bethany M; Meng, Fanrui; Conner, Jeffrey K; Krysan, Patrick J; Lehti-Shiu, Melissa D; Shiu, Shin-Han |
collection | PubMed |
description | Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies. |
format | Online Article Text |
id | pubmed-8321531 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8321531 2021-07-30 Mol Biol Evol Discoveries Oxford University Press 2021-04-19 /pmc/articles/PMC8321531/ /pubmed/33871641 http://dx.doi.org/10.1093/molbev/msab111 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Predictive Models of Genetic Redundancy in Arabidopsis thaliana |
topic | Discoveries |