Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
Main authors: | Yadav, Akshay; Fernández-Baca, David; Cannon, Steven B |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350399/ https://www.ncbi.nlm.nih.gov/pubmed/32694909 http://dx.doi.org/10.1177/1176934320939943 |
author | Yadav, Akshay; Fernández-Baca, David; Cannon, Steven B |
collection | PubMed |
description | Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid-level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as "features" or "descriptors" and treating the species (target + outgroup) as instances, or data points, in a data matrix. We then look for features (domains) that differ significantly between the target species and the outgroup species. We study domain changes in 2 large, distinct groups of plant species, legumes (Fabaceae) and grasses (Poaceae), relative to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These 4 matrix types attempt to capture different kinds of domain change through which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of a domain along a sequence, expansion or contraction of domains, or changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains with significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. In addition, we perform domain-centric gene ontology (dcGO) enrichment analysis on all domains selected from the 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses.
Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for repairing interstrand DNA crosslinks. Abundance analysis of domains in legumes revealed an increase in glutathione synthase, the enzyme that produces glutathione, an antioxidant required for nitrogen fixation, and a decrease in xanthine-oxidizing enzymes, consistent with previous studies. In grasses, abundance analysis showed increases in domains related to gene silencing, which could reflect polyploidy or an enhanced response to viral infection. We provide a Docker container that can be used to run this analysis workflow on any user-defined set of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project. |
format | Online Article Text |
id | pubmed-7350399 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
journal | Evol Bioinform Online |
publication date | 2020-07-09 |
rights | © The Author(s) 2020. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350399/ https://www.ncbi.nlm.nih.gov/pubmed/32694909 http://dx.doi.org/10.1177/1176934320939943 |
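The description outlines the four domain feature matrices and the target-versus-outgroup comparison only at a high level. The following Python sketch illustrates one way such matrices could be built and ranked; it is not the authors' Docker workflow. Assumptions: each proteome is given as ordered lists of domain IDs per protein; content is presence/absence, duplication counts proteins carrying two or more copies of a domain, abundance counts proteins containing the domain, and versatility counts distinct adjacent domain partners (a reading of the one-line descriptions in the abstract, not the paper's exact definitions). In place of the paper's unspecified feature selection and statistical tests, each domain is scored with an exact two-sided permutation test on the difference in group means. All function names, species labels (leg1 to out3), and domain IDs (A, B, X, Y) are hypothetical.

```python
# Minimal sketch under stated assumptions: the feature definitions and the
# permutation test are illustrative choices, not the paper's exact method.
from collections import defaultdict
from itertools import combinations
from statistics import mean

Proteome = list[list[str]]  # each protein = its ordered list of domain IDs
FEATURES = ("content", "duplication", "abundance", "versatility")

# Hypothetical toy data: three target species (leg*) and three outgroups (out*).
TOY_PROTEOMES: dict[str, Proteome] = {
    "leg1": [["Y"], ["Y"], ["A", "B"]],
    "leg2": [["Y"], ["Y", "Y"], ["A"]],
    "leg3": [["Y"], ["A", "B"]],
    "out1": [["X", "A"], ["Y"]],
    "out2": [["X"], ["B"]],
    "out3": [["X", "B"], ["A"]],
}


def feature_matrices(proteomes: dict[str, Proteome]) -> dict[str, dict[str, dict[str, int]]]:
    """Return {feature: {species: {domain: value}}}; missing keys mean 0."""
    mats: dict[str, dict[str, dict[str, int]]] = {f: {} for f in FEATURES}
    for sp, proteins in proteomes.items():
        abundance = defaultdict(int)    # proteins containing the domain
        duplication = defaultdict(int)  # proteins carrying >= 2 copies of it
        partners = defaultdict(set)     # distinct adjacent, non-identical domains
        for prot in proteins:
            for d in set(prot):
                abundance[d] += 1
                if prot.count(d) >= 2:
                    duplication[d] += 1
            for left, right in zip(prot, prot[1:]):
                if left != right:
                    partners[left].add(right)
                    partners[right].add(left)
        mats["content"][sp] = {d: 1 for d in abundance}  # presence/absence
        mats["duplication"][sp] = dict(duplication)
        mats["abundance"][sp] = dict(abundance)
        mats["versatility"][sp] = {d: len(p) for d, p in partners.items()}
    return mats


def permutation_p(values: dict[str, int], targets: set[str]) -> float:
    """Exact two-sided permutation p-value for the difference in group means."""
    species = sorted(values)

    def diff(group: set[str]) -> float:
        inside = [values[s] for s in species if s in group]
        outside = [values[s] for s in species if s not in group]
        return mean(inside) - mean(outside)

    observed = abs(diff(targets))
    # Every way of relabelling len(targets) species as "target".
    splits = [set(g) for g in combinations(species, len(targets))]
    extreme = sum(1 for g in splits if abs(diff(g)) >= observed - 1e-12)
    return extreme / len(splits)


def rank_domains(matrix: dict[str, dict[str, int]], targets: set[str]) -> list[tuple[str, float, float]]:
    """Return (domain, target mean - outgroup mean, p-value), smallest p first."""
    species = list(matrix)
    domains = sorted({d for row in matrix.values() for d in row})
    results = []
    for d in domains:
        vals = {s: matrix[s].get(d, 0) for s in species}  # absent domain -> 0
        delta = (mean(vals[s] for s in species if s in targets)
                 - mean(vals[s] for s in species if s not in targets))
        results.append((d, delta, permutation_p(vals, targets)))
    # Ties on p are broken by the larger absolute difference.
    return sorted(results, key=lambda r: (r[2], -abs(r[1])))


if __name__ == "__main__":
    targets = {"leg1", "leg2", "leg3"}  # hypothetical target (legume) species
    mats = feature_matrices(TOY_PROTEOMES)
    for feature in FEATURES:
        print(feature)
        for domain, delta, p in rank_domains(mats[feature], targets):
            print(f"  {domain}: delta={delta:+.2f}, p={p:.2f}")
```

In the toy data, domain X is present only in the outgroups, so it ranks first in the content matrix with p = 2/20 = 0.1, the smallest value an exact two-sided test can reach with three target and three outgroup species. With real proteomes and thousands of domains, the p-values would also need a multiple-testing correction, and exact enumeration grows combinatorially with the number of species, so a sampled permutation test would be the practical choice for larger species sets.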