Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families

Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evol...

Bibliographic Details
Main authors: Yadav, Akshay, Fernández-Baca, David, Cannon, Steven B
Format: Online Article Text
Language: English
Published: SAGE Publications 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350399/
https://www.ncbi.nlm.nih.gov/pubmed/32694909
http://dx.doi.org/10.1177/1176934320939943
_version_ 1783557260102860800
author Yadav, Akshay
Fernández-Baca, David
Cannon, Steven B
author_facet Yadav, Akshay
Fernández-Baca, David
Cannon, Steven B
author_sort Yadav, Akshay
collection PubMed
description Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix. We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These 4 matrices attempt to capture different ways in which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, and changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. We also perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses.
Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in the glutathione synthase enzyme, which produces an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a Docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.
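The feature-matrix idea in the description can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the toy proteomes, the species and domain names, the exact definitions used for the four feature types, and the choice of Fisher's exact test for the content matrix. The record does not say which feature selection techniques or statistical tests the authors used, so this is not their implementation.

```python
from math import comb

# Hypothetical toy proteomes: species -> proteins, each an ordered list of domain IDs.
PROTEOMES = {
    "target_1": [["D1", "D2"], ["D1", "D1", "D3"]],
    "target_2": [["D1", "D2"], ["D3"]],
    "target_3": [["D2", "D1"]],
    "outgroup_1": [["D1", "D4"], ["D4", "D2"]],
    "outgroup_2": [["D4"], ["D1", "D2"]],
    "outgroup_3": [["D4", "D4"], ["D3"]],
}
TARGETS = ["target_1", "target_2", "target_3"]
OUTGROUP = ["outgroup_1", "outgroup_2", "outgroup_3"]


def domain_features(proteins):
    """Per-domain feature values for one species (assumed definitions)."""
    content, abundance, duplication, partners = {}, {}, {}, {}
    for protein in proteins:
        for dom in set(protein):
            content[dom] = 1                             # domain is present
            abundance[dom] = abundance.get(dom, 0) + 1   # proteins containing it
            if protein.count(dom) > 1:                   # repeated within a protein
                duplication[dom] = duplication.get(dom, 0) + 1
        for left, right in zip(protein, protein[1:]):
            if left != right:                            # distinct adjacent partners
                partners.setdefault(left, set()).add(right)
                partners.setdefault(right, set()).add(left)
    versatility = {dom: len(p) for dom, p in partners.items()}
    return {"content": content, "abundance": abundance,
            "duplication": duplication, "versatility": versatility}


def feature_matrix(proteomes, kind):
    """Species x domain matrix (nested dict) for one feature type."""
    domains = sorted({d for prots in proteomes.values() for p in prots for d in p})
    feats = {sp: domain_features(prots)[kind] for sp, prots in proteomes.items()}
    return {sp: {d: feats[sp].get(d, 0) for d in domains} for sp in proteomes}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def content_test(matrix, targets, outgroup, dom):
    """Presence/absence of one domain in target species versus outgroup species."""
    a = sum(matrix[s][dom] for s in targets)
    c = sum(matrix[s][dom] for s in outgroup)
    return fisher_exact_two_sided(a, len(targets) - a, c, len(outgroup) - c)


if __name__ == "__main__":
    content = feature_matrix(PROTEOMES, "content")
    for dom in sorted(content[TARGETS[0]]):
        print(dom, content_test(content, TARGETS, OUTGROUP, dom))
```

In this toy input, D4 is absent from every target species and present in every outgroup species, yet with only three species per group the two-sided p-value is 0.1. Any real analysis would therefore need more species per group, and probably a correction for testing many domains at once.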
format Online
Article
Text
id pubmed-7350399
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-7350399 2020-07-20 Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families Yadav, Akshay Fernández-Baca, David Cannon, Steven B Evol Bioinform Online Original Research Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix. We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These 4 matrices attempt to capture different ways in which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, and changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices.
We also perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses. Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in the glutathione synthase enzyme, which produces an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a Docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project. SAGE Publications 2020-07-09 /pmc/articles/PMC7350399/ /pubmed/32694909 http://dx.doi.org/10.1177/1176934320939943 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle Original Research
Yadav, Akshay
Fernández-Baca, David
Cannon, Steven B
Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
title Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
title_full Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
title_fullStr Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
title_full_unstemmed Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
title_short Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families
title_sort family-specific gains and losses of protein domains in the legume and grass plant families
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350399/
https://www.ncbi.nlm.nih.gov/pubmed/32694909
http://dx.doi.org/10.1177/1176934320939943
work_keys_str_mv AT yadavakshay familyspecificgainsandlossesofproteindomainsinthelegumeandgrassplantfamilies
AT fernandezbacadavid familyspecificgainsandlossesofproteindomainsinthelegumeandgrassplantfamilies
AT cannonstevenb familyspecificgainsandlossesofproteindomainsinthelegumeandgrassplantfamilies