Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins
Main authors: | Pascarelli, Stefano; Laurino, Paola |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009777/ https://www.ncbi.nlm.nih.gov/pubmed/35377869 http://dx.doi.org/10.1371/journal.pcbi.1010016 |
author | Pascarelli, Stefano; Laurino, Paola |
---|---|
collection | PubMed |
description | Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids at these sites are masked from detection by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications for protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline. |
format | Online Article Text |
id | pubmed-9009777 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9009777; 2022-04-15; Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins; Pascarelli, Stefano; Laurino, Paola; PLoS Comput Biol; Research Article; Public Library of Science, 2022-04-04; /pmc/articles/PMC9009777/; /pubmed/35377869; http://dx.doi.org/10.1371/journal.pcbi.1010016; Text; en; © 2022 Pascarelli, Laurino (https://creativecommons.org/licenses/by/4.0/). This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins |
topic | Research Article |