Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins

Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and di...

Bibliographic Details
Main authors: Pascarelli, Stefano; Laurino, Paola
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009777/
https://www.ncbi.nlm.nih.gov/pubmed/35377869
http://dx.doi.org/10.1371/journal.pcbi.1010016
author Pascarelli, Stefano
Laurino, Paola
author_facet Pascarelli, Stefano
Laurino, Paola
author_sort Pascarelli, Stefano
collection PubMed
description Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids in these sites are masked from being recognized by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications in protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.
format Online
Article
Text
id pubmed-9009777
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9009777 2022-04-15 Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins Pascarelli, Stefano Laurino, Paola PLoS Comput Biol Research Article
Public Library of Science 2022-04-04 /pmc/articles/PMC9009777/ /pubmed/35377869 http://dx.doi.org/10.1371/journal.pcbi.1010016 Text en © 2022 Pascarelli, Laurino https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Pascarelli, Stefano
Laurino, Paola
Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins
title Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009777/
https://www.ncbi.nlm.nih.gov/pubmed/35377869
http://dx.doi.org/10.1371/journal.pcbi.1010016