The Recent De Novo Origin of Protein C-Termini

Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the...

Bibliographic Details
Main authors: Andreatta, Matthew E., Levine, Joshua A., Foy, Scott G., Guzman, Lynette D., Kosinski, Luke J., Cordes, Matthew H.J., Masel, Joanna
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494051/
https://www.ncbi.nlm.nih.gov/pubmed/26002864
http://dx.doi.org/10.1093/gbe/evv098
_version_ 1782380021617262592
author Andreatta, Matthew E.
Levine, Joshua A.
Foy, Scott G.
Guzman, Lynette D.
Kosinski, Luke J.
Cordes, Matthew H.J.
Masel, Joanna
author_sort Andreatta, Matthew E.
collection PubMed
description Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.
format Online
Article
Text
id pubmed-4494051
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4494051 2015-07-09 The Recent De Novo Origin of Protein C-Termini Andreatta, Matthew E. Levine, Joshua A. Foy, Scott G. Guzman, Lynette D. Kosinski, Luke J. Cordes, Matthew H.J. Masel, Joanna Genome Biol Evol Research Article Oxford University Press 2015-05-21 /pmc/articles/PMC4494051/ /pubmed/26002864 http://dx.doi.org/10.1093/gbe/evv098 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Recent De Novo Origin of Protein C-Termini
title_sort recent de novo origin of protein c-termini
topic Research Article