
Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes


Bibliographic Details
Main authors: Ezkurdia, Iakes; Juan, David; Rodriguez, Jose Manuel; Frankish, Adam; Diekhans, Mark; Harrow, Jennifer; Vazquez, Jesus; Valencia, Alfonso; Tress, Michael L.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4204768/
https://www.ncbi.nlm.nih.gov/pubmed/24939910
http://dx.doi.org/10.1093/hmg/ddu309
author Ezkurdia, Iakes
Juan, David
Rodriguez, Jose Manuel
Frankish, Adam
Diekhans, Mark
Harrow, Jennifer
Vazquez, Jesus
Valencia, Alfonso
Tress, Michael L.
collection PubMed
description Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
format Online
Article
Text
id pubmed-4204768
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4204768 2014-10-23 Hum Mol Genet Articles
Oxford University Press 2014-11-15 2014-06-16 /pmc/articles/PMC4204768/ /pubmed/24939910 http://dx.doi.org/10.1093/hmg/ddu309 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes
topic Articles