
Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow

Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into...


Bibliographic Details
Main authors: Wright, James C., Mudge, Jonathan, Weisser, Hendrik, Barzine, Mitra P., Gonzalez, Jose M., Brazma, Alvis, Choudhary, Jyoti S., Harrow, Jennifer
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895710/
https://www.ncbi.nlm.nih.gov/pubmed/27250503
http://dx.doi.org/10.1038/ncomms11778
author Wright, James C.
Mudge, Jonathan
Weisser, Hendrik
Barzine, Mitra P.
Gonzalez, Jose M.
Brazma, Alvis
Choudhary, Jyoti S.
Harrow, Jennifer
author_sort Wright, James C.
collection PubMed
description Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics and annotation groups, there is a lack of guidelines for integrating this data. Here we report a stringent workflow for the interpretation of proteogenomic data that could be used by the annotation community to interpret novel proteogenomic evidence. Based on reprocessing of three large-scale publicly available human data sets, we show that a conservative approach, using stringent filtering, is required to generate valid identifications. Evidence has been found supporting the addition of 16 novel protein-coding genes to GENCODE. Despite this, many peptide identifications in pseudogenes cannot be annotated due to the absence of orthogonal supporting evidence.
format Online
Article
Text
id pubmed-4895710
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4895710 2016-08-18 Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow Wright, James C. Mudge, Jonathan Weisser, Hendrik Barzine, Mitra P. Gonzalez, Jose M. Brazma, Alvis Choudhary, Jyoti S. Harrow, Jennifer Nat Commun Article Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics and annotation groups, there is a lack of guidelines for integrating this data. Here we report a stringent workflow for the interpretation of proteogenomic data that could be used by the annotation community to interpret novel proteogenomic evidence. Based on reprocessing of three large-scale publicly available human data sets, we show that a conservative approach, using stringent filtering, is required to generate valid identifications. Evidence has been found supporting the addition of 16 novel protein-coding genes to GENCODE. Despite this, many peptide identifications in pseudogenes cannot be annotated due to the absence of orthogonal supporting evidence. Nature Publishing Group 2016-06-02 /pmc/articles/PMC4895710/ /pubmed/27250503 http://dx.doi.org/10.1038/ncomms11778 Text en Copyright © 2016, Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved. http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895710/
https://www.ncbi.nlm.nih.gov/pubmed/27250503
http://dx.doi.org/10.1038/ncomms11778