
Accurate annotation of human protein-coding small open reading frames

Functional protein-coding small open reading frames (smORFs) are emerging as an important class of genes. However, the number of translated smORFs in the human genome is unclear because proteogenomic methods are not sensitive enough, and, as we show, Ribo-Seq strategies require additional measures t...


Bibliographic Details
Main authors: Martinez, Thomas F., Chu, Qian, Donaldson, Cynthia, Tan, Dan, Shokhirev, Maxim N., Saghatelian, Alan
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085969/
https://www.ncbi.nlm.nih.gov/pubmed/31819274
http://dx.doi.org/10.1038/s41589-019-0425-0
_version_ 1783509045272903680
author Martinez, Thomas F.
Chu, Qian
Donaldson, Cynthia
Tan, Dan
Shokhirev, Maxim N.
Saghatelian, Alan
author_sort Martinez, Thomas F.
collection PubMed
description Functional protein-coding small open reading frames (smORFs) are emerging as an important class of genes. However, the number of translated smORFs in the human genome is unclear because proteogenomic methods are not sensitive enough, and, as we show, Ribo-Seq strategies require additional measures to ensure comprehensive and accurate smORF annotation. Here, we integrate de novo transcriptome assembly and Ribo-Seq into an improved workflow that overcomes obstacles with previous methods to more confidently annotate thousands of smORFs. Evolutionary conservation analyses suggest that hundreds of smORF-encoded microproteins are likely functional. Additionally, many smORFs are regulated during fundamental biological processes, such as cell stress. Peptides derived from smORFs are also detectable on human leukocyte antigen complexes, revealing smORFs as a source of antigens. Thus, by including additional validation into our smORF annotation workflow, we accurately identify thousands of unannotated translated smORFs that will provide a rich pool of unexplored, functional human genes.
format Online
Article
Text
id pubmed-7085969
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-70859692020-06-09 Accurate annotation of human protein-coding small open reading frames Martinez, Thomas F. Chu, Qian Donaldson, Cynthia Tan, Dan Shokhirev, Maxim N. Saghatelian, Alan Nat Chem Biol Article Functional protein-coding small open reading frames (smORFs) are emerging as an important class of genes. However, the number of translated smORFs in the human genome is unclear because proteogenomic methods are not sensitive enough, and, as we show, Ribo-Seq strategies require additional measures to ensure comprehensive and accurate smORF annotation. Here, we integrate de novo transcriptome assembly and Ribo-Seq into an improved workflow that overcomes obstacles with previous methods to more confidently annotate thousands of smORFs. Evolutionary conservation analyses suggest that hundreds of smORF-encoded microproteins are likely functional. Additionally, many smORFs are regulated during fundamental biological processes, such as cell stress. Peptides derived from smORFs are also detectable on human leukocyte antigen complexes, revealing smORFs as a source of antigens. Thus, by including additional validation into our smORF annotation workflow, we accurately identify thousands of unannotated translated smORFs that will provide a rich pool of unexplored, functional human genes. 2019-12-09 2020-04 /pmc/articles/PMC7085969/ /pubmed/31819274 http://dx.doi.org/10.1038/s41589-019-0425-0 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:http://www.nature.com/authors/editorial_policies/license.html#terms
title Accurate annotation of human protein-coding small open reading frames
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085969/
https://www.ncbi.nlm.nih.gov/pubmed/31819274
http://dx.doi.org/10.1038/s41589-019-0425-0