
The influence of transcript assembly on the proteogenomics discovery of microproteins

Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of the newly discovered protein-coding genes encode peptides and small proteins, referred to collectively as microproteins. Microproteins are produced through ribosome translation of small open r...


Bibliographic Details
Main authors: Ma, Jiao; Saghatelian, Alan; Shokhirev, Maxim Nikolaievich
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870951/
https://www.ncbi.nlm.nih.gov/pubmed/29584760
http://dx.doi.org/10.1371/journal.pone.0194518
_version_ 1783309570772303872
author Ma, Jiao
Saghatelian, Alan
Shokhirev, Maxim Nikolaievich
author_facet Ma, Jiao
Saghatelian, Alan
Shokhirev, Maxim Nikolaievich
author_sort Ma, Jiao
collection PubMed
description Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of the newly discovered protein-coding genes encode peptides and small proteins, referred to collectively as microproteins. Microproteins are produced through ribosomal translation of small open reading frames (smORFs). The discovery of many smORFs reveals a blind spot in traditional gene-finding algorithms for these genes. Biological studies have found roles for microproteins in cell biology and physiology, and the possibility that additional bioactive microproteins exist drives interest in the detection and discovery of these molecules. A key step in any proteogenomics workflow is the assembly of RNA-Seq data into likely mRNA transcripts that are then used to create a searchable protein database. Here we demonstrate that specific features of the assembled transcriptome impact microprotein detection by shotgun proteomics. By tailoring transcript assembly for downstream mass spectrometry searching, we show that we can detect more than double the number of high-quality microprotein candidates, and we introduce a novel open-source mRNA assembler for proteogenomics (MAPS) that incorporates all of these features. By integrating our specialized assembler, MAPS, and a popular generalized assembler into our proteogenomics pipeline, we detect 45 novel human microproteins from a high-quality proteogenomics dataset of a human cell line. We then characterize the features of the novel microproteins, identifying two classes of microproteins. Our work highlights the importance of specialized transcriptome assembly upstream of proteomics validation when searching for short, potentially rare, and poorly conserved proteins.
format Online
Article
Text
id pubmed-5870951
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5870951 2018-04-06 The influence of transcript assembly on the proteogenomics discovery of microproteins Ma, Jiao; Saghatelian, Alan; Shokhirev, Maxim Nikolaievich PLoS One Research Article
Public Library of Science 2018-03-27 /pmc/articles/PMC5870951/ /pubmed/29584760 http://dx.doi.org/10.1371/journal.pone.0194518 Text en © 2018 Ma et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Ma, Jiao
Saghatelian, Alan
Shokhirev, Maxim Nikolaievich
The influence of transcript assembly on the proteogenomics discovery of microproteins
title The influence of transcript assembly on the proteogenomics discovery of microproteins
title_full The influence of transcript assembly on the proteogenomics discovery of microproteins
title_fullStr The influence of transcript assembly on the proteogenomics discovery of microproteins
title_full_unstemmed The influence of transcript assembly on the proteogenomics discovery of microproteins
title_short The influence of transcript assembly on the proteogenomics discovery of microproteins
title_sort influence of transcript assembly on the proteogenomics discovery of microproteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870951/
https://www.ncbi.nlm.nih.gov/pubmed/29584760
http://dx.doi.org/10.1371/journal.pone.0194518