Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics

Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering o...

Bibliographic Details
Main authors: Hu, Zhiqiang, Scott, Hamish S., Qin, Guangrong, Zheng, Guangyong, Chu, Xixia, Xie, Lu, Adelson, David L., Oftedal, Bergithe E., Venugopal, Parvathy, Babic, Milena, Hahn, Christopher N., Zhang, Bing, Wang, Xiaojing, Li, Nan, Wei, Chaochun
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496727/
https://www.ncbi.nlm.nih.gov/pubmed/26156868
http://dx.doi.org/10.1038/srep10940
_version_ 1782380452092313600
author Hu, Zhiqiang
Scott, Hamish S.
Qin, Guangrong
Zheng, Guangyong
Chu, Xixia
Xie, Lu
Adelson, David L.
Oftedal, Bergithe E.
Venugopal, Parvathy
Babic, Milena
Hahn, Christopher N.
Zhang, Bing
Wang, Xiaojing
Li, Nan
Wei, Chaochun
author_sort Hu, Zhiqiang
collection PubMed
description Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950.
format Online
Article
Text
id pubmed-4496727
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-44967272015-07-13 Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics Hu, Zhiqiang Scott, Hamish S. Qin, Guangrong Zheng, Guangyong Chu, Xixia Xie, Lu Adelson, David L. Oftedal, Bergithe E. Venugopal, Parvathy Babic, Milena Hahn, Christopher N. Zhang, Bing Wang, Xiaojing Li, Nan Wei, Chaochun Sci Rep Article Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950. Nature Publishing Group 2015-07-09 /pmc/articles/PMC4496727/ /pubmed/26156868 http://dx.doi.org/10.1038/srep10940 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496727/
https://www.ncbi.nlm.nih.gov/pubmed/26156868
http://dx.doi.org/10.1038/srep10940