Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics
Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering o...
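Main authors: | Hu, Zhiqiang; Scott, Hamish S.; Qin, Guangrong; Zheng, Guangyong; Chu, Xixia; Xie, Lu; Adelson, David L.; Oftedal, Bergithe E.; Venugopal, Parvathy; Babic, Milena; Hahn, Christopher N.; Zhang, Bing; Wang, Xiaojing; Li, Nan; Wei, Chaochun |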
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496727/ https://www.ncbi.nlm.nih.gov/pubmed/26156868 http://dx.doi.org/10.1038/srep10940 |
_version_ | 1782380452092313600 |
---|---|
author | Hu, Zhiqiang; Scott, Hamish S.; Qin, Guangrong; Zheng, Guangyong; Chu, Xixia; Xie, Lu; Adelson, David L.; Oftedal, Bergithe E.; Venugopal, Parvathy; Babic, Milena; Hahn, Christopher N.; Zhang, Bing; Wang, Xiaojing; Li, Nan; Wei, Chaochun |
collection | PubMed |
description | Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950. |
format | Online Article Text |
id | pubmed-4496727 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-4496727 2015-07-13 Sci Rep Article Nature Publishing Group 2015-07-09 /pmc/articles/PMC4496727/ /pubmed/26156868 http://dx.doi.org/10.1038/srep10940 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
title | Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics |
topic | Article |