Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners

Protein and mRNA levels correlate only moderately. The availability of proteogenomics data sets with protein and transcript measurements from matching samples is providing new opportunities to assess the degree to which protein levels in a system can be predicted from mRNA information. Here we exami...

Bibliographic Details
Main Authors:	Srivastava, Himangi; Lippincott, Michael J.; Currie, Jordan; Canfield, Robert; Lam, Maggie P. Y.; Lau, Edward
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681107/ https://www.ncbi.nlm.nih.gov/pubmed/36356032 http://dx.doi.org/10.1371/journal.pcbi.1010702

author	Srivastava, Himangi; Lippincott, Michael J.; Currie, Jordan; Canfield, Robert; Lam, Maggie P. Y.; Lau, Edward
collection	PubMed
description	Protein and mRNA levels correlate only moderately. The availability of proteogenomics data sets with protein and transcript measurements from matching samples is providing new opportunities to assess the degree to which protein levels in a system can be predicted from mRNA information. Here we examined the contributions of input features in protein abundance prediction models. Using large proteogenomics data from 8 cancer types within the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data set, we trained models to predict the abundance of over 13,000 proteins using matching transcriptome data from up to 958 tumor or normal adjacent tissue samples each, and compared predictive performances across algorithms, data set sizes, and input features. Over one-third of proteins (4,648) showed relatively poor predictability (elastic net r ≤ 0.3) from their cognate transcripts. Moreover, we found widespread occurrences where the abundance of a protein is considerably less well explained by its own cognate transcript level than by the levels of one or more trans-locus transcripts. The incorporation of additional trans-locus transcript abundance data as input features increasingly improved the ability to predict sample protein abundance. Transcripts that contribute to non-cognate protein abundance primarily involve those encoding known or predicted interaction partners of the protein of interest, including not only large multi-protein complexes as previously shown, but also small stable complexes in the proteome with only one or a few stable interacting partners. Network analysis further shows a complex proteome-wide interdependency of protein abundance on the transcript levels of multiple interacting partners. The predictive model analysis here therefore supports the view that protein-protein interactions, including those in small protein complexes, exert post-transcriptional influence on proteome compositions more broadly than previously recognized. Moreover, the results suggest mRNA and protein co-expression analysis may have utility for finding gene interactions and predicting expression changes in biological systems.
format	Online Article Text
id	pubmed-9681107
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9681107 2022-11-23 Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners. Srivastava, Himangi; Lippincott, Michael J.; Currie, Jordan; Canfield, Robert; Lam, Maggie P. Y.; Lau, Edward. PLoS Comput Biol. Research Article. Public Library of Science 2022-11-10 /pmc/articles/PMC9681107/ /pubmed/36356032 http://dx.doi.org/10.1371/journal.pcbi.1010702 Text en © 2022 Srivastava et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681107/ https://www.ncbi.nlm.nih.gov/pubmed/36356032 http://dx.doi.org/10.1371/journal.pcbi.1010702