
Nano Random Forests to mine protein complexes and their relationships in quantitative proteomics data

Ever-increasing numbers of quantitative proteomics data sets constitute an underexploited resource for investigating protein function. Multiprotein complexes often follow consistent trends in these experiments, which could provide insights about their biology. Yet, as more experiments are considered...


Bibliographic Details
Main Authors:	Montaño-Gutierrez, Luis F.; Ohta, Shinya; Kustatscher, Georg; Earnshaw, William C.; Rappsilber, Juri
Format:	Online Article Text
Language:	English
Published:	The American Society for Cell Biology 2017
Subjects:	Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5328625/ https://www.ncbi.nlm.nih.gov/pubmed/28057767 http://dx.doi.org/10.1091/mbc.E16-06-0370

author	Montaño-Gutierrez, Luis F.; Ohta, Shinya; Kustatscher, Georg; Earnshaw, William C.; Rappsilber, Juri
author_sort	Montaño-Gutierrez, Luis F.
collection	PubMed
description	Ever-increasing numbers of quantitative proteomics data sets constitute an underexploited resource for investigating protein function. Multiprotein complexes often follow consistent trends in these experiments, which could provide insights about their biology. Yet, as more experiments are considered, a complex’s signature may become conditional and less identifiable. Previously we successfully distinguished the general proteomic signature of genuine chromosomal proteins from hitchhikers using the Random Forests (RF) machine learning algorithm. Here we test whether small protein complexes can define distinguishable signatures of their own, despite the assumption that machine learning needs large training sets. We show, with simulated and real proteomics data, that RF can detect small protein complexes and relationships between them. We identify several complexes in quantitative proteomics results of wild-type and knockout mitotic chromosomes. Other proteins covary strongly with these complexes, suggesting novel functional links for later study. Integrating the RF analysis for several complexes reveals known interdependences among kinetochore subunits and a novel dependence between the inner kinetochore and condensin. Ribosomal proteins, although identified, remained independent of kinetochore subcomplexes. Together these results show that this complex-oriented RF (NanoRF) approach can integrate proteomics data to uncover subtle protein relationships. Our NanoRF pipeline is available online.
format	Online Article Text
id	pubmed-5328625
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	The American Society for Cell Biology
record_format	MEDLINE/PubMed
spelling	pubmed-5328625 2017-05-16 Mol Biol Cell Articles The American Society for Cell Biology 2017-03-01 /pmc/articles/PMC5328625/ /pubmed/28057767 http://dx.doi.org/10.1091/mbc.E16-06-0370 Text en © 2017 Montaño-Gutierrez et al.
This article is distributed by The American Society for Cell Biology under license from the author(s). Two months after publication it is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0). “ASCB®,” “The American Society for Cell Biology®,” and “Molecular Biology of the Cell®” are registered trademarks of The American Society for Cell Biology.
title	Nano Random Forests to mine protein complexes and their relationships in quantitative proteomics data
topic	Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5328625/ https://www.ncbi.nlm.nih.gov/pubmed/28057767 http://dx.doi.org/10.1091/mbc.E16-06-0370
