Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins

Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot and tissue/cell type annotation was manually added. Data from physiological samples were...

Bibliographic Details
Main authors: Claeys, Tine; Menu, Maxime; Bouwmeester, Robbin; Gevaert, Kris; Martens, Lennart
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088018/
https://www.ncbi.nlm.nih.gov/pubmed/36963412
http://dx.doi.org/10.1021/acs.jproteome.2c00644
collection PubMed
description Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot, and tissue/cell type annotation was added manually. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model predicted tissues with 98% accuracy and cell types with 99% accuracy. The F-scores give a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies to studying the proteome of tissue-like samples such as organoids and cell lines.
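The one-vs-all view mentioned in the description can be illustrated by relabeling a multi-class prediction as "this tissue" versus "all other tissues" and scoring each class separately. The following is a minimal sketch in plain Python, not the authors' pipeline: the tissue labels are invented toy data, the function names are hypothetical, and reading "F-score" as the F1 classification metric is an assumption, since this record does not define the term.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_all_f1(y_true, y_pred):
    """Per-class F1, treating each class in turn as 'positive' vs all others."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must be of equal length")
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels (invented for illustration; not data from the study).
y_true = ["liver", "liver", "kidney", "kidney", "heart", "heart"]
y_pred = ["liver", "kidney", "kidney", "kidney", "heart", "heart"]

overall = accuracy(y_true, y_pred)
per_class = one_vs_all_f1(y_true, y_pred)
```

In this toy case one liver sample is misclassified as kidney, so the overall accuracy drops below 1 and only the liver and kidney scores are affected, which is the kind of per-class breakdown a one-vs-all analysis makes visible.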
id pubmed-10088018
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10088018 2023-04-12 J Proteome Res American Chemical Society 2023-03-24 /pmc/articles/PMC10088018/ /pubmed/36963412 http://dx.doi.org/10.1021/acs.jproteome.2c00644 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).