
Assessment of label-free quantification and missing value imputation for proteomics in non-human primates


Bibliographic Details
Main authors:	Hamid, Zeeshan, Zimmerman, Kip D., Guillen-Ahlers, Hector, Li, Cun, Nathanielsz, Peter, Cox, Laura A., Olivier, Michael
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264528/ https://www.ncbi.nlm.nih.gov/pubmed/35804317 http://dx.doi.org/10.1186/s12864-022-08723-1

collection	PubMed
description	BACKGROUND: Reliable and effective label-free quantification (LFQ) analyses depend not only on how data are acquired in the mass spectrometer but also on downstream data processing, including software tools, query database, data normalization and imputation. In non-human primates (NHP), LFQ is challenging because query databases for NHP are limited: the genomes of these species are not comprehensively annotated. This invariably limits the discovery of proteins and associated post-translational modifications (PTMs) and produces a higher fraction of missing data points. Identifying fewer proteins and PTMs because of database limitations can hinder the discovery of important and meaningful biological information, and missing data also limit downstream analyses (e.g., multivariate analyses), decrease statistical power, bias statistical inference, and make biological interpretation of the data more challenging. In this study we attempted to address both issues: first, we used the MetaMorpheus proteomics search engine to counter the limits of NHP query databases and maximize the discovery of proteins and associated PTMs, and second, we evaluated different imputation methods for accurate data inference. We used a generic approach to missing-data imputation that does not distinguish between potential sources of missing data (either non-assigned m/z or missing values across runs). RESULTS: Using the MetaMorpheus proteomics search engine, we obtained quantitative data for 1622 proteins and 10,634 peptides, including 58 different PTMs (biological, metal and artifacts), across a diverse age range of NHP brain frontal cortex. However, of the 1622 proteins identified, only 293 (about 18%) were quantified in all samples with no missing values, underscoring the importance of an accurate and statistically valid imputation method to fill in missing data.
In our imputation analysis we show that single-imputation methods that borrow information from correlated proteins, such as Generalized Ridge Regression (GRR), Random Forest (RF), local least squares (LLS) and Bayesian Principal Component Analysis (BPCA), estimate missing protein abundance values with high accuracy. CONCLUSIONS: Overall, this study offers a detailed comparative analysis of LFQ data generated in NHP and proposes strategies for improving LFQ in NHP proteomics. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08723-1.
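The abstract reports that single-imputation methods which borrow information from correlated proteins (GRR, RF, LLS, BPCA) worked well, and that only 293 of 1622 proteins had no missing values. The following is a minimal, dependency-free sketch of the idea behind one of those families, local least squares (LLS): each missing value is predicted by regressing the protein on its `k` most correlated fully quantified proteins. It also shows a common way to benchmark an imputer: hide known values in complete data, impute them, and measure the error. This is not the authors' code or evaluation protocol; the matrix layout (rows = proteins, columns = samples, `None` = missing), the function names `lls_impute` and `masked_rmse`, the parameters `k`, `ridge` and `n_mask`, and the small ridge penalty are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical layout: rows = proteins, columns = samples,
# values = log-transformed abundances, None = missing.


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lls_impute(data, k=3, ridge=1e-6):
    """LLS-style imputation: regress each incomplete protein on its k most
    correlated complete proteins (over its observed samples) and predict the gaps.
    The small ridge term is an added assumption that keeps the normal equations
    solvable when donor proteins are collinear; classic LLS does not include it."""
    donors = [i for i, row in enumerate(data) if None not in row]
    out = [row[:] for row in data]
    for i, row in enumerate(data):
        obs = [j for j, v in enumerate(row) if v is not None]
        miss = [j for j, v in enumerate(row) if v is None]
        if not miss or not obs:  # complete rows, or rows with no data, are left as-is
            continue
        y = [row[j] for j in obs]
        if not donors:  # nothing to borrow from: fall back to the protein's mean
            for j in miss:
                out[i][j] = sum(y) / len(y)
            continue
        ranked = sorted(donors, key=lambda d: -abs(pearson([data[d][j] for j in obs], y)))
        nb = ranked[:k]
        # Design matrix over observed samples: intercept + one column per neighbour.
        X = [[1.0] + [data[d][j] for d in nb] for j in obs]
        p = len(nb) + 1
        XtX = [[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)]
        for a in range(1, p):  # penalise neighbour coefficients, not the intercept
            XtX[a][a] += ridge
        Xty = [sum(r[a] * yv for r, yv in zip(X, y)) for a in range(p)]
        beta = solve(XtX, Xty)
        for j in miss:
            xj = [1.0] + [data[d][j] for d in nb]
            out[i][j] = sum(bv * xv for bv, xv in zip(beta, xj))
    return out


def masked_rmse(complete, n_mask, k=3, seed=0):
    """Benchmark: hide n_mask known values of a complete matrix, impute them,
    and return the root-mean-square error against the hidden truth.
    Assumes no row loses all of its values."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(complete)) for j in range(len(complete[0]))]
    hidden = rng.sample(cells, n_mask)
    data = [row[:] for row in complete]
    for i, j in hidden:
        data[i][j] = None
    imputed = lls_impute(data, k=k)
    sq = [(imputed[i][j] - complete[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy latent signal shared by all proteins
    pairs = [(2, 1), (-1, 10), (0.5, 3), (3, -2), (1, 0), (-2, 4)]
    toy = [[a * v + b for v in s] for a, b in pairs]
    print(masked_rmse(toy, n_mask=3, k=2))  # close to 0: toy proteins are perfectly correlated
```

Swapping `lls_impute` for another imputer lets the same `masked_rmse` harness compare methods. Note that hiding values uniformly at random mimics data missing at random; in LFQ data, values are often missing because abundances fall below detection, so a random-masking benchmark may flatter methods that ignore that mechanism.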
id	pubmed-9264528
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9264528, 2022-07-09. BMC Genomics, Research. BioMed Central, published 2022-07-08. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.