A knowledge-based T(2)-statistic to perform pathway analysis for quantitative proteomic data

Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, whi...


Bibliographic Details
Main Authors:	Lai, En-Yu, Chen, Yi-Hau, Wu, Kun-Pin
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5493430/ https://www.ncbi.nlm.nih.gov/pubmed/28622336 http://dx.doi.org/10.1371/journal.pcbi.1005601

author	Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin
author_facet	Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin
author_sort	Lai, En-Yu
collection	PubMed
description	Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways with sufficient evidence as a pathway group. The performance of the proposed T(2)-statistic is demonstrated using five published experimental datasets: T-cell activation, cAMP/PKA signaling, myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry, and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T(2)-statistic yields more accurate descriptions, in agreement with the discussion in the original publications. We implemented the T(2)-statistic in an R package, T2GA, which is available at https://github.com/roqe/T2GA.
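The idea summarized in the description, a multivariate test whose covariance matrix comes from prior knowledge rather than from the samples, can be illustrated with a minimal sketch. The code below is not the T2GA implementation and makes several assumptions that the record does not confirm: the statistic is taken to be the textbook known-covariance form T2 = n * mean' * inverse(Sigma) * mean under the self-contained null that all mean log-ratios are zero; every protein is given unit variance; and each off-diagonal entry is simply a confidence score scaled to [0, 1]. The names `build_sigma`, `solve`, and `t2_known_cov` and all numbers are hypothetical. A matrix built this way is not guaranteed to be positive definite, which a real method would have to handle.

```python
def build_sigma(p, scores):
    """Hypothetical covariance: unit variances, confidence scores off-diagonal."""
    sigma = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for (i, j), score in scores.items():
        sigma[i][j] = sigma[j][i] = score
    return sigma


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def t2_known_cov(samples, sigma):
    """T2 = n * mean' * inverse(sigma) * mean for a fixed, known sigma."""
    n = len(samples)
    p = len(sigma)
    mean = [sum(sample[j] for sample in samples) / n for j in range(p)]
    weighted = solve(sigma, mean)  # inverse(sigma) * mean without inverting
    return n * sum(mean[j] * weighted[j] for j in range(p))


# Two replicates of log-ratios for a two-protein pathway (made-up numbers).
samples = [[1.2, 0.8], [0.8, 1.2]]
t2_knowledge = t2_known_cov(samples, build_sigma(2, {(0, 1): 0.5}))
t2_independent = t2_known_cov(samples, build_sigma(2, {}))
print(t2_knowledge, t2_independent)
```

In this toy case the statistic computed with the positive association is smaller than the one computed as if the two proteins were independent, which is consistent with the description's remark that ignoring associations tends to produce false positives. Under the stated assumptions (multivariate normal data, known covariance) the statistic would be compared with a chi-square distribution with p degrees of freedom; whether T2GA uses that reference distribution is not stated in this record.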
format	Online Article Text
id	pubmed-5493430
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5493430 2017-07-25 PLoS Comput Biol Research Article Public Library of Science 2017-06-16 /pmc/articles/PMC5493430/ /pubmed/28622336 http://dx.doi.org/10.1371/journal.pcbi.1005601 Text en © 2017 Lai et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A knowledge-based T(2)-statistic to perform pathway analysis for quantitative proteomic data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5493430/ https://www.ncbi.nlm.nih.gov/pubmed/28622336 http://dx.doi.org/10.1371/journal.pcbi.1005601
