Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data

Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, prot...

Bibliographic Details
Main Authors:	Shi, Zhiao; Wen, Bo; Gao, Qiang; Zhang, Bing
Format:	Online Article Text
Language:	English
Published:	American Society for Biochemistry and Molecular Biology 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8165452/ https://www.ncbi.nlm.nih.gov/pubmed/33887487 http://dx.doi.org/10.1016/j.mcpro.2021.100083

_version_	1783701325329989632
author	Shi, Zhiao; Wen, Bo; Gao, Qiang; Zhang, Bing
author_facet	Shi, Zhiao; Wen, Bo; Gao, Qiang; Zhang, Bing
author_sort	Shi, Zhiao
collection	PubMed
description	Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms. In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms.
format	Online Article Text
id	pubmed-8165452
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	American Society for Biochemistry and Molecular Biology
record_format	MEDLINE/PubMed
spelling	pubmed-8165452 2021-06-05 Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data Shi, Zhiao; Wen, Bo; Gao, Qiang; Zhang, Bing Mol Cell Proteomics Research Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms. In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms. American Society for Biochemistry and Molecular Biology 2021-04-20 /pmc/articles/PMC8165452/ /pubmed/33887487 http://dx.doi.org/10.1016/j.mcpro.2021.100083 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle	Research Shi, Zhiao Wen, Bo Gao, Qiang Zhang, Bing Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data
title	Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data
title_full	Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data
title_fullStr	Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data
title_full_unstemmed	Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data
title_short	Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data
title_sort	feature selection methods for protein biomarker discovery from proteomics or multiomics data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8165452/ https://www.ncbi.nlm.nih.gov/pubmed/33887487 http://dx.doi.org/10.1016/j.mcpro.2021.100083
work_keys_str_mv	AT shizhiao featureselectionmethodsforproteinbiomarkerdiscoveryfromproteomicsormultiomicsdata AT wenbo featureselectionmethodsforproteinbiomarkerdiscoveryfromproteomicsormultiomicsdata AT gaoqiang featureselectionmethodsforproteinbiomarkerdiscoveryfromproteomicsormultiomicsdata AT zhangbing featureselectionmethodsforproteinbiomarkerdiscoveryfromproteomicsormultiomicsdata
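
The description field in this record says that ProMS applies a weighted k-medoids clustering algorithm to univariately informative proteins and takes each cluster's medoid as a marker. The sketch below illustrates that general idea only; it is not the authors' implementation, which the record points to at https://github.com/bzhanglab/proms. Several things here are assumptions made for illustration: the function name `weighted_k_medoids`, the distance `1 - |Pearson correlation|` between features, the hand-picked weights standing in for univariate scores, the deterministic initialization, and the synthetic data.

```python
import numpy as np


def weighted_k_medoids(dist, weights, k, max_iter=100):
    """Alternating weighted k-medoids on a precomputed distance matrix.

    Tries to minimize sum_i weights[i] * dist[i, medoid_of(i)].
    Hypothetical helper for illustration, not the ProMS API.
    """
    dist = np.asarray(dist, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Deterministic start (an assumption): the k highest-weighted items.
    medoids = np.argsort(-weights, kind="stable")[:k]
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: cost[j] is the weighted total distance from
            # all cluster members to candidate medoid members[j].
            cost = weights[members] @ dist[np.ix_(members, members)]
            new_medoids[c] = members[np.argmin(cost)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Synthetic example: two latent "functions", each driving three
# strongly correlated features (columns 0-2 and columns 3-5).
rng = np.random.default_rng(0)
n_samples = 60
f1 = rng.normal(size=n_samples)
f2 = rng.normal(size=n_samples)
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n_samples) for _ in range(3)]
    + [f2 + 0.1 * rng.normal(size=n_samples) for _ in range(3)]
)

# Assumed feature distance: 1 - |Pearson correlation|.
dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
# Assumed weights, standing in for univariate informativeness scores.
weights = np.array([1.0, 3.0, 2.0, 2.0, 1.0, 3.0])

medoids, labels = weighted_k_medoids(dist, weights, k=2)
print("selected feature indices:", medoids)
print("cluster labels:", labels)
```

In this toy setup each cluster contributes one representative feature, and the other members of the same cluster are natural candidates for replacement markers, which mirrors the second strength the description attributes to the feature clusters.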
