
Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery

BACKGROUND: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate th...


Bibliographic Details
Main Authors:	Guan, Xin; Runger, George; Liu, Li
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7068914/ https://www.ncbi.nlm.nih.gov/pubmed/32164534 http://dx.doi.org/10.1186/s12859-020-3344-x

_version_	1783505670347161600
author	Guan, Xin Runger, George Liu, Li
author_facet	Guan, Xin Runger, George Liu, Li
author_sort	Guan, Xin
collection	PubMed
description	BACKGROUND: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and the algorithms are optimized for biological knowledge of a specific type. In practice, however, domain knowledge from diverse resources can provide complementary information, yet no current method can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method that enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection. RESULTS: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge to assemble a panel of biomarkers. We discovered a compact set of biomarkers with significant improvements in prediction accuracy. CONCLUSIONS: Know-GRRF is a powerful novel method to incorporate knowledge from multiple domains for feature selection. It has a broad range of applications in biomarker discovery. We implemented this method and released a KnowGRRF package in the R/CRAN archive.
format	Online Article Text
id	pubmed-7068914
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7068914 2020-03-18 Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery Guan, Xin Runger, George Liu, Li BMC Bioinformatics Methodology BioMed Central 2020-03-11 /pmc/articles/PMC7068914/ /pubmed/32164534 http://dx.doi.org/10.1186/s12859-020-3344-x Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Guan, Xin Runger, George Liu, Li Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
title	Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
title_full	Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
title_fullStr	Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
title_full_unstemmed	Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
title_short	Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
title_sort	dynamic incorporation of prior knowledge from multiple domains in biomarker discovery
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7068914/ https://www.ncbi.nlm.nih.gov/pubmed/32164534 http://dx.doi.org/10.1186/s12859-020-3344-x
work_keys_str_mv	AT guanxin dynamicincorporationofpriorknowledgefrommultipledomainsinbiomarkerdiscovery AT rungergeorge dynamicincorporationofpriorknowledgefrommultipledomainsinbiomarkerdiscovery AT liuli dynamicincorporationofpriorknowledgefrommultipledomainsinbiomarkerdiscovery
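
The description above says that Know-GRRF combines prior scores from several domains in a linear model, uses the resulting composite score to control the regularization of a random forest, and tunes the weights by minimizing the AIC of out-of-bag predictions. The Python sketch below illustrates that idea only; it is not the KnowGRRF package's code. The function names, the min-max scaling, the GRRF-style mapping controlled by `gamma`, and the choice of the number of selected features as the AIC parameter count are all assumptions made for illustration, since the record does not give the exact formulas.

```python
import math


def composite_scores(priors, weights):
    """Linear combination of per-domain prior scores, one value per feature.

    priors  : one row per feature, one column per knowledge domain
    weights : one weight per knowledge domain (the quantities to be tuned)
    """
    return [sum(w * p for w, p in zip(weights, row)) for row in priors]


def penalty_coefficients(scores, gamma):
    """Map composite scores to penalty coefficients in [1 - gamma, 1].

    ASSUMPTION: a GRRF-style mapping on min-max scaled scores; the exact
    form used by Know-GRRF is not stated in this record.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # All features have the same prior support: apply no penalty.
        return [1.0 for _ in scores]
    return [(1.0 - gamma) + gamma * (s - lo) / span for s in scores]


def regularized_gain(gain, feature, selected, coeffs):
    """Regularized random forest rule: a feature that has not been used yet
    has its split gain shrunk by its coefficient; a used feature does not."""
    return gain if feature in selected else coeffs[feature] * gain


def aic_binary(y_true, p_pred, n_features):
    """AIC = 2k - 2 log L for binary labels and predicted probabilities.

    ASSUMPTION: k is taken to be the number of selected features.
    """
    eps = 1e-12
    loglik = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        loglik += math.log(p) if y == 1 else math.log(1.0 - p)
    return 2 * n_features - 2 * loglik


# Hypothetical priors for three features from two knowledge domains.
priors = [[1.0, 0.9], [0.0, 0.2], [1.0, 0.1]]
scores = composite_scores(priors, weights=[0.5, 0.5])
coeffs = penalty_coefficients(scores, gamma=0.8)
```

In this sketch a feature with strong prior support gets a coefficient near 1 and competes for splits on its raw gain, while a weakly supported feature must overcome a shrunken gain before it can enter the selected set. An outer optimizer would vary `weights` and `gamma`, refit the forest, and keep the setting with the lowest `aic_binary` value on out-of-bag predictions.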
