BLASSO: integration of biological knowledge into a regularized linear model
BACKGROUND: In RNA-Seq gene expression analysis, a genetic signature or biomarker is defined as a subset of genes that is probably involved in a given complex human trait and usually provides predictive capabilities for that trait. The discovery of new genetic signatures is challenging, as it entails...
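Main authors: | Urda, Daniel; Aragón, Francisco; Bautista, Rocío; Franco, Leonardo; Veredas, Francisco J.; Claros, Manuel Gonzalo; Jerez, José Manuel |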
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245593/ https://www.ncbi.nlm.nih.gov/pubmed/30458775 http://dx.doi.org/10.1186/s12918-018-0612-8 |
_version_ | 1783372270880686080 |
---|---|
author | Urda, Daniel; Aragón, Francisco; Bautista, Rocío; Franco, Leonardo; Veredas, Francisco J.; Claros, Manuel Gonzalo; Jerez, José Manuel |
author_sort | Urda, Daniel |
collection | PubMed |
description | BACKGROUND: In RNA-Seq gene expression analysis, a genetic signature or biomarker is defined as a subset of genes that is probably involved in a given complex human trait and usually provides predictive capabilities for that trait. The discovery of new genetic signatures is challenging, as it entails the analysis of complex information encoded at the gene level. Moreover, biomarker selection becomes unstable because the thousands of genes included in each sample are usually highly correlated, which leads to very low overlap between the genetic signatures proposed by different authors. In this sense, this paper proposes BLASSO, a simple and highly interpretable linear model with l1-regularization that incorporates prior biological knowledge into the prediction of breast cancer outcomes. Two different approaches to integrating biological knowledge in BLASSO, Gene-specific and Gene-disease, are proposed, and their predictive performance and biomarker stability are tested on a public RNA-Seq gene expression dataset for breast cancer. The relevance of the genetic signature for the model is inspected by a functional analysis. RESULTS: BLASSO has been compared with a baseline LASSO model. Using 10-fold cross-validation with 100 repetitions for model assessment, average AUC values of 0.7 and 0.69 were obtained for the Gene-specific and the Gene-disease approaches, respectively. These efficacy rates outperform the average AUC of 0.65 obtained with the LASSO. With respect to the stability of the genetic signatures found, BLASSO outperformed the baseline model in terms of the robustness index (RI). The Gene-specific approach gave an RI of 0.15±0.03, compared to an RI of 0.09±0.03 given by LASSO, thus being about 66% more robust. The functional analysis performed on the genetic signature obtained with the Gene-disease approach showed a significant presence of genes related to cancer, as well as one gene (IFNK) and one pseudogene (PCNAP1) that had not previously been described as related to cancer. CONCLUSIONS: BLASSO has been shown to be a good choice in terms of both predictive efficacy and biomarker stability when compared to other similar approaches. Further functional analyses of the genetic signatures obtained with BLASSO have revealed not only genes with important roles in cancer, but also genes that may play an unknown or collateral role in the studied disease. |
format | Online Article Text |
id | pubmed-6245593 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6245593 2018-11-26 BLASSO: integration of biological knowledge into a regularized linear model Urda, Daniel; Aragón, Francisco; Bautista, Rocío; Franco, Leonardo; Veredas, Francisco J.; Claros, Manuel Gonzalo; Jerez, José Manuel BMC Syst Biol Research BioMed Central 2018-11-20 /pmc/articles/PMC6245593/ /pubmed/30458775 http://dx.doi.org/10.1186/s12918-018-0612-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Urda, Daniel Aragón, Francisco Bautista, Rocío Franco, Leonardo Veredas, Francisco J. Claros, Manuel Gonzalo Jerez, José Manuel BLASSO: integration of biological knowledge into a regularized linear model |
title | BLASSO: integration of biological knowledge into a regularized linear model |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245593/ https://www.ncbi.nlm.nih.gov/pubmed/30458775 http://dx.doi.org/10.1186/s12918-018-0612-8 |
work_keys_str_mv | AT urdadaniel blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel AT aragonfrancisco blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel AT bautistarocio blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel AT francoleonardo blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel AT veredasfranciscoj blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel AT clarosmanuelgonzalo blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel AT jerezjosemanuel blassointegrationofbiologicalknowledgeintoaregularizedlinearmodel |
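
The description above says that BLASSO is an l1-regularized linear model that incorporates prior biological knowledge. One common way to fold such knowledge into an l1 penalty is to give each gene its own penalty factor, so that genes with stronger prior support are shrunk less. The sketch below illustrates that general idea only; whether it matches the authors' exact formulation is an assumption, and the function names, penalty factors, and coefficient values are all hypothetical.

```python
from typing import List, Sequence


def weighted_l1_penalty(coefs: Sequence[float],
                        penalty_factors: Sequence[float],
                        lam: float) -> float:
    """Return lam * sum_j w_j * |beta_j| (a per-feature weighted l1 penalty)."""
    return lam * sum(w * abs(b) for b, w in zip(coefs, penalty_factors))


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_prox(coefs: Sequence[float],
                  penalty_factors: Sequence[float],
                  lam: float,
                  step: float) -> List[float]:
    """Proximal step for the weighted l1 penalty: each coefficient gets its
    own threshold step * lam * w_j."""
    return [soft_threshold(b, step * lam * w)
            for b, w in zip(coefs, penalty_factors)]


# Hypothetical values: three gene coefficients, the first gene having
# strong prior support (small penalty factor), the others none.
coefs = [0.5, -0.5, 0.05]
penalty_factors = [0.2, 1.0, 1.0]  # assumption, not taken from the paper
lam = 0.4

penalty = weighted_l1_penalty(coefs, penalty_factors, lam)
shrunk = weighted_prox(coefs, penalty_factors, lam, step=1.0)
print(penalty, shrunk)
```

With equal coefficient magnitudes, the gene with the smaller penalty factor keeps most of its weight after the shrinkage step, while the weakly supported small coefficient is set exactly to zero. Setting every penalty factor to 1 recovers the ordinary LASSO penalty used as the baseline.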