
BLASSO: integration of biological knowledge into a regularized linear model


Bibliographic details
Main authors: Urda, Daniel; Aragón, Francisco; Bautista, Rocío; Franco, Leonardo; Veredas, Francisco J.; Claros, Manuel Gonzalo; Jerez, José Manuel
Format: Online, Article, Text
Language: English
Published: BioMed Central 2018
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245593/
https://www.ncbi.nlm.nih.gov/pubmed/30458775
http://dx.doi.org/10.1186/s12918-018-0612-8
collection PubMed
Abstract:

BACKGROUND: In RNA-Seq gene expression analysis, a genetic signature or biomarker is defined as a subset of genes that is likely involved in a given complex human trait and usually provides predictive capabilities for that trait. The discovery of new genetic signatures is challenging, as it entails the analysis of complex information encoded at the gene level. Moreover, biomarker selection is unstable: the thousands of genes measured in each sample are usually highly correlated, so the genetic signatures proposed by different authors overlap very little. This paper therefore proposes BLASSO, a simple and highly interpretable linear model with ℓ1-regularization that incorporates prior biological knowledge into the prediction of breast cancer outcomes. Two approaches for integrating biological knowledge into BLASSO, Gene-specific and Gene-disease, are proposed, and their predictive performance and biomarker stability are tested on a public RNA-Seq gene expression dataset for breast cancer. The relevance of the resulting genetic signature is examined through a functional analysis.

RESULTS: BLASSO was compared with a baseline LASSO model. Using 10-fold cross-validation repeated 100 times to assess the models, average AUC values of 0.7 and 0.69 were obtained for the Gene-specific and Gene-disease approaches, respectively, both higher than the average AUC of 0.65 obtained with LASSO. Regarding the stability of the genetic signatures found, BLASSO outperformed the baseline model in terms of the robustness index (RI): the Gene-specific approach gave an RI of 0.15±0.03, compared with an RI of 0.09±0.03 for LASSO, making it roughly 66% more robust (0.15/0.09 ≈ 1.67). The functional analysis of the genetic signature obtained with the Gene-disease approach showed a significant presence of cancer-related genes, as well as one gene (IFNK) and one pseudogene (PCNAP1) that had not previously been described as related to cancer.

CONCLUSIONS: BLASSO proved to be a good choice in terms of both predictive efficacy and biomarker stability when compared with other similar approaches. Further functional analyses of the genetic signatures obtained with BLASSO revealed not only genes with important roles in cancer, but also genes that may play an unknown or collateral role in the studied disease.
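This record does not say how BLASSO turns prior biological knowledge into the ℓ1-regularized model. One common mechanism, assumed here and not confirmed by the abstract, is a weighted LASSO in which each gene j gets its own penalty factor w_j, so that genes favoured by prior knowledge (for example a gene-specific or gene-disease score) are shrunk less. The sketch below fits such a model for a binary outcome by proximal gradient descent (ISTA) and scores it with AUC. The function names, the weights, the value of `lam` and the synthetic data are all hypothetical illustrations, not the authors' implementation or dataset.

```python
import numpy as np

# Hypothetical sketch (assumed mechanism, not the paper's code): weighted-l1 (LASSO)
# logistic regression where gene j carries its own penalty factor w_j.
# Objective: mean logistic loss + lam * sum_j w_j * |beta_j|; the intercept is not penalised.


def soft_threshold(z, t):
    """Proximal operator of t*|z|, applied elementwise (t may be a per-gene vector)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_lasso_logistic(X, y, lam, penalty_factors, n_iter=2000):
    """Fit the weighted-l1 logistic model by proximal gradient descent (ISTA)."""
    n, p = X.shape
    beta = np.zeros(p)
    b0 = 0.0
    # Step 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the curvature of the mean logistic loss
    Xa = np.hstack([np.ones((n, 1)), X])
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y                   # gradient of the loss w.r.t. the linear predictor
        b0 -= step * resid.mean()          # plain gradient step for the unpenalised intercept
        # Gradient step on beta, then per-gene soft-thresholding (the weighted-l1 prox)
        beta = soft_threshold(beta - step * (X.T @ resid) / n,
                              step * lam * penalty_factors)
    return b0, beta


def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


# Synthetic demo (not the paper's data): 200 samples, 50 "genes", 3 truly informative.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.5, -1.0, 1.0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Hypothetical prior: genes 0-2 are "known" to be relevant, so they are penalised less.
w = np.ones(p)
w[:3] = 0.2

b0, beta = weighted_lasso_logistic(X, y, lam=0.1, penalty_factors=w)
train_auc = auc(b0 + X @ beta, y)
print("selected genes:", np.flatnonzero(beta))
print("training AUC:", round(float(train_auc), 3))
```

Setting every w_j to 1 recovers a plain LASSO baseline, which makes side-by-side comparisons straightforward. The training AUC printed here is optimistic; an evaluation in the spirit of the abstract would compute AUC on held-out folds of repeated 10-fold cross-validation.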
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Syst Biol (Research), BioMed Central, published 2018-11-20.
License: © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.