
Tilting the lasso by knowledge-based post-processing

BACKGROUND: It is useful to incorporate biological knowledge on the role of genetic determinants in predicting an outcome. It is, however, not always feasible to fully elicit this information when the number of determinants is large. We present an approach to overcome this difficulty. First, using h...


Bibliographic Details
Main authors: Tharmaratnam, Kukatharmini, Sperrin, Matthew, Jaki, Thomas, Reppe, Sjur, Frigessi, Arnoldo
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010709/
https://www.ncbi.nlm.nih.gov/pubmed/27590269
http://dx.doi.org/10.1186/s12859-016-1210-7
_version_ 1782451722249043968
author Tharmaratnam, Kukatharmini
Sperrin, Matthew
Jaki, Thomas
Reppe, Sjur
Frigessi, Arnoldo
author_facet Tharmaratnam, Kukatharmini
Sperrin, Matthew
Jaki, Thomas
Reppe, Sjur
Frigessi, Arnoldo
author_sort Tharmaratnam, Kukatharmini
collection PubMed
description BACKGROUND: It is useful to incorporate biological knowledge on the role of genetic determinants in predicting an outcome. It is, however, not always feasible to fully elicit this information when the number of determinants is large. We present an approach to overcome this difficulty. First, using half of the available data, a shortlist of potentially interesting determinants is generated. Second, binary indications of biological importance are elicited for this much smaller number of determinants. Third, an analysis is carried out on this shortlist using the second half of the data. RESULTS: We show through simulations that, compared with adaptive lasso, this approach leads to models containing more biologically relevant variables, while the prediction mean squared error (PMSE) is comparable or even reduced. We also apply our approach to bone mineral density data, and again final models contain more biologically relevant variables and have reduced PMSEs. CONCLUSION: Our method leads to comparable or improved predictive performance, and models with greater face validity and interpretability with feasible incorporation of biological knowledge into predictive models. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1210-7) contains supplementary material, which is available to authorized users.
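The three steps in the description can be sketched in code. This is only an illustration under stated assumptions, not the authors' implementation: the record does not say how the third step uses the binary indications, so the sketch assumes a weighted lasso in which flagged determinants receive a smaller penalty. The function names, the parameters `lam1`, `lam2`, and `tilt`, and the `is_important` callback (a stand-in for expert elicitation) are all hypothetical. Predictors and outcome are assumed to be centred, so no intercept is fitted.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam * sum_j weights[j]*|b_j|
    by cyclic coordinate descent (no intercept; data assumed centred)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def tilted_shortlist_fit(X, y, is_important, lam1, lam2, tilt=0.5, seed=0):
    """Three-step sketch: shortlist on one half, elicit flags, refit on the other half.

    is_important(shortlist) must return a dict {column index: bool}; it is a
    hypothetical stand-in for eliciting biological importance from an expert.
    """
    n, p = len(X), len(X[0])
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half1, half2 = idx[: n // 2], idx[n // 2:]

    # Step 1: plain lasso on the first half produces the shortlist.
    X1 = [X[i] for i in half1]
    y1 = [y[i] for i in half1]
    b1 = weighted_lasso(X1, y1, lam1, [1.0] * p)
    shortlist = [j for j in range(p) if b1[j] != 0.0]

    # Step 2: binary indications are elicited for the shortlist only.
    flags = is_important(shortlist)

    # Step 3 (assumption): weighted lasso on the second half, restricted to the
    # shortlist, with a smaller penalty weight for flagged determinants.
    X2 = [[X[i][j] for j in shortlist] for i in half2]
    y2 = [y[i] for i in half2]
    w = [tilt if flags[j] else 1.0 for j in shortlist]
    b2 = weighted_lasso(X2, y2, lam2, w)
    return {j: b for j, b in zip(shortlist, b2) if b != 0.0}
```

Calling `tilted_shortlist_fit(X, y, is_important, lam1, lam2)` with `X` as a list of rows returns a dictionary mapping retained column indices to coefficients. Using disjoint halves for shortlisting and for the final fit mirrors the data split described above; the penalty levels would in practice be chosen by cross-validation, which this sketch omits.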
format Online
Article
Text
id pubmed-5010709
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5010709 2016-09-15 Tilting the lasso by knowledge-based post-processing Tharmaratnam, Kukatharmini Sperrin, Matthew Jaki, Thomas Reppe, Sjur Frigessi, Arnoldo BMC Bioinformatics Methodology Article BACKGROUND: It is useful to incorporate biological knowledge on the role of genetic determinants in predicting an outcome. It is, however, not always feasible to fully elicit this information when the number of determinants is large. We present an approach to overcome this difficulty. First, using half of the available data, a shortlist of potentially interesting determinants is generated. Second, binary indications of biological importance are elicited for this much smaller number of determinants. Third, an analysis is carried out on this shortlist using the second half of the data. RESULTS: We show through simulations that, compared with adaptive lasso, this approach leads to models containing more biologically relevant variables, while the prediction mean squared error (PMSE) is comparable or even reduced. We also apply our approach to bone mineral density data, and again final models contain more biologically relevant variables and have reduced PMSEs. CONCLUSION: Our method leads to comparable or improved predictive performance, and models with greater face validity and interpretability with feasible incorporation of biological knowledge into predictive models. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1210-7) contains supplementary material, which is available to authorized users.
BioMed Central 2016-09-02 /pmc/articles/PMC5010709/ /pubmed/27590269 http://dx.doi.org/10.1186/s12859-016-1210-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Tharmaratnam, Kukatharmini
Sperrin, Matthew
Jaki, Thomas
Reppe, Sjur
Frigessi, Arnoldo
Tilting the lasso by knowledge-based post-processing
title Tilting the lasso by knowledge-based post-processing
title_full Tilting the lasso by knowledge-based post-processing
title_fullStr Tilting the lasso by knowledge-based post-processing
title_full_unstemmed Tilting the lasso by knowledge-based post-processing
title_short Tilting the lasso by knowledge-based post-processing
title_sort tilting the lasso by knowledge-based post-processing
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010709/
https://www.ncbi.nlm.nih.gov/pubmed/27590269
http://dx.doi.org/10.1186/s12859-016-1210-7
work_keys_str_mv AT tharmaratnamkukatharmini tiltingthelassobyknowledgebasedpostprocessing
AT sperrinmatthew tiltingthelassobyknowledgebasedpostprocessing
AT jakithomas tiltingthelassobyknowledgebasedpostprocessing
AT reppesjur tiltingthelassobyknowledgebasedpostprocessing
AT frigessiarnoldo tiltingthelassobyknowledgebasedpostprocessing