
Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV

BACKGROUND: Disease progression models are important for understanding the critical steps during the development of diseases. The models are imbedded in a statistical framework to deal with random variations due to biology and the sampling process when observing only a finite population. Conditional...


Bibliographic Details
Main Authors:	Hainke, Katrin; Szugat, Sebastian; Fried, Roland; Rahnenführer, Jörg
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5539896/ https://www.ncbi.nlm.nih.gov/pubmed/28764644 http://dx.doi.org/10.1186/s12859-017-1762-1

author	Hainke, Katrin; Szugat, Sebastian; Fried, Roland; Rahnenführer, Jörg
collection	PubMed
description	BACKGROUND: Disease progression models are important for understanding the critical steps during the development of diseases. The models are imbedded in a statistical framework to deal with random variations due to biology and the sampling process when observing only a finite population. Conditional probabilities are used to describe dependencies between events that characterise the critical steps in the disease process. Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. Oncogenetic trees are a popular model class that is easy to understand yet flexible. These have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often far larger than the maximal number of events that can be used for reliably estimating the progression models. Still, there are only a few approaches to variable selection, and these have not yet been investigated in detail. RESULTS: We fill this gap and propose ten variable selection methods specifically for oncogenetic trees, some of them completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. It turns out that the preselection of events by clique identification algorithms performs best. Here, events are selected if they belong to the largest or the maximum weight subgraph in which all pairs of vertices are connected. CONCLUSIONS: The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1762-1) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5539896
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5539896 2017-08-03 Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV Hainke, Katrin; Szugat, Sebastian; Fried, Roland; Rahnenführer, Jörg BMC Bioinformatics Research Article BioMed Central 2017-08-01 /pmc/articles/PMC5539896/ /pubmed/28764644 http://dx.doi.org/10.1186/s12859-017-1762-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5539896/ https://www.ncbi.nlm.nih.gov/pubmed/28764644 http://dx.doi.org/10.1186/s12859-017-1762-1
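
The abstract describes selecting events that belong to the largest clique, or to the maximum weight clique, of a graph over events. The sketch below illustrates that idea only; it is not the authors' implementation. The record does not say how edges or weights are derived, so the adjacency map `adj`, the vertex weights `weights`, and the function names are hypothetical, and the basic Bron-Kerbosch recursion is simply one standard way to enumerate maximal cliques. With non-negative weights, a maximum weight clique is always among the maximal cliques, which is why scanning them suffices.

```python
from typing import Dict, List, Optional, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques of an undirected graph (basic Bron-Kerbosch).

    `adj` must be symmetric: v in adj[u] if and only if u in adj[v].
    """
    found: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that extend r, x: already handled
        if not p and not x:
            found.append(set(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def select_events(adj: Dict[str, Set[str]],
                  weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Return the events in the largest clique, or in the maximum weight
    clique when non-negative vertex weights are supplied."""
    cliques = maximal_cliques(adj)
    if weights is None:
        best = max(cliques, key=len)
    else:
        best = max(cliques, key=lambda c: sum(weights[v] for v in c))
    return sorted(best)


# Hypothetical event graph: an edge marks two events judged to be associated.
example_adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
# Hypothetical weights, for instance observed event frequencies.
example_weights = {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.6, "E": 0.5}

print(select_events(example_adj))                   # ['A', 'B', 'C']
print(select_events(example_adj, example_weights))  # ['D', 'E']
```

In this toy graph the largest clique keeps the three mutually associated events, while the weighted variant prefers the two heavier events. If several cliques tie for the best score, this sketch returns one of them without a defined preference.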
