Phenotype forecasting with SNPs data through gene-based Bayesian networks

BACKGROUND: Bayesian networks are powerful instruments to learn genetic models from association studies data. They are able to derive the existing correlation between genetic markers and phenotypic traits and, at the same time, to find the relationships between the markers themselves. However, learn...

Bibliographic Details
Main Authors: Malovini, Alberto; Nuzzo, Angelo; Ferrazzi, Fulvia; Puca, Annibale A; Bellazzi, Riccardo
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646249/
https://www.ncbi.nlm.nih.gov/pubmed/19208195
http://dx.doi.org/10.1186/1471-2105-10-S2-S7
author Malovini, Alberto
Nuzzo, Angelo
Ferrazzi, Fulvia
Puca, Annibale A
Bellazzi, Riccardo
author_sort Malovini, Alberto
collection PubMed
description BACKGROUND: Bayesian networks are powerful instruments to learn genetic models from association studies data. They are able to derive the existing correlation between genetic markers and phenotypic traits and, at the same time, to find the relationships between the markers themselves. However, learning Bayesian networks is often non-trivial due to the high number of variables to be taken into account in the model with respect to the instances of the dataset. Therefore, it becomes very interesting to use an abstraction of the variable space that suitably reduces its dimensionality without losing information. In this paper we present a new strategy to achieve this goal by mapping the SNPs related to the same gene to one meta-variable. In order to assign states to the meta-variables we employ an approach based on classification trees. RESULTS: We applied our approach to data coming from a genome-wide scan on 288 individuals affected by arterial hypertension and 271 nonagenarians without history of hypertension. After pre-processing, we focused on a subset of 24 SNPs. We compared the performance of the proposed approach with the Bayesian network learned with SNPs as variables and with the network learned with haplotypes as meta-variables. The results were obtained by running a hold-out experiment five times. The mean accuracy of the new method was 64.28%, while the mean accuracy of the SNPs network was 58.99% and the mean accuracy of the haplotype network was 54.57%. CONCLUSION: The new approach presented in this paper is able to derive a gene-based predictive model based on SNPs data. Such model is more parsimonious than the one based on single SNPs, while preserving the capability of highlighting predictive SNPs configurations. The prediction performance of this approach was consistently superior to the SNP-based and the haplotype-based one in all the test sets of the evaluation procedure. The method can be then considered as an alternative way to analyze the data coming from association studies.
format Text
id pubmed-2646249
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2646249 2009-02-23 Phenotype forecasting with SNPs data through gene-based Bayesian networks Malovini, Alberto Nuzzo, Angelo Ferrazzi, Fulvia Puca, Annibale A Bellazzi, Riccardo BMC Bioinformatics Proceedings BioMed Central 2009-02-05 /pmc/articles/PMC2646249/ /pubmed/19208195 http://dx.doi.org/10.1186/1471-2105-10-S2-S7 Text en Copyright © 2009 Malovini et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phenotype forecasting with SNPs data through gene-based Bayesian networks
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646249/
https://www.ncbi.nlm.nih.gov/pubmed/19208195
http://dx.doi.org/10.1186/1471-2105-10-S2-S7