Learning genetic epistasis using Bayesian network scoring criteria

BACKGROUND: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully...


Bibliographic Details
Main Authors:	Jiang, Xia; Neapolitan, Richard E; Barmada, M Michael; Visweswaran, Shyam
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3080825/ https://www.ncbi.nlm.nih.gov/pubmed/21453508 http://dx.doi.org/10.1186/1471-2105-12-89

_version_	1782202146276507648
author	Jiang, Xia; Neapolitan, Richard E; Barmada, M Michael; Visweswaran, Shyam
author_sort	Jiang, Xia
collection	PubMed
description	BACKGROUND: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is Multifactor Dimensionality Reduction (MDR). Jiang et al. created a combinatorial epistasis learning method called BNMBL to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL. Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best or even well when we try learning the correct model without knowledge of the number of SNPs in that model.
RESULTS: We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer's GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called α performed best. This score performed better than other BN scoring criteria and MDR at recall using simulated data sets, at detecting the hardest-to-detect models using simulated data sets, and at substantiating previous results using the real Alzheimer's data set.
CONCLUSIONS: We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of a hyperparameter α appears more promising than a number of alternatives.
format	Text
id	pubmed-3080825
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3080825 2011-04-22 Learning genetic epistasis using Bayesian network scoring criteria Jiang, Xia; Neapolitan, Richard E; Barmada, M Michael; Visweswaran, Shyam BMC Bioinformatics Methodology Article BioMed Central 2011-03-31 /pmc/articles/PMC3080825/ /pubmed/21453508 http://dx.doi.org/10.1186/1471-2105-12-89 Text en Copyright ©2011 Jiang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Learning genetic epistasis using Bayesian network scoring criteria
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3080825/ https://www.ncbi.nlm.nih.gov/pubmed/21453508 http://dx.doi.org/10.1186/1471-2105-12-89
work_keys_str_mv	AT jiangxia learninggeneticepistasisusingbayesiannetworkscoringcriteria AT neapolitanricharde learninggeneticepistasisusingbayesiannetworkscoringcriteria AT barmadammichael learninggeneticepistasisusingbayesiannetworkscoringcriteria AT visweswaranshyam learninggeneticepistasisusingbayesiannetworkscoringcriteria
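
The abstract describes scoring every 2-SNP model with a Bayesian scoring criterion governed by a hyperparameter α, but the record does not give the formula. The following is a minimal sketch that assumes the criterion takes the standard BDeu form (a Dirichlet prior with total mass α spread evenly over parent configurations and phenotype states). The function names `bdeu_score` and `rank_two_snp_models`, the three-state genotype coding, and the binary phenotype are illustrative assumptions, not taken from the article.

```python
from collections import Counter
from itertools import combinations
from math import lgamma


def bdeu_score(snps, disease, alpha=1.0, snp_states=3, disease_states=2):
    """Log BDeu score of the model in which the given SNPs are parents of disease.

    snps: list of columns, each a list of genotype codes (assumed 0, 1, 2).
    disease: list of phenotype codes (assumed 0 or 1), one per sample.
    alpha: equivalent sample size of the Dirichlet prior (assumed BDeu form).
    """
    q = snp_states ** len(snps)      # number of joint genotype configurations
    a_j = alpha / q                  # prior mass per configuration
    a_jk = a_j / disease_states      # prior mass per (configuration, phenotype)
    rows = list(zip(*snps)) if snps else [()] * len(disease)
    n_j = Counter(rows)                  # samples per configuration
    n_jk = Counter(zip(rows, disease))   # samples per (configuration, phenotype)
    score = 0.0
    # Unobserved configurations contribute exactly zero, so they are skipped.
    for count in n_j.values():
        score += lgamma(a_j) - lgamma(a_j + count)
    for count in n_jk.values():
        score += lgamma(a_jk + count) - lgamma(a_jk)
    return score


def rank_two_snp_models(genotypes, disease, alpha=1.0):
    """Score all 2-SNP models and return (score, (i, j)) pairs, best first."""
    scored = [
        (bdeu_score([genotypes[i], genotypes[j]], disease, alpha), (i, j))
        for i, j in combinations(range(len(genotypes)), 2)
    ]
    return sorted(scored, reverse=True)
```

Calling `rank_two_snp_models(genotypes, disease, alpha=...)` with one list per SNP returns the candidate pairs ordered by score, so the effect of different α values on the ranking can be compared directly. Scoring models with more than two SNPs, as the abstract discusses, would only require passing more columns to `bdeu_score`.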
