
Detecting gene-gene interactions using a permutation-based random forest method

BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene...


Bibliographic Details
Main Authors: Li, Jing, Malley, James D., Andrew, Angeline S., Karagas, Margaret R., Moore, Jason H.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4822295/
https://www.ncbi.nlm.nih.gov/pubmed/27053949
http://dx.doi.org/10.1186/s13040-016-0093-5
_version_ 1782425757917642752
author Li, Jing
Malley, James D.
Andrew, Angeline S.
Karagas, Margaret R.
Moore, Jason H.
author_facet Li, Jing
Malley, James D.
Andrew, Angeline S.
Karagas, Margaret R.
Moore, Jason H.
author_sort Li, Jing
collection PubMed
description BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions. RESULTS: We systematically tested our approach in a simulation study with datasets possessing various genetic constraints, including heritability, number of SNPs, and sample size. Our methodology showed high success rates for detecting the interacting SNP pair. We also applied our approach to two bladder cancer datasets, and the results were consistent with those of well-studied methodologies, such as multifactor dimensionality reduction (MDR) and statistical epistasis network (SEN). Furthermore, we built permuted random forest networks (PRFN), in which we used nodes to represent SNPs and edges to indicate interactions. CONCLUSIONS: We successfully developed a scale-invariant methodology to detect pure gene-gene interactions based on permutation strategies and the machine learning method random forest. This methodology showed great potential for detecting gene-gene interactions to study underlying genetic architectures in a scale-free way, which could help uncover complex disease mechanisms.
format Online
Article
Text
id pubmed-4822295
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-48222952016-04-07 Detecting gene-gene interactions using a permutation-based random forest method Li, Jing Malley, James D. Andrew, Angeline S. Karagas, Margaret R. Moore, Jason H. BioData Min Methodology BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions. RESULTS: We systematically tested our approach in a simulation study with datasets possessing various genetic constraints, including heritability, number of SNPs, and sample size. Our methodology showed high success rates for detecting the interacting SNP pair. We also applied our approach to two bladder cancer datasets, and the results were consistent with those of well-studied methodologies, such as multifactor dimensionality reduction (MDR) and statistical epistasis network (SEN). Furthermore, we built permuted random forest networks (PRFN), in which we used nodes to represent SNPs and edges to indicate interactions. CONCLUSIONS: We successfully developed a scale-invariant methodology to detect pure gene-gene interactions based on permutation strategies and the machine learning method random forest. This methodology showed great potential for detecting gene-gene interactions to study underlying genetic architectures in a scale-free way, which could help uncover complex disease mechanisms. BioMed Central 2016-04-06 /pmc/articles/PMC4822295/ /pubmed/27053949 http://dx.doi.org/10.1186/s13040-016-0093-5 Text en © Li et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Li, Jing
Malley, James D.
Andrew, Angeline S.
Karagas, Margaret R.
Moore, Jason H.
Detecting gene-gene interactions using a permutation-based random forest method
title Detecting gene-gene interactions using a permutation-based random forest method
title_full Detecting gene-gene interactions using a permutation-based random forest method
title_fullStr Detecting gene-gene interactions using a permutation-based random forest method
title_full_unstemmed Detecting gene-gene interactions using a permutation-based random forest method
title_short Detecting gene-gene interactions using a permutation-based random forest method
title_sort detecting gene-gene interactions using a permutation-based random forest method
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4822295/
https://www.ncbi.nlm.nih.gov/pubmed/27053949
http://dx.doi.org/10.1186/s13040-016-0093-5
work_keys_str_mv AT lijing detectinggenegeneinteractionsusingapermutationbasedrandomforestmethod
AT malleyjamesd detectinggenegeneinteractionsusingapermutationbasedrandomforestmethod
AT andrewangelines detectinggenegeneinteractionsusingapermutationbasedrandomforestmethod
AT karagasmargaretr detectinggenegeneinteractionsusingapermutationbasedrandomforestmethod
AT moorejasonh detectinggenegeneinteractionsusingapermutationbasedrandomforestmethod
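
Illustrative sketch (not part of the catalog record): the abstract describes scoring SNP pairs by how much a classifier's power changes when a pairwise interaction is removed by permutation, and then drawing SNPs as nodes and interactions as edges. The record does not give the permutation scheme, the score, or any code, so everything below is an assumption made for illustration. In particular, shuffling each SNP of a pair independently within each class is one possible way to keep main effects while breaking the pair's joint pattern, and a majority-vote genotype table stands in for the random forest so the example needs only the standard library. All function names, the toy data, and the threshold are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def permute_within_class(column, labels, rng):
    """Shuffle one SNP column separately inside each class.

    Per-class genotype frequencies (main effects) are unchanged; only the
    pairing of this SNP with other SNPs is randomized. Assumed scheme.
    """
    permuted = list(column)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        values = [column[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            permuted[i] = v
    return permuted


def table_accuracy(rows, labels):
    """Stand-in classifier (NOT a random forest).

    Learns the majority class per genotype combination on even-indexed
    samples and reports accuracy on odd-indexed samples.
    """
    samples = list(zip(rows, labels))
    train, test = samples[0::2], samples[1::2]
    votes = defaultdict(Counter)
    for row, y in train:
        votes[row][y] += 1
    default = Counter(y for _, y in train).most_common(1)[0][0]
    hits = sum(
        (votes[row].most_common(1)[0][0] if row in votes else default) == y
        for row, y in test
    )
    return hits / len(test)


def interaction_score(data, labels, pair, rng, n_perm=20):
    """Mean accuracy drop after breaking the pair's joint genotype pattern."""
    a, b = pair
    col_a = [row[a] for row in data]
    col_b = [row[b] for row in data]
    base = table_accuracy(list(zip(col_a, col_b)), labels)
    drops = []
    for _ in range(n_perm):
        perm_a = permute_within_class(col_a, labels, rng)
        perm_b = permute_within_class(col_b, labels, rng)
        drops.append(base - table_accuracy(list(zip(perm_a, perm_b)), labels))
    return sum(drops) / n_perm


# Toy data (hypothetical): three binary "SNPs"; the label is a pure
# interaction (XOR) of SNP 0 and SNP 1, and SNP 2 is noise.
rng = random.Random(0)
data = [tuple(rng.randint(0, 1) for _ in range(3)) for _ in range(400)]
labels = [row[0] ^ row[1] for row in data]

scores = {
    pair: interaction_score(data, labels, pair, rng)
    for pair in combinations(range(3), 2)
}
top_pair = max(scores, key=scores.get)

# Network view: SNPs are nodes; an edge joins a pair whose score exceeds
# an arbitrary illustrative threshold.
threshold = 0.2
network = defaultdict(set)
for (a, b), score in scores.items():
    if score > threshold:
        network[a].add(b)
        network[b].add(a)

print("top pair:", top_pair)
print("edges:", {node: sorted(nbrs) for node, nbrs in network.items()})
```

Fitting the stand-in classifier on the two SNPs of a pair only is a design choice of this sketch: it keeps the score specific to that pair, since a within-class shuffle of one SNP would otherwise also break its interactions with every other SNP. A real analysis would swap in an actual random forest and follow the procedure in the full article.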