Gene-gene interaction filtering with ensemble of filters

BACKGROUND: Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering...


Bibliographic Details
Main authors: Yang, Pengyi; Ho, Joshua WK; Yang, Yee Hwa; Zhou, Bing B
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044264/
https://www.ncbi.nlm.nih.gov/pubmed/21342539
http://dx.doi.org/10.1186/1471-2105-12-S1-S10
author Yang, Pengyi
Ho, Joshua WK
Yang, Yee Hwa
Zhou, Bing B
collection PubMed
description BACKGROUND: Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, giving rise to unstable and suboptimal results. However, we observe that the ‘unstable’ results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve the filtering performance. RESULTS: We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF can generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in comparison to many state-of-the-art algorithms as well as traditional χ²-test and odds ratio methods in terms of retaining gene-gene interactions.
format Text
id pubmed-3044264
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3044264 2011-02-25 Gene-gene interaction filtering with ensemble of filters Yang, Pengyi Ho, Joshua WK Yang, Yee Hwa Zhou, Bing B BMC Bioinformatics Research
BioMed Central 2011-02-15 /pmc/articles/PMC3044264/ /pubmed/21342539 http://dx.doi.org/10.1186/1471-2105-12-S1-S10 Text en Copyright ©2011 Yang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research