Ensemble learning for detecting gene-gene interactions in colorectal cancer

Colorectal cancer (CRC) has a high incident rate in both men and women and is affecting millions of people every year. Genome-wide association studies (GWAS) on CRC have successfully revealed common single-nucleotide polymorphisms (SNPs) associated with CRC risk. However, they can only explain a ver...

Bibliographic Details
Main Authors: Dorani, Faramarz; Hu, Ting; Woods, Michael O.; Zhai, Guangju
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211269/
https://www.ncbi.nlm.nih.gov/pubmed/30397551
http://dx.doi.org/10.7717/peerj.5854
author Dorani, Faramarz
Hu, Ting
Woods, Michael O.
Zhai, Guangju
collection PubMed
description Colorectal cancer (CRC) has a high incident rate in both men and women and is affecting millions of people every year. Genome-wide association studies (GWAS) on CRC have successfully revealed common single-nucleotide polymorphisms (SNPs) associated with CRC risk. However, they can only explain a very limited fraction of the disease heritability. One reason may be the common uni-variable analyses in GWAS where genetic variants are examined one at a time. Given the complexity of cancers, the non-additive interaction effects among multiple genetic variants have a potential of explaining the missing heritability. In this study, we employed two powerful ensemble learning algorithms, random forests and gradient boosting machine (GBM), to search for SNPs that contribute to the disease risk through non-additive gene-gene interactions. We were able to find 44 possible susceptibility SNPs that were ranked most significant by both algorithms. Out of those 44 SNPs, 29 are in coding regions. The 29 genes include ARRDC5, DCC, ALK, and ITGA1, which have been found previously associated with CRC, and E2F3 and NID2, which are potentially related to CRC since they have known associations with other types of cancer. We performed pairwise and three-way interaction analysis on the 44 SNPs using information theoretical techniques and found 17 pairwise (p < 0.02) and 16 three-way (p ≤ 0.001) interactions among them. Moreover, functional enrichment analysis suggested 16 functional terms or biological pathways that may help us better understand the etiology of the disease.
format Online
Article
Text
id pubmed-6211269
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6211269 2018-11-05 Ensemble learning for detecting gene-gene interactions in colorectal cancer Dorani, Faramarz; Hu, Ting; Woods, Michael O.; Zhai, Guangju PeerJ Bioinformatics PeerJ Inc. 2018-10-29 /pmc/articles/PMC6211269/ /pubmed/30397551 http://dx.doi.org/10.7717/peerj.5854 Text en © 2018 Dorani et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Ensemble learning for detecting gene-gene interactions in colorectal cancer
topic Bioinformatics