
Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls

We propose an effective machine learning approach to identify a group of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs. We adopt a gradient tree boosting method followed by an adaptive iterative S...


Bibliographic Details
Main authors: Behravan, Hamid, Hartikainen, Jaana M., Tengström, Maria, Pylkäs, Katri, Winqvist, Robert, Kosma, Veli–Matti, Mannermaa, Arto
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120908/
https://www.ncbi.nlm.nih.gov/pubmed/30177847
http://dx.doi.org/10.1038/s41598-018-31573-5
author Behravan, Hamid
Hartikainen, Jaana M.
Tengström, Maria
Pylkäs, Katri
Winqvist, Robert
Kosma, Veli–Matti
Mannermaa, Arto
collection PubMed
description We propose an effective machine learning approach to identify a group of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs. We adopt a gradient tree boosting method followed by an adaptive iterative SNP search to capture complex non-linear SNP-SNP interactions and, consequently, obtain a group of interacting SNPs with high BC risk-predictive potential. We also propose a support vector machine formed by the identified SNPs to classify BC cases and controls. Our approach achieves a mean average precision (mAP) of 72.66, 67.24 and 69.25 in discriminating BC cases and controls in the KBCP, OBCS and merged KBCP-OBCS sample sets, respectively. These results are better than the mAP of 70.08, 63.61 and 66.41 obtained in the same three sample sets, respectively, by a polygenic risk score model derived from 51 known BC-associated SNPs. BC subtype analysis further reveals that the 200 KBCP SNPs identified by the proposed method perform favorably in classifying estrogen receptor positive (ER+) and negative (ER−) BC cases in both KBCP and OBCS data. Further, a biological analysis of the identified SNPs reveals genes related to important BC-related mechanisms, estrogen metabolism and apoptosis.
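The abstract compares models by average precision and uses a polygenic risk score as the baseline. The record does not say how either was computed, so the following is only a minimal sketch under stated assumptions: the score is taken to be a weighted sum of risk-allele counts, and average precision is taken to be the mean of the precision values at each rank where a case is retrieved. The function names, weights and toy genotypes are hypothetical and are not taken from the article.

```python
from typing import Sequence


def polygenic_risk_score(allele_counts: Sequence[int],
                         weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    Assumption: one weight per SNP, e.g. a log odds ratio.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(c * w for c, w in zip(allele_counts, weights))


def average_precision(labels: Sequence[int],
                      scores: Sequence[float]) -> float:
    """Mean of precision@rank over the ranks at which a case (label 1) appears."""
    # Rank individuals from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy data: 4 individuals, 3 SNPs.
toy_weights = [0.5, 0.2, 0.1]
toy_genotypes = [[2, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 2]]
toy_labels = [1, 0, 1, 0]  # 1 = case, 0 = control

toy_scores = [polygenic_risk_score(g, toy_weights) for g in toy_genotypes]
toy_ap = average_precision(toy_labels, toy_scores)
```

A "mean" average precision would then be an average of such values over several evaluations (for example, cross-validation folds); which averaging the authors used is not stated in this record.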
format Online
Article
Text
id pubmed-6120908
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6120908 2018-09-06 Sci Rep Article
Nature Publishing Group UK 2018-09-03 /pmc/articles/PMC6120908/ /pubmed/30177847 http://dx.doi.org/10.1038/s41598-018-31573-5 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120908/
https://www.ncbi.nlm.nih.gov/pubmed/30177847
http://dx.doi.org/10.1038/s41598-018-31573-5