Leveraging Bayesian networks and information theory to learn risk factors for breast cancer metastasis

Bibliographic Details
Main authors: Jiang, Xia; Wells, Alan; Brufsky, Adam; Shetty, Darshan; Shajihan, Kahmil; Neapolitan, Richard E.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350636/
https://www.ncbi.nlm.nih.gov/pubmed/32650714
http://dx.doi.org/10.1186/s12859-020-03638-8
collection PubMed
description BACKGROUND: Although a few risk factors for metastatic breast cancer (MBC) have been established through epidemiologic studies, these risk factors have not proven effective in predicting an individual's risk of developing metastasis. Identifying critical risk factors for MBC therefore remains a major research imperative, one that can lead to advances in breast cancer clinical care. The objective of this research is to leverage Bayesian networks (BN) and information theory to identify key risk factors for breast cancer metastasis from data. METHODS: We develop the Markov Blanket and Interactive risk factor Learner (MBIL) algorithm, which learns single and interactive risk factors that have a direct influence on a patient's outcome. We evaluate the effectiveness of MBIL using simulated datasets and compare MBIL with the BN learning algorithms Fast Greedy Search (FGS), the PC algorithm (PC), and the CPC algorithm (CPC). We apply MBIL to learn risk factors for 5-year breast cancer metastasis using a clinical dataset we curated. We evaluate the learned risk factors by consulting breast cancer experts and the literature. We further evaluate the effectiveness of MBIL at learning risk factors for breast cancer metastasis by comparing it with the BN learning algorithms Necessary Path Condition (NPC) and Greedy Equivalent Search (GES). RESULTS: The average Jaccard indices for the simulated datasets containing 2000 records were 0.705, 0.272, 0.228, and 0.147 for MBIL, FGS, PC, and CPC, respectively. MBIL, NPC, and GES all learned that grade and lymph_nodes_positive are direct risk factors for 5-year metastasis. Only MBIL and NPC found that surgical_margins is a direct risk factor. Only NPC found that invasive is a direct risk factor. MBIL learned that HER2 and ER interact to directly affect 5-year metastasis. Neither GES nor NPC learned that HER2 and ER are direct risk factors.
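The RESULTS paragraph compares algorithms by their average Jaccard index on simulated data. A minimal Python sketch of that metric follows, assuming the index is computed per simulated dataset between the set of learned direct risk factors and the set of true ones, then averaged over datasets. The abstract does not spell out this procedure, so the function names, the placeholder variable names X1 to X4, and the convention of scoring two empty sets as 1.0 are illustrative assumptions.

```python
from typing import Iterable, Set, Tuple


def jaccard_index(learned: Set[str], true: Set[str]) -> float:
    """Return |learned & true| / |learned | true|.

    Assumption: two empty sets are treated as a perfect match (1.0).
    """
    union = learned | true
    if not union:
        return 1.0
    return len(learned & true) / len(union)


def average_jaccard(runs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Average the Jaccard index over (learned, true) pairs, one per simulated dataset."""
    scores = [jaccard_index(learned, true) for learned, true in runs]
    if not scores:
        raise ValueError("at least one (learned, true) pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical runs, not data from the study.
    runs = [
        ({"X1", "X2", "X4"}, {"X1", "X2", "X3"}),  # 2 shared / 4 in union = 0.5
        ({"X1"}, {"X1"}),                          # exact recovery = 1.0
    ]
    print(average_jaccard(runs))  # 0.75
```

Because the union sits in the denominator, the index penalizes both missed true factors and spurious learned ones, so a method cannot score well simply by returning many candidates.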
DISCUSSION: The results on the simulated datasets indicated that MBIL can learn direct risk factors substantially better than standard Bayesian network learning algorithms. Applied to a real breast cancer dataset, MBIL identified both single and interactive risk factors that directly influence breast cancer metastasis; these factors can be investigated further.
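The abstract credits information theory with helping MBIL find interactive risk factors such as the HER2 and ER pair, but it does not state MBIL's scoring criterion. As a generic illustration only, and not MBIL's method, the sketch below uses plug-in mutual information estimates and the synergy quantity I(A,B;Y) − I(A;Y) − I(B;Y) on a hypothetical XOR-style outcome, where neither factor alone carries information about the outcome but the pair determines it. The factor names a and b and the toy data are assumptions.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(a: Sequence[Hashable], b: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(A,B;Y) - I(A;Y) - I(B;Y): positive when the pair says more than its parts."""
    joint: List[tuple] = list(zip(a, b))
    return mutual_information(joint, y) - mutual_information(a, y) - mutual_information(b, y)


if __name__ == "__main__":
    # Hypothetical toy data: every (a, b) combination once, outcome y = a XOR b.
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    y = [ai ^ bi for ai, bi in zip(a, b)]
    print(mutual_information(a, y))                # 0.0: a alone says nothing about y
    print(mutual_information(b, y))                # 0.0: neither does b
    print(mutual_information(list(zip(a, b)), y))  # 1.0 bit: the pair determines y
    print(synergy(a, b, y))                        # 1.0
```

Plug-in estimates like these tend to overstate mutual information on small samples, so a real analysis would pair them with a significance test or a Bayesian score; the toy case only shows why a screen that examines one factor at a time can miss a jointly predictive pair.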
id pubmed-7350636
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7350636 2020-07-14. BMC Bioinformatics, Methodology Article. BioMed Central, 2020-07-10. © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Methodology Article