Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks

Bibliographic details
Main authors: Becker, Ann-Kristin; Ittermann, Till; Dörr, Markus; Felix, Stephan B.; Nauck, Matthias; Teumer, Alexander; Völker, Uwe; Völzke, Henry; Kaderali, Lars; Nath, Neetika
Format: Online article, text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9302835/
https://www.ncbi.nlm.nih.gov/pubmed/35862421
http://dx.doi.org/10.1371/journal.pone.0271610
Abstract

BACKGROUND: Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.

METHOD: Here we propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum thyrotropin (TSH) concentrations, an important marker of thyroid function.

RESULTS: Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.

CONCLUSION: We demonstrate that the combination of random forest and Bayesian network analysis helps to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with the current literature. They may be useful for future thyroid research and for improved dosing of therapeutics.
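The RESULTS paragraph reports three standard regression metrics. As a minimal sketch, assuming the usual definitions RMSE = √(SSE/n), MAE = mean absolute residual and R² = 1 − SSE/SST (where SSE is the sum of squared residuals and SST the total sum of squares around the mean), they can be computed from observed and predicted values as below. The function name `regression_metrics` and the toy values are hypothetical illustrations, not code or data from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: return RMSE, MAE and R^2 for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)               # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares around the mean
    return {
        "rmse": math.sqrt(sse / n),                   # root mean square error
        "mae": sum(abs(r) for r in residuals) / n,    # mean absolute error
        # R^2 is undefined when all observed values are identical (SST = 0)
        "r2": 1.0 - sse / sst if sst > 0 else float("nan"),
    }


# Toy example with made-up values
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

Under this definition, an R² of 0.15 means the model's squared error is about 85% of that obtained by always predicting the mean, i.e. roughly 15% of the variance in the observed values is explained.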
Journal: PLoS One (Research Article), published 2022-07-21 by the Public Library of Science
License: © 2022 Becker et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.