A Bayesian network approach incorporating imputation of missing data enables exploratory analysis of complex causal biological relationships

Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, which can be particularly useful in complex biological scenarios with many measured variables. Here we propose two improvements to an existing method...

Bibliographic Details
Main Authors: Howey, Richard; Clark, Alexander D.; Naamane, Najib; Reynard, Louise N.; Pratt, Arthur G.; Cordell, Heather J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504979/
https://www.ncbi.nlm.nih.gov/pubmed/34587167
http://dx.doi.org/10.1371/journal.pgen.1009811
author Howey, Richard
Clark, Alexander D.
Naamane, Najib
Reynard, Louise N.
Pratt, Arthur G.
Cordell, Heather J.
collection PubMed
description Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, which can be particularly useful in complex biological scenarios with many measured variables. Here we propose two improvements to an existing method for Bayesian network analysis, designed to increase the power to detect potential causal relationships between variables (potentially including a mixture of discrete and continuous variables). Our first improvement relates to the treatment of missing data. When data are missing, the standard approach is to remove every individual with any missing data before performing the analysis. This can be wasteful and undesirable when there are many individuals with missing data, perhaps with only one or a few variables missing. This motivates the use of imputation. We present a new imputation method that uses a version of nearest neighbour imputation, whereby missing data from one individual is replaced with data from another individual, their nearest neighbour. For each individual with missing data, the subsets of variables to be used to select the nearest neighbour are chosen by sampling the complete data without replacement and estimating a best fit Bayesian network. We show that this approach leads to marked improvements in the recall and precision of directed edges in the final network identified, and we illustrate the approach through application to data from a recent study investigating the causal relationship between methylation and gene expression in early inflammatory arthritis patients. We also describe a second improvement in the form of a pseudo-Bayesian approach for upweighting certain network edges, which can be useful when there is prior evidence concerning their directions.
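The nearest neighbour step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name nearest_neighbour_impute, the use of None for missing values, the scaled Euclidean distance, and the match_cols argument are all assumptions made for illustration. In particular, match_cols is only a hypothetical stand-in for the variable subsets that the paper derives from Bayesian networks fitted to samples of the complete data; that selection step is not reproduced here.

```python
import math


def nearest_neighbour_impute(rows, match_cols=None):
    """Fill missing values (None) in each row from its nearest complete row.

    rows: list of equal-length lists, one per individual.
    match_cols: optional collection of column indices allowed for matching
        (hypothetical stand-in for network-derived variable subsets).
    """
    # Donors are the individuals with no missing values.
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        raise ValueError("need at least one individual with complete data")

    # Per-variable standard deviation over the complete rows, so that no
    # variable dominates the distance purely because of its units.
    scale = []
    for j in range(len(rows[0])):
        col = [r[j] for r in complete]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scale.append(sd if sd > 0 else 1.0)

    imputed = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        # Observed variables that may be used to find the neighbour.
        usable = [
            j for j, v in enumerate(row)
            if v is not None and (match_cols is None or j in match_cols)
        ]
        if not missing or not usable:
            # Nothing to fill, or nothing to match on: copy the row as-is.
            imputed.append(list(row))
            continue

        def distance(donor):
            return sum(((donor[j] - row[j]) / scale[j]) ** 2 for j in usable)

        donor = min(complete, key=distance)
        # Copy only the missing entries from the donor.
        imputed.append([donor[j] if v is None else v for j, v in enumerate(row)])
    return imputed
```

For example, calling nearest_neighbour_impute([[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [1.1, None, 101.0]]) fills the missing value in the third individual from the first individual, who is closest on the two observed variables. Restricting match_cols changes which individual counts as nearest, which is why the choice of variable subset matters.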
format Online
Article
Text
id pubmed-8504979
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8504979 2021-10-12
PLoS Genet
Research Article
Public Library of Science 2021-09-29
Text en
© 2021 Howey et al. https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Bayesian network approach incorporating imputation of missing data enables exploratory analysis of complex causal biological relationships
topic Research Article