Time series causal relationships discovery through feature importance and ensemble models

Bibliographic Details
Main authors: Castro, Manuel; Mendes Júnior, Pedro Ribeiro; Soriano-Vargas, Aurea; de Oliveira Werneck, Rafael; Moreira Gonçalves, Maiara; Lusquino Filho, Leopoldo; Moura, Renato; Zampieri, Marcelo; Linares, Oscar; Ferreira, Vitor; Ferreira, Alexandre; Davólio, Alessandra; Schiozer, Denis; Rocha, Anderson
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10349147/
https://www.ncbi.nlm.nih.gov/pubmed/37452079
http://dx.doi.org/10.1038/s41598-023-37929-w
author Castro, Manuel
Mendes Júnior, Pedro Ribeiro
Soriano-Vargas, Aurea
de Oliveira Werneck, Rafael
Moreira Gonçalves, Maiara
Lusquino Filho, Leopoldo
Moura, Renato
Zampieri, Marcelo
Linares, Oscar
Ferreira, Vitor
Ferreira, Alexandre
Davólio, Alessandra
Schiozer, Denis
Rocha, Anderson
collection PubMed
description Inferring causal relationships from observational data is a key challenge in understanding the interpretability of Machine Learning models. Given the ever-increasing amount of observational data available in many areas, Machine Learning algorithms used for forecasting have become more complex, making it harder to understand how a model reaches a decision. To address this issue, we propose leveraging ensemble models, e.g., Random Forest, to assess which input features the trained model prioritizes when making a forecast and, in this way, establish causal relationships between the variables. The advantage of these algorithms lies in their ability to provide feature importance, which allows us to build the causal network. We present our methodology to estimate causality in time series from oil field production. As it is difficult to extract causal relations from a real field, we also included a synthetic oil production dataset and a synthetic weather dataset to provide the ground truth. We aim to perform causal discovery, i.e., establish the existing connections between the variables in each dataset. Through an iterative process of improving the forecasting of a target’s value, we evaluate whether the forecasting improves by adding information from a new potential driver; if so, we state that the driver causally affects the target. On the oil field-related datasets, our causal analysis results agree with the interwell connections already confirmed by tracer information; whenever tracer data were available, we used them as our ground truth. This consistency between the estimated and confirmed connections gives us confidence in the effectiveness of our proposed methodology. To our knowledge, this is the first time causal analysis using solely production data has been employed to discover interwell connections in an oil field dataset.
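The iterative step described in the abstract (add a candidate driver, keep it only if the target's forecast improves) can be illustrated with a minimal sketch. This is not the authors' implementation: a plain least-squares forecaster stands in for the Random Forest so the example needs no third-party libraries, and the function names, the lag of one step, the 70/30 split, and the `min_gain` threshold are all assumptions chosen for illustration. The toy series are synthetic, with `x` driving `y` and `z` unrelated.

```python
import random

def fit_linear(X, y):
    """Least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for k in range(c + 1, n):
            f = A[k][c] / A[c][c]
            for j in range(c, n):
                A[k][j] -= f * A[c][j]
            b[k] -= f * b[c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def forecast_error(series, target, drivers):
    """Hold-out MSE when forecasting target[t] from lag-1 values of `drivers`."""
    n = len(series[target])
    X = [[series[d][t - 1] for d in drivers] for t in range(1, n)]
    y = [series[target][t] for t in range(1, n)]
    split = int(0.7 * len(y))  # assumed chronological 70/30 split
    w = fit_linear(X[:split], y[:split])
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in X[split:]]
    return sum((p - t) ** 2 for p, t in zip(preds, y[split:])) / len(preds)

def discover_drivers(series, target, min_gain=0.1):
    """Greedily keep candidates whose lagged values cut the error by min_gain."""
    selected = [target]  # baseline: the target's own past
    best = forecast_error(series, target, selected)
    candidates = [name for name in series if name != target]
    while candidates:
        errors = {c: forecast_error(series, target, selected + [c])
                  for c in candidates}
        winner = min(errors, key=errors.get)
        if errors[winner] > (1 - min_gain) * best:
            break  # no remaining candidate improves the forecast enough
        selected.append(winner)
        candidates.remove(winner)
        best = errors[winner]
    return selected[1:]  # drivers judged to affect the target

# Synthetic toy data (hypothetical): x drives y with a one-step delay.
rng = random.Random(0)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
series = {"x": x, "y": y, "z": z}

found = discover_drivers(series, "y")
print("estimated drivers of y:", found)
```

In the method the abstract describes, an ensemble model's feature importance would take the place of the hold-out error comparison used here to rank candidate drivers.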
format Online
Article
Text
id pubmed-10349147
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10349147 2023-07-16 Sci Rep Article Nature Publishing Group UK 2023-07-14 /pmc/articles/PMC10349147/ /pubmed/37452079 http://dx.doi.org/10.1038/s41598-023-37929-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Time series causal relationships discovery through feature importance and ensemble models
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10349147/
https://www.ncbi.nlm.nih.gov/pubmed/37452079
http://dx.doi.org/10.1038/s41598-023-37929-w