Time series causal relationships discovery through feature importance and ensemble models
Inferring causal relationships from observational data is a key challenge in understanding the interpretability of Machine Learning models. Given the ever-increasing amount of observational data available in many areas, Machine Learning algorithms used for forecasting have become more complex, leadi...
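Main authors: | Castro, Manuel; Mendes Júnior, Pedro Ribeiro; Soriano-Vargas, Aurea; de Oliveira Werneck, Rafael; Moreira Gonçalves, Maiara; Lusquino Filho, Leopoldo; Moura, Renato; Zampieri, Marcelo; Linares, Oscar; Ferreira, Vitor; Ferreira, Alexandre; Davólio, Alessandra; Schiozer, Denis; Rocha, Anderson |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10349147/ ; https://www.ncbi.nlm.nih.gov/pubmed/37452079 ; http://dx.doi.org/10.1038/s41598-023-37929-w |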
author | Castro, Manuel; Mendes Júnior, Pedro Ribeiro; Soriano-Vargas, Aurea; de Oliveira Werneck, Rafael; Moreira Gonçalves, Maiara; Lusquino Filho, Leopoldo; Moura, Renato; Zampieri, Marcelo; Linares, Oscar; Ferreira, Vitor; Ferreira, Alexandre; Davólio, Alessandra; Schiozer, Denis; Rocha, Anderson |
---|---|
collection | PubMed |
description | Inferring causal relationships from observational data is a key challenge in understanding the interpretability of Machine Learning models. Given the ever-increasing amount of observational data available in many areas, Machine Learning algorithms used for forecasting have become more complex, making it harder to understand how the model reaches a decision. To address this issue, we propose leveraging ensemble models, e.g., Random Forest, to assess which input features the trained model prioritizes when making a forecast and, in this way, establish causal relationships between the variables. The advantage of these algorithms lies in their ability to provide feature importance, which allows us to build the causal network. We present our methodology to estimate causality in time series from oil field production. As it is difficult to extract causal relations from a real field, we also included a synthetic oil production dataset and a synthetic weather dataset, both of which provide ground truth. We aim to perform causal discovery, i.e., establish the existing connections between the variables in each dataset. Through an iterative process of improving the forecast of a target’s value, we evaluate whether the forecast improves when information from a new potential driver is added; if so, we state that the driver causally affects the target. On the oil field-related datasets, our causal analysis results agree with the interwell connections already confirmed by tracer information; whenever tracer data were available, we used them as our ground truth. This agreement between estimated and confirmed connections gives us confidence in the effectiveness of our proposed methodology. To our knowledge, this is the first time causal analysis using solely production data has been employed to discover interwell connections in an oil field dataset. |
format | Online Article Text |
id | pubmed-10349147 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
source | Sci Rep (Nature Publishing Group UK), 2023-07-14; record pubmed-10349147 (2023-07-16) |
rights | © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Time series causal relationships discovery through feature importance and ensemble models |
topic | Article |
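The abstract describes an iterative test: a candidate driver is accepted when adding its past values improves the forecast of a target, and a Random Forest's feature importances show which inputs the trained model relies on. A minimal sketch of that idea follows, assuming NumPy and scikit-learn (`RandomForestRegressor` and its `feature_importances_` attribute). The helper names (`lagged`, `fit_forecaster`, `discover_drivers`), the two lags, the 70/30 chronological split, the 10% relative-gain threshold and the synthetic series are all hypothetical choices for illustration, not the authors' published procedure or datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def lagged(series, lags):
    """Hypothetical helper: columns series[t-1], ..., series[t-lags] for every t >= lags."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def fit_forecaster(target, drivers, lags, split=0.7, seed=0):
    """Hypothetical helper: fit a Random Forest on the lagged target plus lagged
    drivers and return (hold-out MSE, fitted model)."""
    X = np.hstack([lagged(target, lags)] + [lagged(d, lags) for d in drivers])
    y = target[lags:]
    cut = int(split * len(y))  # assumption: chronological split, train on past, test on future
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X[:cut], y[:cut])
    return mean_squared_error(y[cut:], model.predict(X[cut:])), model


def discover_drivers(target, candidates, lags=2, min_gain=0.1):
    """Greedy forward selection: accept the best remaining candidate while it cuts
    the hold-out MSE by more than min_gain (relative; assumed threshold)."""
    selected = []
    best, _ = fit_forecaster(target, [], lags)  # baseline: target's own past only
    while True:
        trials = {
            name: fit_forecaster(target, [candidates[s] for s in selected] + [series], lags)[0]
            for name, series in candidates.items()
            if name not in selected
        }
        if not trials:
            break
        name, err = min(trials.items(), key=lambda kv: kv[1])
        if err >= best * (1 - min_gain):
            break  # no remaining candidate improves the forecast enough
        selected.append(name)
        best = err
    return selected, best


# Synthetic ground truth (made up for this sketch): y is driven by x one step
# earlier; z is independent noise with no effect on y.
rng = np.random.default_rng(42)
n = 600
x = rng.normal(size=n)
z = rng.normal(size=n)
y = np.zeros(n)
y[1:] = 0.9 * x[:-1] + 0.1 * rng.normal(size=n - 1)

lags = 2
candidates = {"x": x, "z": z}
selected, mse = discover_drivers(y, candidates, lags=lags)

# Feature importances of the final model, one name per lagged column
# (target lags first, then each selected driver's lags, matching fit_forecaster).
_, model = fit_forecaster(y, [candidates[s] for s in selected], lags)
names = [f"y(t-{k})" for k in range(1, lags + 1)]
names += [f"{s}(t-{k})" for s in selected for k in range(1, lags + 1)]
ranking = sorted(zip(names, model.feature_importances_), key=lambda p: -p[1])
print("drivers of y:", selected)
print("top feature:", ranking[0])
```

In this sketch the chronological split keeps future samples out of training, and the relative threshold `min_gain` is there so that a candidate whose gain is within run-to-run noise (such as `z`) is not accepted as a driver.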