Developing reliable hourly electricity demand data through screening and imputation
Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning (e.g., capacity expansion and system operation) purposes. The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in...
Main Authors: | Ruggles, Tyler H.; Farnham, David J.; Tong, Dan; Caldeira, Ken |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | Data Descriptor |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250876/ https://www.ncbi.nlm.nih.gov/pubmed/32457368 http://dx.doi.org/10.1038/s41597-020-0483-x |
_version_ | 1783538844185919488 |
---|---|
author | Ruggles, Tyler H. Farnham, David J. Tong, Dan Caldeira, Ken |
author_sort | Ruggles, Tyler H. |
collection | PubMed |
description | Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning (e.g., capacity expansion and system operation) purposes. The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in the contiguous United States. As of September 2019, we find 2.2% of the demand data in their database are missing. Additionally, 0.5% of reported quantities are either negative values or are otherwise identified as outliers. With the goal of attaining non-missing, continuous, and physically plausible demand data to facilitate analysis, we developed a screening process to identify anomalous values. We then applied a Multiple Imputation by Chained Equations (MICE) technique to impute replacements for missing and anomalous values. We conduct cross-validation on the MICE technique by marking subsets of plausible data as missing, and using the remaining data to predict this “missing” data. The mean absolute percentage error of imputed values is 3.5% across all BAs. The cleaned data are published and available open access: 10.5281/zenodo.3690240. |
format | Online Article Text |
id | pubmed-7250876 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-72508762020-06-04 Developing reliable hourly electricity demand data through screening and imputation Ruggles, Tyler H. Farnham, David J. Tong, Dan Caldeira, Ken Sci Data Data Descriptor Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning (e.g., capacity expansion and system operation) purposes. The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in the contiguous United States. As of September 2019, we find 2.2% of the demand data in their database are missing. Additionally, 0.5% of reported quantities are either negative values or are otherwise identified as outliers. With the goal of attaining non-missing, continuous, and physically plausible demand data to facilitate analysis, we developed a screening process to identify anomalous values. We then applied a Multiple Imputation by Chained Equations (MICE) technique to impute replacements for missing and anomalous values. We conduct cross-validation on the MICE technique by marking subsets of plausible data as missing, and using the remaining data to predict this “missing” data. The mean absolute percentage error of imputed values is 3.5% across all BAs. The cleaned data are published and available open access: 10.5281/zenodo.3690240. Nature Publishing Group UK 2020-05-26 /pmc/articles/PMC7250876/ /pubmed/32457368 http://dx.doi.org/10.1038/s41597-020-0483-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. |
spellingShingle | Data Descriptor Ruggles, Tyler H. Farnham, David J. Tong, Dan Caldeira, Ken Developing reliable hourly electricity demand data through screening and imputation |
title | Developing reliable hourly electricity demand data through screening and imputation |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250876/ https://www.ncbi.nlm.nih.gov/pubmed/32457368 http://dx.doi.org/10.1038/s41597-020-0483-x |
work_keys_str_mv | AT rugglestylerh developingreliablehourlyelectricitydemanddatathroughscreeningandimputation AT farnhamdavidj developingreliablehourlyelectricitydemanddatathroughscreeningandimputation AT tongdan developingreliablehourlyelectricitydemanddatathroughscreeningandimputation AT caldeiraken developingreliablehourlyelectricitydemanddatathroughscreeningandimputation |
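
The validation procedure summarized in the description field (hide a subset of plausible values, impute them, and score the result with mean absolute percentage error) can be illustrated with a minimal sketch. This is not the authors' code: the imputer here is plain linear interpolation standing in for MICE, only the negative-value check from the screening step is reproduced, and the function names (`screen`, `impute_linear`, `mape`, `holdout_mape`), the 10% holdout fraction, and the synthetic demand series are all assumptions made for illustration.

```python
import math
import random


def screen(demand):
    """Mark negative readings as missing (None).

    Only the negative-value rule mentioned in the abstract is shown;
    the paper's outlier criteria are not reproduced here.
    """
    return [None if (v is None or v < 0) else v for v in demand]


def impute_linear(series):
    """Stand-in imputer (NOT MICE): linear interpolation between the
    nearest observed neighbours; edges copy the nearest observed value."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def holdout_mape(series, fraction=0.1, seed=0):
    """Hide a random subset of plausible values, impute them, and score."""
    rng = random.Random(seed)
    plausible = [i for i, v in enumerate(series) if v is not None]
    hidden = rng.sample(plausible, max(1, int(fraction * len(plausible))))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    filled = impute_linear(masked)
    return mape([series[i] for i in hidden], [filled[i] for i in hidden])


# Synthetic hourly demand (MW) with a daily cycle; the numbers are made up.
raw = [1000 + 200 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
raw[10] = -5.0   # a negative reading, removed by screening
raw[50] = None   # a reading that was never reported
clean = screen(raw)
score = holdout_mape(clean)
print(f"holdout MAPE: {score:.2f}%")
```

The score printed here says nothing about the 3.5% figure reported in the record; it only shows how a holdout error of that kind is computed. Swapping `impute_linear` for a chained-equations imputer would leave the masking and scoring logic unchanged.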