
Developing reliable hourly electricity demand data through screening and imputation

Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning (e.g., capacity expansion and system operation) purposes. The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in...


Bibliographic Details
Main authors: Ruggles, Tyler H., Farnham, David J., Tong, Dan, Caldeira, Ken
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250876/
https://www.ncbi.nlm.nih.gov/pubmed/32457368
http://dx.doi.org/10.1038/s41597-020-0483-x
author Ruggles, Tyler H.
Farnham, David J.
Tong, Dan
Caldeira, Ken
collection PubMed
description Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning (e.g., capacity expansion and system operation) purposes. The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in the contiguous United States. As of September 2019, we find 2.2% of the demand data in their database are missing. Additionally, 0.5% of reported quantities are either negative values or are otherwise identified as outliers. With the goal of attaining non-missing, continuous, and physically plausible demand data to facilitate analysis, we developed a screening process to identify anomalous values. We then applied a Multiple Imputation by Chained Equations (MICE) technique to impute replacements for missing and anomalous values. We conduct cross-validation on the MICE technique by marking subsets of plausible data as missing, and using the remaining data to predict this “missing” data. The mean absolute percentage error of imputed values is 3.5% across all BAs. The cleaned data are published and available open access: 10.5281/zenodo.3690240.
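The validation idea in the description (hide plausible values, impute them, and score the result with mean absolute percentage error) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `screen`, `interpolate`, `mape`, and `masked_mape` are hypothetical, the screening step only flags negative values, and linear interpolation is swapped in as a simple stand-in for MICE.

```python
def screen(series):
    """Mark negative readings as missing (None). A crude stand-in for
    the screening step; real outlier detection is more involved."""
    return [None if (v is None or v < 0) else v for v in series]


def interpolate(series):
    """Fill None gaps by linear interpolation between the nearest known
    neighbours. This is a stand-in imputer, NOT MICE."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values to interpolate from")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]   # gap at the start: copy next known value
        elif right is None:
            filled[i] = series[left]    # gap at the end: copy last known value
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def masked_mape(series, holdout):
    """Hide the plausible values at the `holdout` indices, impute them
    from the remaining data, and score the imputed values against the truth."""
    masked = list(series)
    for i in holdout:
        masked[i] = None
    filled = interpolate(masked)
    return mape([series[i] for i in holdout], [filled[i] for i in holdout])


if __name__ == "__main__":
    hourly_demand = [100.0, 120.0, -5.0, 150.0, 130.0, 110.0]  # made-up values
    cleaned = interpolate(screen(hourly_demand))
    print(cleaned)
    print(masked_mape(cleaned, holdout=[3]))
```

Holding out only values that passed screening mirrors the description's approach of marking subsets of plausible data as missing; the error figure produced here says nothing about the 3.5% reported for MICE.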
format Online
Article
Text
id pubmed-7250876
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7250876 2020-06-04 Developing reliable hourly electricity demand data through screening and imputation Ruggles, Tyler H. Farnham, David J. Tong, Dan Caldeira, Ken Sci Data Data Descriptor Nature Publishing Group UK 2020-05-26 /pmc/articles/PMC7250876/ /pubmed/32457368 http://dx.doi.org/10.1038/s41597-020-0483-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
title Developing reliable hourly electricity demand data through screening and imputation
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250876/
https://www.ncbi.nlm.nih.gov/pubmed/32457368
http://dx.doi.org/10.1038/s41597-020-0483-x