IDSEM, an invoices database of the Spanish electricity market

This article describes a new database of electricity bills related to energy consumption in Spanish households. The dataset includes individual invoices containing information about the consumption and billing of each supply point. These documents include additional data about the customer, the cont...

Bibliographic Details
Main Authors: Sánchez, Javier; Salgado, Agustín; García, Alejandro; Monzón, Nelson
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9809319/
https://www.ncbi.nlm.nih.gov/pubmed/36572678
http://dx.doi.org/10.1038/s41597-022-01885-3
author Sánchez, Javier
Salgado, Agustín
García, Alejandro
Monzón, Nelson
author_sort Sánchez, Javier
collection PubMed
description This article describes a new database of electricity bills related to energy consumption in Spanish households. The dataset includes individual invoices containing information about the consumption and billing of each supply point. These documents include additional data about the customer, the contract, and the electricity company. We propose a pipeline for the creation of bill contents through a simulation process based on regulations and statistics from official bodies and electricity companies. This makes it possible to generate many documents with synthetic data. The simulation is based on 86 different labels, which are necessary to create realistic invoices. The dataset has 75 000 documents in PDF format with their corresponding labels in JSON files. It is useful for training machine learning algorithms and, in particular, for developing methods to automatically extract information from the bills. It is also interesting to design new algorithms for analyzing the behavior of electricity markets from different perspectives.
format Online
Article
Text
id pubmed-9809319
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9809319 2023-01-04
IDSEM, an invoices database of the Spanish electricity market
Sánchez, Javier; Salgado, Agustín; García, Alejandro; Monzón, Nelson
Sci Data
Data Descriptor
This article describes a new database of electricity bills related to energy consumption in Spanish households. The dataset includes individual invoices containing information about the consumption and billing of each supply point. These documents include additional data about the customer, the contract, and the electricity company. We propose a pipeline for the creation of bill contents through a simulation process based on regulations and statistics from official bodies and electricity companies. This makes it possible to generate many documents with synthetic data. The simulation is based on 86 different labels, which are necessary to create realistic invoices. The dataset has 75 000 documents in PDF format with their corresponding labels in JSON files. It is useful for training machine learning algorithms and, in particular, for developing methods to automatically extract information from the bills. It is also interesting to design new algorithms for analyzing the behavior of electricity markets from different perspectives.
Nature Publishing Group UK 2022-12-26
/pmc/articles/PMC9809319/
/pubmed/36572678
http://dx.doi.org/10.1038/s41597-022-01885-3
Text en
© The Author(s) 2022
https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title IDSEM, an invoices database of the Spanish electricity market
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9809319/
https://www.ncbi.nlm.nih.gov/pubmed/36572678
http://dx.doi.org/10.1038/s41597-022-01885-3