This is a web scraper that collects data from the E-REDES website and can upload it to a database. Since E-REDES exposes no programmatic interface to the data, this scraper was developed to collect it. A high-level overview of the process:
- The scraper collects the data from the E-REDES website.
- A file with the energy consumption readings is downloaded.
- [ Optional ] The file is parsed and the data is uploaded to the selected database.
- [ Optional ] Instead of the full set of readings, only the "deltas" can be inserted.
This package supports the E-REDES website as available at the time of writing (14/06/2023). The entry point for the scraper is the page https://balcaodigital.e-redes.pt/consumptions/history.
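As a rough illustration of the optional "deltas" step, the sketch below assumes that each reading is a (timestamp, value) pair and that a delta is any reading newer than the most recent timestamp already stored in the database. `filter_deltas` is a hypothetical helper, not part of the eredesscraper API, and the package's actual logic may differ.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumption: a reading is a (timestamp, consumption value) pair.
Reading = Tuple[datetime, float]


def filter_deltas(readings: Iterable[Reading],
                  last_stored: Optional[datetime]) -> List[Reading]:
    """Hypothetical helper: keep only readings newer than the latest stored one.

    If nothing is stored yet (last_stored is None), every reading counts as a delta.
    """
    if last_stored is None:
        return sorted(readings)
    return sorted(r for r in readings if r[0] > last_stored)


# Illustrative values only.
readings = [
    (datetime(2023, 5, 1, 0, 15), 0.12),
    (datetime(2023, 5, 1, 0, 30), 0.10),
    (datetime(2023, 5, 1, 0, 45), 0.09),
]
new = filter_deltas(readings, last_stored=datetime(2023, 5, 1, 0, 15))
print(new)  # the 00:30 and 00:45 readings
```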
The package can be installed using pip:
pip install eredesscraper
Usage is based on a YAML configuration file, config.yml, which holds the credentials for the E-REDES website and the database connection. Currently, only InfluxDB is supported as a database sink.
eredes:
  # eredes credentials
  nif: <my-eredes-nif>
  pwd: <my-eredes-password>
  # CPE to monitor, e.g. PT00############04TW (where # is a digit). The CPE can be found in your bill details
  cpe: PT00############04TW
influxdb:
  # url to InfluxDB, e.g. http://localhost or https://influxdb.my-domain.com
  host: http://localhost
  # default port is 8086
  port: 8086
  bucket: <my-influx-bucket>
  org: <my-influx-org>
  # access token with write access
  token: <token>
# load the configuration
ers config load "/path/to/config.yml"
# get current month readings
ers run -d influxdb
# get only deltas from last month readings
ers run -w previous -d influxdb --delta
# get readings from May 2023
ers run -w select -d influxdb -m 5 -y 2023
# start an API server
ers server -H "localhost" -p 8778 --reload -S <path/to/database>
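The CLI flags above compose in a regular way. As a sketch, the hypothetical helper `ers_run_args` builds the argument list for `ers run` using only the flags shown in the examples (`-w`, `-d`, `-m`, `-y`, `--delta`). Omitting `-w` for the current month mirrors the first example; requiring both month and year for `select` is an assumption.

```python
import shlex
from typing import List, Optional


def ers_run_args(workflow: str = "current", db: str = "influxdb",
                 month: Optional[int] = None, year: Optional[int] = None,
                 delta: bool = False) -> List[str]:
    """Hypothetical helper: build an `ers run` argument list from the documented flags."""
    args = ["ers", "run"]
    if workflow != "current":
        # The first CLI example above omits -w for the current month.
        args += ["-w", workflow]
    args += ["-d", db]
    if workflow == "select":
        # Assumption: 'select' needs an explicit month and year.
        if month is None or year is None:
            raise ValueError("the 'select' workflow needs both month and year")
        args += ["-m", str(month), "-y", str(year)]
    if delta:
        args.append("--delta")
    return args


print(shlex.join(ers_run_args("select", month=5, year=2023)))
# ers run -w select -d influxdb -m 5 -y 2023
```

The resulting list could be passed to `subprocess.run(args, check=True)` to invoke the installed `ers` command.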
For more details, refer to the OpenAPI documentation or the UI endpoints available at http://<host>:<port>/docs and http://<host>:<port>/redoc.
# main methods:
# load an ers configuration
# the configuration can be loaded in different ways:
# - directly in the request body,
# - download remote file,
# - upload local file
curl -X 'POST' \
'http://localhost:8778/config/upload' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@config.yml'
# run sync workflow
curl -X 'POST' \
'http://localhost:8778/run' \
-H 'Content-Type: application/json' \
-d '{
"workflow": "current"
}'
# run async workflow
curl -X 'POST' \
'http://localhost:8778/run_async' \
-H 'Content-Type: application/json' \
-d '{
"workflow": "select",
"db": [
"influxdb"
],
"month": 5,
"year": 2023,
"delta": true,
"download": true
}'
# get task status (`task_id` returned in /run_async response body)
curl -X 'GET' \
'http://localhost:8778/status/<task_id>'
# download the file retrieved by the workflow
curl -X 'GET' \
'http://localhost:8778/download/<task_id>'
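The same API can be called from Python with only the standard library. The sketch below assumes the server started above is listening on http://localhost:8778 and, as stated in the comment above, that the /run_async response body contains a `task_id` field. The structure of the /status response is not documented here, so it is returned as parsed JSON without interpretation. `run_async` and `get_status` are hypothetical helper names, and the `opener` parameter exists only so the HTTP call can be swapped out, e.g. in tests.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumed address: host and port as passed to `ers server` above.
BASE_URL = "http://localhost:8778"


def run_async(payload: Dict[str, Any], base_url: str = BASE_URL,
              opener: Callable = urllib.request.urlopen) -> str:
    """POST a workflow to /run_async and return the task_id from the response body."""
    request = urllib.request.Request(
        f"{base_url}/run_async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as response:
        return json.loads(response.read())["task_id"]


def get_status(task_id: str, base_url: str = BASE_URL,
               opener: Callable = urllib.request.urlopen) -> Dict[str, Any]:
    """GET /status/<task_id>; the response is returned as parsed JSON, unmodified."""
    with opener(f"{base_url}/status/{task_id}") as response:
        return json.loads(response.read())
```

For example, `task_id = run_async({"workflow": "select", "db": ["influxdb"], "month": 5, "year": 2023})` followed later by `get_status(task_id)`.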
from eredesscraper.workflows import switchboard
from pathlib import Path
# get deltas from current month readings
switchboard(config_path=Path("./config.yml"),
            name="current",
            db=["influxdb"],
            delta=True,
            keep=True)
# get readings from May 2023
switchboard(config_path=Path("./config.yml"),
            name="select",
            db=["influxdb"],
            month=5,
            year=2023)
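To collect several months in one go, the `select` workflow can be called in a loop. This is a sketch assuming the website still exposes the requested months and that `switchboard` accepts the same arguments as in the example above; `month_range` and `backfill` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterator, Tuple


def month_range(start: Tuple[int, int], end: Tuple[int, int]) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) pairs from start to end, both inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month == 13:
            year, month = year + 1, 1


def backfill(config_path: Path, start: Tuple[int, int], end: Tuple[int, int]) -> None:
    """Hypothetical helper: run the "select" workflow once per month in the range."""
    # Imported lazily so month_range can be used without the package installed.
    from eredesscraper.workflows import switchboard

    for year, month in month_range(start, end):
        switchboard(config_path=config_path,
                    name="select",
                    db=["influxdb"],
                    month=month,
                    year=year)
```

For instance, `backfill(Path("./config.yml"), (2023, 1), (2023, 5))` would run the workflow for January through May 2023.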
current
: Collects the current month's consumption data.

previous
: Collects the previous month's consumption data.

select
: Collects the consumption data for an arbitrary month specified by the user.

influxdb
: Loads the data into an InfluxDB database (https://docs.influxdata.com/influxdb/v2/get-started/).
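The difference between the three workflows comes down to which month they target. The sketch below makes that explicit; `resolve_period` is a hypothetical helper, and the month validation for `select` is an assumption rather than documented behavior.

```python
from datetime import date
from typing import Optional, Tuple


def resolve_period(workflow: str, today: date,
                   month: Optional[int] = None,
                   year: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: return the (year, month) a workflow targets."""
    if workflow == "current":
        return today.year, today.month
    if workflow == "previous":
        # January rolls back to December of the previous year.
        if today.month == 1:
            return today.year - 1, 12
        return today.year, today.month - 1
    if workflow == "select":
        if month is None or year is None or not 1 <= month <= 12:
            raise ValueError("'select' needs a valid month (1-12) and a year")
        return year, month
    raise ValueError(f"unknown workflow: {workflow}")


print(resolve_period("previous", date(2024, 1, 3)))  # (2023, 12)
```

The only subtle case is `previous` in January, which has to cross a year boundary.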
- Add workflow for retrieving previous month data.
- Add workflow for retrieving data from an arbitrary month.
- Build CLI.
- Build API.
- Containerize app.
- Documentation.
- Add CI/CD.
- Add logging.
- Add tests (limited coverage).
- Add runtime support for multiple CPEs.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
See the LICENSE file.