E-REDES Scraper

Description

This is a web scraper that collects data from the E-REDES website and can upload it to a database. Since the website exposes no programmatic interface to the data, this scraper was developed as an alternative way to collect it. A high-level overview of the process:

The scraper collects the data from the E-REDES website.
A file with the energy consumption readings is downloaded.
[ Optional ] The file is parsed and the data is uploaded to the selected database.
[ Optional ] A feature supporting only the insertion of "deltas" is available.
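The README does not define exactly how deltas are computed. One plausible interpretation, sketched below as an assumption rather than the package's actual logic, is to insert only the readings that are newer than the latest timestamp already stored in the database. `Reading` and `filter_deltas` are hypothetical names, and the sample values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Reading:
    """One consumption reading (hypothetical shape, not eredesscraper's)."""
    timestamp: datetime
    kwh: float


def filter_deltas(readings: Iterable[Reading], last_stored: datetime | None) -> list[Reading]:
    """Return readings strictly newer than the last stored timestamp, oldest first.

    When nothing has been stored yet (last_stored is None), every reading is new.
    """
    ordered = sorted(readings, key=lambda r: r.timestamp)
    if last_stored is None:
        return ordered
    return [r for r in ordered if r.timestamp > last_stored]


if __name__ == "__main__":
    batch = [
        Reading(datetime(2023, 5, 1, 0, 15), 0.21),
        Reading(datetime(2023, 5, 1, 0, 30), 0.19),
        Reading(datetime(2023, 5, 1, 0, 45), 0.25),
    ]
    new = filter_deltas(batch, last_stored=datetime(2023, 5, 1, 0, 15))
    print([r.kwh for r in new])  # [0.19, 0.25]
```

In this reading, the "last stored" timestamp would come from a query against the configured sink before each run.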

This package supports the E-REDES website as it was available at the time of writing (14/06/2023). The entry point for the scraper is the page https://balcaodigital.e-redes.pt/consumptions/history.

Installation

The package can be installed using pip:

```shell
pip install eredesscraper
```

Configuration

Usage is driven by a YAML configuration file. `config.yml` holds the credentials for the E-REDES website and the database connection. Currently, only InfluxDB is supported as a database sink.

Template `config.yml`:

```yaml
eredes:
  # eredes credentials
  nif: <my-eredes-nif>
  pwd: <my-eredes-password>
  # CPE to monitor. e.g. PT00############04TW (where # is a digit). CPE can be found in your bill details
  cpe: PT00############04TW

influxdb:
  # url to InfluxDB. e.g. http://localhost or https://influxdb.my-domain.com
  host: http://localhost
  # default port is 8086
  port: 8086
  bucket: <my-influx-bucket>
  org: <my-influx-org>
  # access token with write access
  token: <token>
```
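Before running a workflow, it can be worth checking that the configuration was actually filled in. The sketch below validates a config that has already been parsed into a dict (for instance with PyYAML's `yaml.safe_load`). The required keys come from the template above; the CPE pattern (`PT`, 16 digits, 2 uppercase letters) is an assumption generalised from the example value. `validate_config` and `influx_url` are hypothetical helpers, not part of eredesscraper.

```python
from __future__ import annotations

import re

# Assumption: a CPE is "PT" followed by 16 digits and 2 uppercase letters,
# generalised from the template value PT00############04TW.
CPE_PATTERN = re.compile(r"PT\d{16}[A-Z]{2}")

# Keys taken from the config.yml template.
REQUIRED_KEYS = {
    "eredes": ("nif", "pwd", "cpe"),
    "influxdb": ("host", "port", "bucket", "org", "token"),
}


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed config (empty list if none)."""
    problems: list[str] = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if values.get(key) in (None, ""):
                problems.append(f"missing key: {section}.{key}")
    eredes = config.get("eredes")
    cpe = str(eredes.get("cpe") or "") if isinstance(eredes, dict) else ""
    if cpe and not CPE_PATTERN.fullmatch(cpe):
        problems.append(f"CPE does not match PT + 16 digits + 2 letters: {cpe}")
    return problems


def influx_url(config: dict) -> str:
    """Join the influxdb host and port into a base URL, defaulting to port 8086."""
    db = config["influxdb"]
    return f"{db['host'].rstrip('/')}:{db.get('port', 8086)}"
```

With this check, the unedited template value `PT00############04TW` is reported as invalid, which catches a config that was copied but never filled in.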

Usage

CLI:

```shell
# load the configuration file
ers config load "/path/to/config.yml"

# get current month readings
ers run -d influxdb

# get only the deltas from the previous month's readings
ers run -w previous -d influxdb --delta

# get readings from May 2023
ers run -w select -d influxdb -m 5 -y 2023

# start an API server
ers server -H "localhost" -p 8778 --reload -S <path/to/database>
```
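When scripting the CLI (for example from a scheduler or another Python program), it can help to build the `ers run` arguments programmatically. The hypothetical `build_run_args` helper below uses only the flags shown in the examples above and assumes `current` is the default workflow, since the first example omits `-w`.

```python
from __future__ import annotations

import shlex


def build_run_args(workflow: str = "current", db: str = "influxdb", delta: bool = False,
                   month: int | None = None, year: int | None = None) -> list[str]:
    """Build an `ers run` argument list from the flags shown in the README examples."""
    args = ["ers", "run"]
    if workflow != "current":  # assumption: "current" is the CLI default
        args += ["-w", workflow]
    args += ["-d", db]
    if workflow == "select":
        if month is None or year is None:
            raise ValueError("the 'select' workflow needs both month and year")
        args += ["-m", str(month), "-y", str(year)]
    if delta:
        args.append("--delta")
    return args


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(args, check=True) to actually execute it.
    print(shlex.join(build_run_args("select", month=5, year=2023)))
    # ers run -w select -d influxdb -m 5 -y 2023
```

Passing the list to `subprocess.run` (rather than a single shell string) avoids quoting problems with paths and arguments.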

API:

For more details, refer to the OpenAPI documentation, browsable through the UI endpoints at http://<host>:<port>/docs and http://<host>:<port>/redoc.

```shell
# main methods:

# load an ers configuration; it can be provided in three ways:
# - directly in the request body
# - as a remote file to download
# - as an uploaded local file (shown here)
curl -X 'POST' \
  'http://localhost:8778/config/upload' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@config.yml'


# run sync workflow
curl -X 'POST' \
  'http://localhost:8778/run' \
  -H 'Content-Type: application/json' \
  -d '{
  "workflow": "current"
}'

# run async workflow
curl -X 'POST' \
  'http://localhost:8778/run_async' \
  -H 'Content-Type: application/json' \
  -d '{
  "workflow": "select",
  "db": [
    "influxdb"
  ],
  "month": 5,
  "year": 2023,
  "delta": true,
  "download": true
}'

# get task status (the `task_id` is returned in the /run_async response body)
curl -X 'GET' \
  'http://localhost:8778/status/<task_id>'

# download the file retrieved by the workflow
curl -X 'GET' \
  'http://localhost:8778/download/<task_id>'
```
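The same endpoints can also be called from Python using only the standard library. The sketch below assumes the server from the `ers server` example is listening on `http://localhost:8778`. `run_async_request`, `get_json` and `wait_for_task` are hypothetical helpers; because the README does not describe the shape of the `/status/<task_id>` response, completion is decided by a caller-supplied `is_done` predicate rather than a guessed field name.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8778"  # host/port from the `ers server` example


def run_async_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /run_async request shown in the curl example."""
    return urllib.request.Request(
        f"{base_url}/run_async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(task_id: str, is_done: Callable[[dict], bool],
                  fetch: Callable[[str], dict] = get_json,
                  base_url: str = BASE_URL, interval: float = 5.0,
                  timeout: float = 600.0) -> dict:
    """Poll GET /status/<task_id> until is_done(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(f"{base_url}/status/{task_id}")
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not done after {timeout}s")
        time.sleep(interval)
```

To start a task, send the request with `urllib.request.urlopen(run_async_request({...}))` and read `task_id` from the JSON response, as the curl comments note. The predicate passed to `wait_for_task` then has to match whatever status fields your server actually returns; the `fetch` parameter exists so the polling loop can be exercised without a live server.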

Python:

```python
from pathlib import Path

from eredesscraper.workflows import switchboard

# get deltas from current month readings
switchboard(config_path=Path("./config.yml"),
            name="current",
            db=["influxdb"],  # list of database sinks
            delta=True,
            keep=True)

# get readings from May 2023
switchboard(config_path=Path("./config.yml"),
            name="select",
            db=["influxdb"],
            month=5,
            year=2023)
```

Features

Available workflows:

current: Collects the current month consumption.
previous: Collects the previous month consumption data.
select: Collects the consumption data from an arbitrary month specified by the user.
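The three workflows differ only in which month they target. A minimal sketch of that mapping, assuming `current` and `previous` are relative to today's date; `target_period` is a hypothetical helper, not part of the package.

```python
from __future__ import annotations

from datetime import date


def target_period(workflow: str, today: date | None = None,
                  month: int | None = None, year: int | None = None) -> tuple[int, int]:
    """Return the (month, year) that a workflow would collect."""
    today = today or date.today()
    if workflow == "current":
        return today.month, today.year
    if workflow == "previous":
        # January rolls back to December of the previous year.
        if today.month == 1:
            return 12, today.year - 1
        return today.month - 1, today.year
    if workflow == "select":
        if month is None or year is None:
            raise ValueError("'select' needs an explicit month and year")
        if not 1 <= month <= 12:
            raise ValueError(f"month must be 1-12, got {month}")
        return month, year
    raise ValueError(f"unknown workflow: {workflow}")


if __name__ == "__main__":
    print(target_period("previous", today=date(2024, 1, 15)))  # (12, 2023)
```

The year rollover in `previous` is the one case where the month arithmetic is not a simple subtraction.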

Available databases:

influxdb: Loads the data into an InfluxDB database (https://docs.influxdata.com/influxdb/v2/get-started/).

Roadmap

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

See the LICENSE file.

rf-santos/eredes-scraper

Folders and files

Latest commit

History

Repository files navigation

E-REDES Scraper

Description

Installation

Configuration

Template config.yml:

Usage

CLI:

API:

Python:

Features

Available workflows:

Available databases:

Roadmap

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases 6

Packages 0

Languages

Template `config.yml`:

Packages