Machine learning experiments for forecasting the electricity system (starting with solar)
We recommend installing mamba and using `mamba env create -f base_environment.yml` instead of `conda env create -f base_environment.yml`.
If installing on a platform without a GPU, then uncomment `- cpuonly` in `base_environment.yml`.
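
For reference, the relevant fragment of `base_environment.yml` looks something like the sketch below. The `pytorch` entry is illustrative; only the commented-out `- cpuonly` line is the point:

```yaml
dependencies:
  - pytorch
  # Uncomment the next line to install the CPU-only build of PyTorch:
  # - cpuonly
```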
```shell
conda env create -f base_environment.yml
conda activate power_perceiver

# If training, then also install the dependencies listed in train_environment.yml:
# See https://stackoverflow.com/a/43873901/732596
conda env update --file train_environment.yml --prune

pip install -e .
pre-commit install
```
If using the Ranger21 optimizer, please install Ranger21 with my tiny little patch.
To prevent `mamba update --all` from trying to replace the GPU version of PyTorch with the CPU version, add this to `~/miniconda3/envs/power_perceiver/conda-meta/pinned`:
```
# Prevent mamba update --all from trying to install CPU version of torch.
# See: https://stackoverflow.com/a/70536292/732596
cudatoolkit<11.6
```
- To install the base config, use: `pip install -e .`
- To install the code necessary to train, use: `pip install -e .[develop,train]`
There are two different data pipelines:

- `power_perceiver.load_prepared_batches`: Loads batches pre-prepared by `nowcasting_dataset`.
- `power_perceiver.load_raw`: Loads raw (well, intermediate) data.
The data flows through several steps, in order (a simplified code sketch follows this list):

- Every `PreparedDataSource` subclass loads a batch off disk and processes the `xr.Dataset` using the sequence of `transforms` passed into the `PreparedDataSource`'s constructor. The processed data for every `PreparedDataSource` goes into an `XarrayBatch`. The transforms live in `power_perceiver.transforms.<data source name>.py`.
- `PreparedDataset` then processes this `XarrayBatch` with its list of `xr_batch_processors`. The `xr_batch_processors` are processors which need to see across or touch multiple modalities at once, while the data is still in an xarray Dataset.
- Each `XarrayBatch` is then converted to a `NumpyBatch` by that `PreparedDataSource`'s `to_numpy` method. The `to_numpy` method also normalises, converts units, etc.
- Finally, `PreparedDataset` passes the entire `NumpyBatch` through the sequence of `np_batch_processors`.
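
To make that flow concrete, here is a deliberately simplified sketch of the four steps in plain Python. It is illustrative only, not the library's actual implementation: the stand-in class `PVSource`, the `get_example` helper, and all field names are hypothetical.

```python
import numpy as np
import xarray as xr


class PVSource:
    """Hypothetical stand-in for a PreparedDataSource subclass."""

    def __init__(self, transforms=()):
        self.transforms = transforms

    def get_batch(self) -> xr.Dataset:
        # Step 1: load a batch off disk (faked here with random data) and
        # run it through this data source's sequence of transforms.
        dataset = xr.Dataset({"power": ("time", np.random.rand(12))})
        for transform in self.transforms:
            dataset = transform(dataset)  # each transform: Dataset -> Dataset
        return dataset

    def to_numpy(self, dataset: xr.Dataset) -> dict:
        # Step 3: convert to numpy, normalising, converting units, etc.
        power = dataset["power"].values
        return {"pv_power": power / power.max()}


def get_example(data_sources, xr_batch_processors=(), np_batch_processors=()):
    """Hypothetical sketch of what PreparedDataset does for one example."""
    # Step 1: each data source produces its own processed xr.Dataset;
    # together these form the XarrayBatch.
    xr_batch = {type(src).__name__: src.get_batch() for src in data_sources}

    # Step 2: xr_batch_processors can see across all modalities at once,
    # while the data is still in xarray form.
    for processor in xr_batch_processors:
        xr_batch = processor(xr_batch)

    # Step 3: each data source converts its part of the XarrayBatch to numpy.
    np_batch = {}
    for src in data_sources:
        np_batch.update(src.to_numpy(xr_batch[type(src).__name__]))

    # Step 4: np_batch_processors run over the entire NumpyBatch.
    for processor in np_batch_processors:
        np_batch = processor(np_batch)
    return np_batch


# Example usage with a trivial transform and numpy batch processor:
def clip_negative(dataset):
    return dataset.clip(min=0)


def add_time_index(batch):
    return {**batch, "time_index": np.arange(12)}


example = get_example(
    [PVSource(transforms=[clip_negative])],
    np_batch_processors=[add_time_index],
)
```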
Originally, when I started work on "Power Perceiver" 5 months ago, my intention was to use DeepMind's Perceiver IO at the core of the model. Right now, the model actually just uses a standard transformer encoder, not a Perceiver. But I plan to start using a Perceiver IO again within a month or two, when we start using more input elements than a standard transformer encoder can cope with!
Thank you to NVIDIA for their very generous support: NVIDIA gave us four RTX A6000 GPUs via the NVIDIA Foundation, and a further two RTX A6000 GPUs via the NVIDIA hardware grant. Thank you, NVIDIA!