This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
Detailed Description

A large part of my hope for the ML research we're doing in 2022 is to train across multiple "types" of prepared dataset. For example:
- Similar to what we used in December: satellite + PV + GSP + NWP (e.g. over Britain). (dataset v17)
- Just satellite (e.g. over the ocean). (dataset v18)
- Satellite + PV + global NWP over the UK, Italy and Malta. (dataset v19)
- (And, if the model overfits, then maybe try training on a few other video-prediction datasets, like "moving MNIST" or synthetic image sequences of clouds moving.)
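These dataset "types" could be described declaratively, e.g. as a mapping from dataset version to the modalities it contains. A minimal sketch; the version numbers come from the list above, but the dict structure is purely illustrative, not nowcasting_dataset's actual format:

```python
# Hypothetical description of the prepared-dataset "types" listed above.
# The version numbers (v17-v19) come from the list; the dict structure
# itself is an illustrative assumption, not nowcasting_dataset's API.
DATASET_TYPES = {
    "v17": {"modalities": ["satellite", "PV", "GSP", "NWP"], "region": "Britain"},
    "v18": {"modalities": ["satellite"], "region": "full satellite extent (incl. ocean)"},
    "v19": {"modalities": ["satellite", "PV", "NWP"], "region": "UK, Italy and Malta"},
}

# A model trained across a mixture of these needs the union of all modalities:
ALL_MODALITIES = sorted({m for cfg in DATASET_TYPES.values() for m in cfg["modalities"]})
print(ALL_MODALITIES)  # ['GSP', 'NWP', 'PV', 'satellite']
```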
Context
- To train our models to predict future satellite imagery, we probably want to use the entire geographical extent of the satellite imagery.
- But we also want to predict PV in the UK, Italy and Malta.
- So we might want each batch to contain a mix of examples: some from the UK (as is the case now), and some from anywhere in the geographical extent of the satellite imagery (including over oceans), without any PV.
- At the moment, nowcasting_dataset can't do this "mixture".
The simplest way to do this might actually be to leave nowcasting_dataset mostly alone, and produce multiple different sets of batches (one set over the UK; the other set without PV data, drawn from the entire geographical extent of the imagery). Then power_perceiver will load multiple batches at once. This has the advantage that we can quickly experiment with dynamically changing the ratio of "UK" to "non-UK" imagery as training progresses.
But this simpler approach still requires some updates to nowcasting_dataset (e.g. to randomly sample locations from the entire geographical extent of the satellite imagery).
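The "dynamically changing ratio" could be as simple as a schedule mapping training progress to the fraction of UK examples per batch. A minimal, torch-free sketch; the linear schedule, function names, and default values are all assumptions:

```python
import random

def uk_fraction(epoch: int, total_epochs: int,
                start: float = 1.0, end: float = 0.5) -> float:
    """Linearly anneal the fraction of "UK" examples per batch
    as training progresses. (The linear schedule is an assumption.)"""
    t = epoch / max(total_epochs - 1, 1)
    return start + t * (end - start)

def mix_batch(uk_examples, non_uk_examples, batch_size: int, frac_uk: float,
              rng=random):
    """Draw one batch mixing "UK" and "non-UK" examples at the given ratio."""
    n_uk = round(batch_size * frac_uk)
    batch = (rng.sample(uk_examples, n_uk)
             + rng.sample(non_uk_examples, batch_size - n_uk))
    rng.shuffle(batch)
    return batch

# Toy stand-ins for prepared examples:
uk = [f"uk_{i}" for i in range(100)]
non_uk = [f"ocean_{i}" for i in range(100)]
batch = mix_batch(uk, non_uk, batch_size=32, frac_uk=uk_fraction(0, 10))
print(len(batch))  # 32
```

At epoch 0 the schedule above yields all-UK batches; by the final epoch, half of each batch comes from the wider satellite extent.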
Possible Implementation
Maybe implement a thin adaptor which holds multiple power_perceiver.NowcastingDataset instances, and itself inherits from torch.utils.data.Dataset. This thin adaptor would randomly sample from the upstream power_perceiver.NowcastingDataset instances and stack the Tensors. So, for example, if we're combining "just satellite" data with "satellite + PV + GSP + NWP" data then, say, the first 16 examples in each batch would be "just satellite", and the PV, GSP, and NWP tensors for those first 16 examples would be zeros (and would be masked out before they go into the Perceiver).
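A torch-free sketch of that adaptor idea, using plain lists in place of Tensors and dicts of modalities in place of power_perceiver.NowcastingDataset (the class name, keys, and zero-fill/mask details are illustrative assumptions, not the real API):

```python
import random

class MixtureDataset:
    """Thin adaptor over several upstream datasets, each yielding a dict
    mapping modality name -> example (a plain list standing in for a Tensor).

    Modalities an upstream dataset lacks are zero-filled, so every stacked
    batch has a uniform set of keys; a per-modality mask records which
    entries are real, so the zeros can be masked out before the Perceiver.
    """

    def __init__(self, upstream_datasets, all_modalities, example_shape):
        self.upstream = upstream_datasets
        self.all_modalities = all_modalities
        self.example_shape = example_shape  # length of the stand-in "Tensor"

    def __len__(self):
        return min(len(ds) for ds in self.upstream)

    def __getitem__(self, idx):
        # Randomly pick one upstream dataset, then one of its examples.
        ds = random.choice(self.upstream)
        example = ds[idx % len(ds)]
        # Zero-fill modalities this dataset doesn't provide; record a mask.
        out, mask = {}, {}
        for modality in self.all_modalities:
            present = modality in example
            out[modality] = example[modality] if present else [0.0] * self.example_shape
            mask[modality] = present
        out["mask"] = mask
        return out

# Toy upstream datasets: "just satellite" vs. "satellite + PV + GSP + NWP".
sat_only = [{"satellite": [1.0, 1.0]} for _ in range(4)]
full = [{"satellite": [2.0, 2.0], "PV": [3.0, 3.0],
         "GSP": [4.0, 4.0], "NWP": [5.0, 5.0]} for _ in range(4)]

mix = MixtureDataset([sat_only, full],
                     all_modalities=["satellite", "PV", "GSP", "NWP"],
                     example_shape=2)
ex = mix[0]
print(sorted(ex))  # every example exposes all modalities, plus the mask
```

A real version would subclass torch.utils.data.Dataset, return stacked Tensors, and could control the per-batch ratio of upstream datasets instead of sampling uniformly.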