Combine the best of both worlds of `nowcasting_dataset` and `power_perceiver`'s data loader:

- Have a separate script which writes fully prepared batches to disk (a sketch of such a script follows this list).
  - This script would operate like `power_perceiver`'s data loader: load a subset of contiguous data into RAM, and sample from it.
  - Save batches (or examples?) to disk. As Python pickles? Or numpy files?
  - The data would be completely ready to be loaded into the model, and already normalised. (The only disadvantage is that we'd have to use float32 for pretty much everything. But that should be OK, because there won't be a huge number of batches on disk at any one time.)
  - Write the batches into a directory like `train/2022-06-24T12:45/`. Always have two "done" folders, and be working on a third folder. The model code signals that it has completed one epoch by deleting the older of the two "done" folders. (Although we'll need a different system if multiple models are training from the same data.)
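A minimal sketch of what that script might look like, using `.npy` files. `load_contiguous_chunk` and `sample_example` are hypothetical stand-ins (random data here) for the dataset-specific parts; none of these names come from either repo:

```python
import datetime
import pathlib

import numpy as np

BATCH_SIZE = 32


def load_contiguous_chunk() -> np.ndarray:
    """Stand-in for loading a contiguous subset of the raw data into RAM."""
    return np.random.rand(1_000, 64, 64).astype(np.float32)


def sample_example(chunk: np.ndarray, seq_len: int = 12) -> np.ndarray:
    """Sample one example (a contiguous time slice) from the in-RAM chunk."""
    start = np.random.randint(0, len(chunk) - seq_len)
    return chunk[start : start + seq_len]


def write_batches(dst_root: pathlib.Path, n_batches: int) -> pathlib.Path:
    """Write fully prepared, normalised float32 batches to a timestamped dir."""
    batch_dir = dst_root / datetime.datetime.now().strftime("%Y-%m-%dT%H:%M")
    batch_dir.mkdir(parents=True, exist_ok=True)
    chunk = load_contiguous_chunk()
    for i in range(n_batches):
        examples = [sample_example(chunk) for _ in range(BATCH_SIZE)]
        # Already normalised and float32, so the model can load it directly.
        batch = np.stack(examples)
        np.save(batch_dir / f"batch_{i:06d}.npy", batch)
    return batch_dir
```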
Maybe have two repos: `helios_model` and `helios_data`?
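And a rough sketch of the "done"-folder rotation described in the list above. The `_DONE` marker file and the helper names are assumptions for illustration, not an existing convention:

```python
import pathlib
import shutil


def mark_done(batch_dir: pathlib.Path) -> None:
    """Writer side: flag a folder as complete once all batches are written."""
    (batch_dir / "_DONE").touch()


def done_dirs(root: pathlib.Path) -> list[pathlib.Path]:
    """All completed batch folders, oldest first (timestamped names sort)."""
    return sorted(d for d in root.iterdir() if (d / "_DONE").exists())


def finish_epoch(root: pathlib.Path) -> None:
    """Model side: signal that an epoch is done by deleting the oldest folder.

    The writer keeps two "done" folders on disk while preparing a third;
    deleting the oldest tells it to start another. (This is single-consumer:
    multiple models training from the same data would need a different scheme.)
    """
    done = done_dirs(root)
    if len(done) >= 2:
        shutil.rmtree(done[0])
```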
Advantages:

- Much less waiting!
  - Don't have to wait about half an hour for the model to start training.
  - Don't have to wait between epochs.
- Can manually examine the files on disk.
- Modularises the code.
Compared to `nowcasting_dataset`, this new approach would be much faster at creating a new set of batches from scratch. For example, if we change the structure of the data, we should only have to wait about an hour for a whole new epoch to be prepared.