This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Write prepared batches to disk as a separate script #167

Open
Tracked by #148
JackKelly opened this issue Jun 24, 2022 · 0 comments
Labels
enhancement New feature or request

JackKelly commented Jun 24, 2022

Combine the best of nowcasting_dataset and power_perceiver's data loader:

  • Have a separate script which writes totally prepared batches to disk.
  • This script would operate like power_perceiver's data loader: Load a subset of contiguous data into RAM, and sample from it.
  • Save batches (or examples?) to disk. As Python pickles? Or numpy files?
  • The data would be completely ready to be loaded into the model. The data will be normalised. (The only disadvantage is that we have to use float32 for pretty much everything, which takes more disk space. But that should be OK because there won't be huge numbers of batches on disk at any one time.)
  • Write the batches into a directory like train/2022-06-24T12:45/. Always have two "done" folders, and be working on a third folder. The model code signals that it has completed one epoch by deleting the older of the two "done" folders. (Although we will need a different system if multiple models are training from the same data.)
  • Maybe have two repos: helios_model, and helios_data?
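
A minimal sketch of what the batch-writing step could look like, assuming the contiguous subset is already in RAM as a NumPy array of shape (time, features) and that numpy files are chosen over pickles. The function name `write_batches`, its parameters, and the `batch_000000.npy` naming are all hypothetical and not taken from nowcasting_dataset or power_perceiver; per-feature mean/std normalisation is just one plausible choice.

```python
from pathlib import Path

import numpy as np


def write_batches(data, out_dir, n_batches, batch_size, example_len, rng):
    """Sample fixed-length windows from an in-RAM array and save them as batches.

    `data` is assumed to have shape (time, features) and to be contiguous in time.
    All names here are illustrative, not an existing API.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Normalise once, up front, so the files on disk are ready for the model.
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant features
    normalised = ((data - mean) / std).astype(np.float32)

    max_start = len(normalised) - example_len
    paths = []
    for batch_idx in range(n_batches):
        # Pick random start positions; each example is a contiguous window.
        starts = rng.integers(0, max_start + 1, size=batch_size)
        batch = np.stack([normalised[s : s + example_len] for s in starts])
        path = out_dir / f"batch_{batch_idx:06d}.npy"
        np.save(path, batch)  # one file per batch, shape (batch, time, features)
        paths.append(path)
    return paths
```

Each saved file can be read back with `np.load(path)` and passed to the model without further preprocessing. Saving one file per batch also keeps the files easy to inspect by hand.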

Advantages:

  • Much less waiting!
    • Don't have to wait about half an hour for the model to start training
    • Don't have to wait between epochs
  • Can manually examine the files on disk
  • Modularises the code
  • Compared to nowcasting_dataset, this new approach would be much faster at creating a new set of batches from scratch, e.g. if we change the structure of the data, then we should only have to wait about an hour for a whole new epoch to be prepared.
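
The "two done folders plus one in progress" handshake from the proposal could be sketched as below. This is one possible reading, not a settled design: the `DONE` marker file, the function names, and the rule that the writer starts a new folder only while at most two folders are done are all assumptions. It also assumes folder names are timestamps in a single fixed format, so that sorting by name gives chronological order. Like the proposal, it only covers a single model consuming the data.

```python
import shutil
from pathlib import Path

DONE_MARKER = "DONE"  # hypothetical marker file written when a folder is complete


def done_folders(root):
    """Return completed batch folders under `root`, oldest first."""
    root = Path(root)
    return sorted(
        p for p in root.iterdir() if p.is_dir() and (p / DONE_MARKER).exists()
    )


def mark_done(folder):
    """Writer side: flag a folder as fully written and safe to train from."""
    (Path(folder) / DONE_MARKER).touch()


def writer_should_start(root, target_done=2):
    """Writer side: begin a new folder only while at most `target_done` are done."""
    return len(done_folders(root)) <= target_done


def finish_epoch(root):
    """Model side: signal the end of an epoch by deleting the oldest done folder."""
    folders = done_folders(root)
    if not folders:
        return None
    oldest = folders[0]
    shutil.rmtree(oldest)
    return oldest
```

With this scheme the writer polls `writer_should_start` before preparing the next folder, and the training loop calls `finish_epoch` once it has consumed the oldest folder.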
@JackKelly JackKelly added the enhancement New feature or request label Jun 24, 2022
@JackKelly JackKelly moved this to Todo in Nowcasting Jun 24, 2022