Addressing a Problem?
If you want to reuse the trained dataset for multiple adjustments, the docs already mention that you can trigger the training computation by calling .load on the .ds dataset object of the class. This loads the trained model into the memory of the main thread. For bigger datasets it might be necessary to leave the data on the workers while still computing it only once.
This can already be done by doing something like
qdm.set_dataset(qdm.ds.persist())
I could imagine that this is a common enough case to add an argument to the train method that does exactly that.
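As a hedged illustration of what persist buys here (the dataset below is a tiny synthetic stand-in for a trained `qdm.ds`, not real adjustment output):

```python
import dask.array as da
import xarray as xr

# Stand-in for a trained dataset (e.g. qdm.ds): small and dask-backed.
ds = xr.Dataset({"af": ("quantiles", da.arange(4.0, chunks=2))})

# .persist() triggers the computation once; the results stay as chunked
# dask arrays (on the workers in a distributed setting) instead of being
# loaded into the main process as .load() would do.
persisted = ds.persist()

# Reusing `persisted` for several adjustments no longer recomputes the graph.
assert float(persisted.af.sum()) == 6.0
```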
Potential Solution
Extend the train method with an optional persist=False argument that, if true, updates the trained dataset to be a persisted dask array.
Additional context
No response
Contribution
I would be willing/able to open a Pull Request to contribute this feature.
Code of Conduct
I agree to follow this project's Code of Conduct
I haven't tried that yet, maybe I should! In our latest large-scale workflows, we wrote the trained dataset to disk as it was larger than memory anyway.
I find the qdm.set_dataset(qdm.ds.persist()) line to be simple enough, so I'm not totally convinced this warrants an implementation in xclim? However, that implementation would also be very simple: I suggest adding a persist method to ParametrizableWithDataset here. Would that solve the issue for you?
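The suggested persist method could look roughly like this; the class below is a tiny mock standing in for xclim's ParametrizableWithDataset, and the method body is an assumption, not the actual proposal:

```python
import dask.array as da
import xarray as xr

class ParametrizableWithDatasetMock:
    """Tiny mock standing in for xclim's ParametrizableWithDataset."""

    def set_dataset(self, ds):
        self.ds = ds

    def persist(self, **kwargs):
        # Suggested addition: persist the trained dataset in place,
        # forwarding any keyword arguments to xarray's Dataset.persist.
        self.set_dataset(self.ds.persist(**kwargs))
        return self

obj = ParametrizableWithDatasetMock()
obj.set_dataset(xr.Dataset({"af": ("x", da.arange(2.0, chunks=1))}))
obj.persist()
assert float(obj.ds.af.sum()) == 1.0
```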
Ah yes, that would be another way to do it. I agree it maybe doesn't warrant an extra step, especially if the more common use case is to persist to disk! Feel free to close.