Categorical distribution and ground truth during training #18
-
Hello, I am implementing this with radar-only data obtained from a local source here in Sweden for my master's thesis. First, I'd like to thank you for the implementation! It has helped me a great deal, but I have some questions about the ground truth the network is supposed to use and how training is done.

The paper describes a 512-way categorical distribution, i.e. 512 bins in 0.2 mm/h increments. Does this mean that the ground truth for one training sample will be a tensor of shape (512, width, height), one-hot encoded from the radar data to indicate which bucket each pixel falls in? This seems like a very sparse way to represent the data: when I look at my data, there are very few occasions when more than a couple of pixels fall into any given bin at >5 mm/h. What do you think?

In my case the model will have 5-minute resolution for lead times, out to 300 minutes into the future (60 different lead times). How do we handle training with different lead times? It seems that in your code we loop through all 60 possibilities for each batch and backpropagate accordingly. It's not clear to me whether it's better to pick one random lead time (forecast_steps=1) for each training sample in a given batch, or to loop through every lead time. (I'm afraid looping might overfit the input nodes, since we'd be using almost the same input vector for 60 samples.)

I have already done the preprocessing steps for my data in NumPy: center-crop-split and space-to-depth (8 channels), plus 3 channels for elevation, longitude, and latitude, and 4 channels for "time of year" and "time of day" as periodic representations, so 15 input channels in total. I have therefore commented out the line "x = self.preprocessor(x)" inside encode_timestep(). Is there any caution I need to exercise here, or is it OK as long as the network doesn't give any error messages? Sorry, I don't have much experience with PyTorch...

Thanks again
/Valter
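For what it's worth, PyTorch's `CrossEntropyLoss` accepts integer class indices of shape (N, H, W) together with logits of shape (N, C, H, W), so the dense one-hot (512, width, height) tensor never needs to be materialized in memory. A minimal NumPy sketch of the binning step (the bin edges, clipping behaviour, and the `rain_to_class` helper are my own assumptions for illustration, not taken from the paper or this repo):

```python
import numpy as np

def rain_to_class(rain_mmh, n_bins=512, bin_width=0.2):
    """Map rain rates in mm/h to integer class indices 0..n_bins-1.

    Bin i covers [i * bin_width, (i + 1) * bin_width); the top bin
    (index 511) collects everything >= 102.2 mm/h via clipping.
    """
    idx = np.floor(np.asarray(rain_mmh) / bin_width).astype(np.int64)
    return np.clip(idx, 0, n_bins - 1)

# A tiny 2x2 "image" of rain rates:
rain = np.array([[0.0, 0.19],
                 [0.2, 102.4]])
classes = rain_to_class(rain)  # integer target, shape (2, 2)
```

The resulting integer map can be fed directly to `CrossEntropyLoss` as the target, which sidesteps most of the sparsity concern at the storage level, though the class imbalance itself of course remains.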
-
Hello,

Glad you find this useful! Just curious, is the radar data openly available? I would love to have more training datasets beyond the US one in the paper.

As for training, yes, my understanding of the paper is that the target is a tensor of shape (512, width, height). It is quite a sparse representation, similar to the very sparse temporal representation used as well. One option is to subsample the data so that the rare, high-rainfall cases are shown more often. The Deep Generative Model of Radar (DGMR) paper does that, and its dataset is public; I'm currently mirroring it on HuggingFace as well. They sampled the rare events more often to make sure the model learned about high-rainfall events. The categorical representation does make it easier for the model to learn probabilistic forecasts, though, as it can just predict the probability of each forecast "category".

The current code is somewhat predicated on training on all future lead times at the same time, but you could potentially train a separate model for each lead time, for example. Or do what MetNet-2 did: keep a checkpoint for each future lead time, so the best model weights for a specific lead time can be used during inference. Also, if you come up with a more efficient way of training on the different lead times, feel free to open a PR!

For your preprocessing, yes, as long as there are no error messages it should be fine, so I wouldn't worry about it.
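The "one random lead time per sample" alternative from the question could be sketched like this (hypothetical: `pick_lead_time` and the index convention are my own for illustration, not part of this repo):

```python
import random

def pick_lead_time(n_lead_times=60, step_minutes=5, rng=random):
    """Draw one random lead time per training sample (forecast_steps=1).

    Returns the lead-time index and the corresponding minutes ahead;
    index 0 is taken to mean the first 5-minute step.
    """
    idx = rng.randrange(n_lead_times)
    return idx, (idx + 1) * step_minutes

# One backprop pass per sample instead of 60 on a nearly identical input:
idx, minutes_ahead = pick_lead_time()
```

Over many epochs every lead time is still seen roughly equally often in expectation, while each input window contributes only one gradient step per epoch.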
-
Thanks a lot for your input and the quick answer. I will check with my supervisor to see if we can publish the dataset (or part of it), but I'm not sure it will be possible. I guess the representation makes sense, but I will have to investigate the average rainfall across the different training samples. The hardest part, I guess, will be parallelizing it before running on some GPUs. Godspeed me.
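That per-sample average-rainfall statistic plugs straight into the oversampling idea mentioned above. A simple scheme (my own assumption, not the exact DGMR recipe) is to weight each training crop by its mean rain rate plus a small floor, so dry crops are still seen occasionally:

```python
import numpy as np

def oversample_weights(crop_means, floor=0.1):
    """Sampling probability per training crop, proportional to its
    mean rain rate (mm/h) plus a floor term for dry crops.

    This is an illustrative scheme of my own, not the exact
    importance-sampling recipe used in the DGMR paper.
    """
    w = np.asarray(crop_means, dtype=np.float64) + floor
    return w / w.sum()

means = [0.0, 0.05, 2.0]      # mean mm/h over three example crops
p = oversample_weights(means)  # rainier crops get sampled more often
```

The resulting probabilities could then drive a weighted sampler (e.g. PyTorch's `WeightedRandomSampler`) when building batches.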
-
I emailed the authors of the paper and got some answers to my questions: