Order nwp channels #326

Open
peterdudfield opened this issue Jun 7, 2024 · 7 comments

Comments

@peterdudfield
Contributor

Detailed Description

Should we add a pipeline that orders the NWP channels alphabetically?

Context

  • The ML models do care about what order the channels come in
  • Currently the model might break if the channels are in a different order, for example if the order differs from the one it was trained on

Possible Implementation

  • add bool to config
  • add one data pipe that does the ordering
  • add into data pipeline
  • (add test)
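
A minimal sketch of the steps above, written in plain Python so it does not depend on the actual ocf_datapipes config classes. The names `NWPConfig`, `sort_channels` and `resolve_channel_order` are hypothetical and only illustrate how a config bool could switch alphabetical ordering on before the channels are selected.

```python
from dataclasses import dataclass, field


@dataclass
class NWPConfig:
    """Hypothetical stand-in for the NWP part of the data config."""

    channels: list[str] = field(default_factory=list)
    # Proposed flag (assumed name): order channels alphabetically when True.
    sort_channels: bool = False


def resolve_channel_order(config: NWPConfig) -> list[str]:
    """Return the channel list in the order it would be passed to selection."""
    if config.sort_channels:
        # Alphabetical order, independent of how the config was written.
        return sorted(config.channels)
    # Default: keep the order given in the config.
    return list(config.channels)


config = NWPConfig(channels=["mcc", "lcc", "hcc"], sort_channels=True)
ordered = resolve_channel_order(config)
```

With the flag off, the function returns the channels exactly as listed in the config, so existing behaviour would be unchanged.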
@AUdaltsova
Contributor

I looked at where channel selection happens and I think this can be achieved via a one-liner in ocf_datapipes/select/filter_channels.py: just sort the channel list before performing selection. It should return the channels in the right order after that, including coord reordering.

@peterdudfield
Contributor Author

That's a good idea

@peterdudfield
Contributor Author

It would be interesting to know whether sel just selects the channels, or selects them and also orders them https://github.com/openclimatefix/ocf_datapipes/blob/main/ocf_datapipes/select/filter_channels.py#L49

Trying d.sel({"variable": ["lcc", "mcc"]}) and d.sel({"variable": ["mcc", "lcc"]}), we do seem to get different results

@AUdaltsova
Contributor

Yes, that's what I was basing the one-liner suggestion on: sel seems to reorder coordinates, so the order of channels depends on the order in which somebody adds them into the config, and hence is very prone to inconsistencies.

I am trying to see if I can find a solid description of the reordering somewhere in the docs

@dfulu
Member

dfulu commented Jun 12, 2024

I'm not sure I follow what the issue is here.

If we use the filter channels function (which we use, for example, here), then we will have the same channel ordering in training and production, even if the dataset we are selecting from has the channels in a different order on disk, so long as the input data config remains the same at training and production. I think this is the desired behaviour?
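
The condition above (same input data config at training and production) could be guarded with a small check. This is only a sketch: `assert_same_channel_order` is a hypothetical helper, not something that exists in ocf_datapipes, and it assumes the channel lists have already been read from the two configs.

```python
def assert_same_channel_order(train_channels, prod_channels):
    """Raise if the two channel lists differ in content or in order."""
    train_channels = list(train_channels)
    prod_channels = list(prod_channels)
    if train_channels != prod_channels:
        raise ValueError(
            "Channel order mismatch: "
            f"training={train_channels}, production={prod_channels}"
        )


# Same channels in the same order: passes silently.
assert_same_channel_order(["lcc", "mcc"], ["lcc", "mcc"])
```

A list comparison is used on purpose, because two configs with the same set of channels in a different order should still be flagged.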

@dfulu
Member

dfulu commented Jun 12, 2024

It would be interesting to know whether sel just selects the channels, or selects them and also orders them https://github.com/openclimatefix/ocf_datapipes/blob/main/ocf_datapipes/select/filter_channels.py#L49

Trying d.sel({"variable": ["lcc", "mcc"]}) and d.sel({"variable": ["mcc", "lcc"]}), we do seem to get different results

Yes, this is how the selection works. It reorders the channels based on the list. The first one has the channels in the order lcc then mcc. The second has them in the order mcc then lcc
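
The behaviour described above can be reproduced on a toy array. This sketch assumes xarray and numpy are installed; the array `d`, its values and the extra `x` dimension are made up for illustration and are not the real NWP data.

```python
import numpy as np
import xarray as xr

# Toy array with three channels stored in the order hcc, lcc, mcc.
d = xr.DataArray(
    np.arange(6).reshape(3, 2),
    dims=("variable", "x"),
    coords={"variable": ["hcc", "lcc", "mcc"]},
)

# Selecting with a list of labels returns the channels in the list's order.
a = d.sel({"variable": ["lcc", "mcc"]})
b = d.sel({"variable": ["mcc", "lcc"]})

order_a = a.coords["variable"].values.tolist()
order_b = b.coords["variable"].values.tolist()
print(order_a)
print(order_b)
```

The data rows move together with the coordinate labels, so `a` and `b` hold the same values with the channel axis in opposite order.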

@AUdaltsova
Contributor

AUdaltsova commented Jun 12, 2024

Yeah, as long as the same config file is used it should be completely fine. I wasn't sure if that's what's happening
