Initial design of mllam-verification package #1
This is cool! I hadn't considered using a config file as the interface to this, but what you suggest sounds good. I guess there will be some assumptions about the coordinates present in the input datasets (initial, target and prediction as you have them). Two quick thoughts on this:

It might be simpler to think of the function signatures for the two plots you are going to create. For the global persistence plot something like the following might work:

```python
def create_global_persistence_timeline_plot(
    ds_reference: xr.Dataset,
    prediction: Union[xr.Dataset, Dict[str, xr.Dataset]],
) -> matplotlib.pyplot.Figure:
```

or you could instead rethink this as a global timeline plot (where including a line for persistence could be an option):

```python
def create_global_error_timeline_plot(
    ds_reference: xr.Dataset,
    prediction: Union[xr.Dataset, Dict[str, xr.Dataset]],
    include_persistence=True,
) -> matplotlib.pyplot.Figure:
```

The idea with allowing the `prediction` argument to be a dict is that you could pass several named prediction datasets at once.

A last idea would be to separate the plotting of these validation plots from the calculation of the thing to plot. That would mean returning an `xr.Dataset` from e.g.

```python
def calculate_global_error(
    ds_reference: xr.Dataset,
    prediction: Union[xr.Dataset, Dict[str, xr.Dataset]],
    include_persistence=True,
) -> xr.Dataset:
```

which could then be used like:

```python
ds_prediction_1 = ...
ds_prediction_2 = ...
ds_reference = ...
ds_error = calculate_global_error(
    ds_reference,
    prediction=dict(model1=ds_prediction_1, model2=ds_prediction_2),
    include_persistence=True,
)
# ds_error would contain an extra dimension called say "model_name"
# with values [model1, model2, persistence]
ds_error.t2m.plot(hue="model_name", x="elapsed_time")
```

I am not sure how best to compose these functions. But in my experience it is useful to 1) return the values (typically data-arrays) from calculations in case you want to save these values to file, 2) think about what assumptions you will make about the inputs, and 3) think about how you'd like to use the functions you write (for example in a Jupyter notebook). Just some ideas :)
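A rough, untested sketch of how `calculate_global_error` could be composed along these lines (the `elapsed_time` and `grid_index` dimension names, the RMSE metric, and the persistence construction are assumptions for illustration, not a fixed design):

```python
from typing import Dict, Union

import xarray as xr


def calculate_global_error(
    ds_reference: xr.Dataset,
    prediction: Union[xr.Dataset, Dict[str, xr.Dataset]],
    include_persistence: bool = True,
) -> xr.Dataset:
    # normalise the input so we always work with a {name: dataset} mapping
    if isinstance(prediction, xr.Dataset):
        prediction = {"prediction": prediction}

    if include_persistence:
        # persistence baseline: repeat the reference state at the first time
        # step for all later times (assumes a shared "elapsed_time" dimension)
        ds_persistence = ds_reference.isel(elapsed_time=0, drop=True).broadcast_like(ds_reference)
        prediction = {**prediction, "persistence": ds_persistence}

    errors = []
    for name, ds_prediction in prediction.items():
        # global RMSE per variable and elapsed time, averaged over all grid points
        # (assumes reference and prediction share a "grid_index" dimension)
        ds_error = ((ds_prediction - ds_reference) ** 2).mean(dim="grid_index") ** 0.5
        errors.append(ds_error.expand_dims(model_name=[name]))

    return xr.concat(errors, dim="model_name")
```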
Thanks for all the good ideas/thoughts!
I think that would make sense. I actually included the "initial" dataset mostly because you wrote it in your Confluence notes. I thought that the "initial" dataset would be identical (up to an interpolation) to the "target" dataset at time zero, and in that case a comparison between the initial and the target at time zero wouldn't make much sense. Or do we actually perturb the initial conditions? There is probably something I've missed with the way neural-lam runs.
I think we should allow for multiple "prediction" datasets as you suggest below. Then I see two ways you can use the tool to compare a set of experiments:

a. Compare the multiple predictions to the same reference dataset
b. […]

I think in both cases you should adjust the config to set what the reference dataset is and what the prediction datasets are, and then run the tool again.
Makes sense. I will adjust the pydantic config validation to allow for multiple prediction datasets.
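A rough sketch of what the pydantic side of that could look like (the field names here are purely hypothetical, not the actual mllam-verification config schema):

```python
from typing import Dict, List, Optional

from pydantic import BaseModel


class DatasetSource(BaseModel):
    """Where to load a dataset from (hypothetical fields, for illustration only)."""

    path: str  # e.g. a zarr store or a netCDF file
    variables: Optional[List[str]] = None  # None means "use all variables"


class VerificationConfig(BaseModel):
    reference: DatasetSource
    # multiple named prediction datasets, e.g. {"model1": ..., "model2": ...}
    predictions: Dict[str, DatasetSource]
```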
Yes, I think that would be the best solution. Then we'll have some general plotting functions, which can plot datasets output by various calculation functions.
Makes sense. I will talk with @elbdmi to find out what I can expect the inference dataset to look like and define some assumptions based on that.
Sounds good! To make the code easier for you to write here it might be best to simply make assumptions about the "reference" and "prediction" datasets you will be using.

For the global time-error plot we could assume that: […]

The reason why I suggest […]

For the spatial error plot for a given elapsed time we could assume that: […]

For the spatial plot we need to have the […]

I think you've got the right idea to work out what the inputs should contain. I would define that in terms of the functions first (since then we can build on that in notebooks and separate out the functions that actually load from disk), and then as @elbdmi keeps working we can constrain what is stored in the zarr datasets too.

It might be simpler to only allow calculations for a single prediction dataset for now. Then your code could be called to write a number of netCDF files which contain the output of the error calculations, and plotting could then be done by combining multiple of these netCDF files. That would mean dropping the idea of passing a dict of several predictions, at least for now.

Anyway, I think the key is to get the functions that compute the error data as xr.Datasets done, write some tests, and make some plots with that. Looking forward to seeing what you and @elbdmi work out!
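To make the "one prediction at a time, combine at plot time" workflow concrete, a sketch of what that could look like (assuming a `calculate_global_error`-style function as discussed above that returns an error dataset without a `model_name` dimension, and with the dataset loading left as placeholders):

```python
import xarray as xr

ds_reference = ...     # load the reference dataset, e.g. from a zarr store
ds_prediction_1 = ...  # prediction dataset for experiment "model1"
ds_prediction_2 = ...  # prediction dataset for experiment "model2"

# compute and store the error for one prediction dataset at a time
for name, ds_prediction in [("model1", ds_prediction_1), ("model2", ds_prediction_2)]:
    ds_error = calculate_global_error(ds_reference, prediction=ds_prediction)
    ds_error.expand_dims(model_name=[name]).to_netcdf(f"global_error.{name}.nc")

# later: combine the stored error files and plot all experiments together
ds_errors = xr.concat(
    [xr.open_dataset(f"global_error.{name}.nc") for name in ("model1", "model2")],
    dim="model_name",
)
ds_errors.t2m.plot(hue="model_name", x="elapsed_time")
```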
For `ds_prediction` I have a `state` variable with coordinates `[analysis_time, elapsed_forecast_duration, grid_index, state_feature]`, but I could remove the `state_feature`.
Sounds good. I will start working on it this afternoon.
To be sure I understand you correctly, do you mean that we should always calculate the error for all variables, or is it fine to keep the following part of the inputs?

```yaml
...
variables:
  - 2t
  - 10u
```

I imagine that if you don't specify the "variables" section, we calculate the error for all variables, but if you specify certain variables, we only calculate the error for those variables.
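A minimal sketch of that selection logic (the helper name is hypothetical; the point is only that an unset "variables" entry means "all variables"):

```python
from typing import List, Optional

import xarray as xr


def select_variables(ds: xr.Dataset, variables: Optional[List[str]] = None) -> xr.Dataset:
    """Return only the requested variables, or the full dataset if none are given."""
    if variables is None:
        return ds
    return ds[variables]
```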
What will the `state_feature` contain?
`state` is the main variable containing the predictions, with dimensions `[analysis_time, elapsed_forecast_duration, grid_index, state_feature]`. The `state_feature` coordinate contains the specific variables or physical quantities that the model predicts, for example `2t` or `10u`.
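For reference, a tiny self-contained example of that layout and how one quantity can be picked out of it (array sizes and values are made up):

```python
import numpy as np
import xarray as xr

# a small stand-in for the layout described above
ds_prediction = xr.Dataset(
    {
        "state": (
            ("analysis_time", "elapsed_forecast_duration", "grid_index", "state_feature"),
            np.random.rand(2, 4, 10, 2),
        )
    },
    coords={"state_feature": ["2t", "10u"]},
)

# pick out a single physical quantity along the state_feature coordinate
da_t2m = ds_prediction.state.sel(state_feature="2t")
print(da_t2m.dims)  # ('analysis_time', 'elapsed_forecast_duration', 'grid_index')
```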
Okay, so then in order for mllam-verification to be able to calculate the error of only specific physical quantities (if we want to be able to do this), I guess we should keep the `state_feature` :)
Yes, sort of :) To do this the process would be to go from […]
I was thinking that it might be good to make the functions that compute the error values do the calculation on all variables in the dataset, and then, when plotting, your idea of picking out the ones you'd like could be applied separately. If everything is loaded with dask (which it will be when using […])
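To illustrate the lazy-evaluation point (the zarr paths and the `t2m` variable name below are placeholders, and the error expression is just a global RMSE as an example):

```python
import xarray as xr

# zarr stores open as lazy, dask-backed arrays by default
ds_reference = xr.open_zarr("reference.zarr")
ds_prediction = xr.open_zarr("prediction.zarr")

# this only builds a dask task graph covering every variable; nothing is computed yet
ds_error = ((ds_prediction - ds_reference) ** 2).mean(dim="grid_index") ** 0.5

# computation only happens for what is actually used, e.g. one variable at plot time
ds_error["t2m"].compute()
```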
Okay, let's do that. I just thought that if we at some point get a lot of variables out of the inference, we would maybe not necessarily be interested in calculating and storing the error (or other verification metrics) for all variables. But that's maybe a problem for the future (if a problem at all).
Would it then make sense to add a `state_feature` coordinate to the error datasets too? Like […]
It could be, yes, but I was thinking we would try to make this codebase agnostic to whether one is using a machine learning model or not. Or said another way: if we make this tool in a way that it assumes that the inputs are (as close as possible to) CF-compliant in their contents (i.e. different physical fields in separate variables, with […])
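A minimal sketch of what that "reverse the stacking" step could look like (the function name is made up, and the `units`/`long_name` lookup tables are assumed to come from the training datastore or config):

```python
import xarray as xr


def unstack_state_feature(ds: xr.Dataset, units: dict, long_names: dict) -> xr.Dataset:
    """Split the stacked `state` variable into one data variable per physical
    quantity, attaching CF-style `units`/`long_name` attributes (sketch only)."""
    data_vars = {}
    for feature in ds.state_feature.values:
        name = str(feature)
        # select one physical quantity and attach its metadata
        data_vars[name] = ds.state.sel(state_feature=feature, drop=True).assign_attrs(
            units=units.get(name, ""),
            long_name=long_names.get(name, name),
        )
    return xr.Dataset(data_vars)
```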
@leifdenby, that sounds great! I’d love to sit down and go through this together next week. Reversing the transformations to split the `state_feature` into separate variables with attributes like `units` and `long_name` makes complete sense, and your branch seems like a good starting point for this work.
Concerning the spatial error plot, would you prefer that we just save one plot per elapsed time, an animation/GIF, a Hovmöller diagram, or something else?
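Just to make the first option concrete, a sketch of writing one figure per elapsed time (the `ds_spatial_error` dataset and the `t2m` variable are placeholders, and this assumes the spatial error has already been put on a 2D x/y grid):

```python
import matplotlib.pyplot as plt

ds_spatial_error = ...  # spatial error dataset, assumed regridded to 2D (x, y)

# option "one plot per elapsed time": write one PNG per forecast step
for i, t in enumerate(ds_spatial_error.elapsed_forecast_duration.values):
    fig, ax = plt.subplots()
    ds_spatial_error["t2m"].sel(elapsed_forecast_duration=t).plot(ax=ax)
    ax.set_title(f"t2m spatial error at elapsed time {t}")
    fig.savefig(f"spatial_error_t2m_{i:03d}.png")
    plt.close(fig)
```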
I've now started the development of the inference vs persistence plots. I thought I would just provide my initial ideas for the design of the package, to be aligned on these ideas before coding too much. Please provide any comments/thoughts/feedback! :)
Some thoughts related to this structure: […]