Correct grid information in netCDF output #8

Open
leifdenby opened this issue Jul 26, 2018 · 5 comments

Comments

@leifdenby
Collaborator

leifdenby commented Jul 26, 2018

Currently MONC's output files don't contain grid-position information, i.e. positions for the spatial coordinates (x, y, etc.). This is required for CF compliance. Also, variables which aren't colocated (for example the velocity components) are currently defined on the same grid in the output files, e.g.

netcdf diagnostics_ts_25.0 {
dimensions:
        x = 66 ;
        y = 66 ;
        z = 76 ;
variables:
        ...
        double w(time_series_600_1800.0, x, y, z) ;
        double u(time_series_600_1800.0, x, y, z) ;
}

To remedy this, grid information (staggering and positions) for each scalar field should be communicated to the MONC IO server, and coordinates for each scalar field need to be written to the output file.
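As a rough illustration of what "positions" could mean for a centered versus a staggered axis, here is a minimal sketch. The module name `grid_positions_sketch`, the function `axis_positions`, and above all the half-cell offset convention are assumptions made for illustration; MONC's actual grid layout may differ.

```fortran
module grid_positions_sketch
   implicit none
   private
   public :: axis_positions

contains

   ! Positions of n points along one axis with uniform spacing dx.
   ! ASSUMPTION: centered points sit at cell midpoints and staggered
   ! points at the upper cell faces; MONC's convention may differ.
   pure function axis_positions(n, dx, staggered) result(pos)
      integer, intent(in) :: n
      real,    intent(in) :: dx
      logical, intent(in) :: staggered
      real    :: pos(n)
      integer :: i

      do i = 1, n
         if (staggered) then
            pos(i) = real(i) * dx
         else
            pos(i) = (real(i) - 0.5) * dx
         end if
      end do
   end function axis_positions

end module grid_positions_sketch
```

Under these assumptions, `axis_positions(3, 2.0, .false.)` gives the midpoints 1, 3, 5 and `axis_positions(3, 2.0, .true.)` gives the faces 2, 4, 6. Writing both sets of positions as separate coordinate variables is what would let u and w refer to different grids in the output file.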

My suggestions:

  1. I think the variable size in the MONC IO XML config files should be renamed to grid instead, to make it explicit that this is about setting both the size and position of the grid on which variables are defined.
  2. I would add a grid="auto" option; in this case the IO server would expect grid information from MONC itself, and fail if it isn't received.
  3. Grid information, when using the Cartesian grid, could be communicated through a 6-bit integer as a binary encoding of which dimensions are used and whether staggered or centered values are used for each. E.g. 110010 might mean “use the x and y dimensions” (xyz encoded as the first three bits, 110) and “use a centered grid in the x-direction and a staggered grid in y” (encoded as 010; the last bit, for z, would be ignored). This could be communicated through the same MPI datatype that I extended for the field meta information, data_sizing_description_type. This information would be used when grid="auto" is set in the XML config file. Variables holding the xn, yn, zn, x, y, z grid positions would then automatically be written to every NetCDF file that MONC creates. This might not be the best approach, but I think that data_sizing_description_type.dim_sizes is inadequate as it stands because it doesn't communicate whether x, y or z is used, nor whether the variable is staggered.
  4. Grid information for variables on non-Cartesian grids: I would suggest we survey what people need here. My gut feeling is that simply supporting 1D arrays with position information (in time or space…) might suffice; I expect people want to extract time series.
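To make the 6-bit encoding in suggestion 3 concrete, here is a minimal sketch of how a receiver might decode such a code. The module `grid_code_sketch`, the type `grid_layout` and the function `decode_grid_code` are hypothetical names for illustration and not part of MONC; only the bit layout (use-bits for x, y, z followed by staggering bits for x, y, z) comes from the description above.

```fortran
module grid_code_sketch
   implicit none
   private
   public :: grid_layout, decode_grid_code

   ! Hypothetical container for the decoded flags; not part of MONC.
   type :: grid_layout
      logical :: used(3)      = .false.   ! x, y, z
      logical :: staggered(3) = .false.   ! x, y, z
   end type grid_layout

contains

   ! Decode a 6-bit code laid out as (use_x use_y use_z stag_x stag_y stag_z),
   ! most significant bit first, e.g. 110010 (decimal 50).
   pure function decode_grid_code(code) result(layout)
      integer, intent(in) :: code
      type(grid_layout)   :: layout
      integer :: d

      do d = 1, 3
         ! btest numbers bits from 0 at the least significant end, so the
         ! use-bits for x, y, z are bits 5, 4, 3 and the staggering bits 2, 1, 0.
         layout%used(d) = btest(code, 6 - d)
         ! A staggering bit is ignored when its dimension is unused.
         layout%staggered(d) = layout%used(d) .and. btest(code, 3 - d)
      end do
   end function decode_grid_code

end module grid_code_sketch
```

For the example code 110010 (decimal 50), `decode_grid_code(50)` marks x and y as used, z as unused, and only y as staggered.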
@leifdenby
Collaborator Author

leifdenby commented Jul 26, 2018

For my own future reference here's how I would define the grid for individual variables

module grid_definition
   implicit none
   private

   integer, parameter :: USE_X = 32        ! 100 000
   integer, parameter :: USE_Y = 16        ! 010 000
   integer, parameter :: USE_Z = 8         ! 001 000
   
   integer, parameter :: X_STAGGERED = 4   ! 000 100
   integer, parameter :: Y_STAGGERED = 2   ! 000 010
   integer, parameter :: Z_STAGGERED = 1   ! 000 001
   
   integer, parameter :: X_GRID_STAGGERED = USE_X + X_STAGGERED
   integer, parameter :: Y_GRID_STAGGERED = USE_Y + Y_STAGGERED
   integer, parameter :: Z_GRID_STAGGERED = USE_Z + Z_STAGGERED
   integer, parameter :: X_GRID_CENTERED = USE_X
   integer, parameter :: Y_GRID_CENTERED = USE_Y
   integer, parameter :: Z_GRID_CENTERED = USE_Z

   public X_GRID_STAGGERED, X_GRID_CENTERED
   public Y_GRID_STAGGERED, Y_GRID_CENTERED
   public Z_GRID_STAGGERED, Z_GRID_CENTERED
end module

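program test
    use grid_definition
    implicit none

    integer :: var2d_grid

    ! define the grid to be used for a variable which is defined in 2D:
    ! x centered, y staggered, z unused -> 32 + (16 + 2) = 50, i.e. 110 010
    var2d_grid = X_GRID_CENTERED + Y_GRID_STAGGERED
    print *, var2d_grid
end program test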

@stevenleeds
Collaborator

I was just thinking, why not simply use booleans?

@leifdenby
Collaborator Author

> I was just thinking, why not simply use booleans?

It's just easier to create the data structure to store and send one integer instead of six booleans I think :) But maybe you've thought of a better way of doing it. How would you do it?

@stevenleeds
Collaborator

I was just thinking that compared to the total amount of metainformation, even six integers (0/1) would be small (but maybe smaller data formats are possible, like booleans).
https://www.unidata.ucar.edu/software/netcdf/netcdf/netCDF-external-data-types.html
The main advantage of using different variables rather than a single integer is that it keeps the grid information explicit. I tend to prefer ease of use over a small efficiency gain.

@leifdenby
Collaborator Author

> The main advantage of using different variables rather than a single integer is that it keeps the grid information explicit. I tend to prefer ease of use over a small efficiency gain.

Yes :) What I'm describing above is not what to store in the netCDF file (that would have to be CF-compliant, and so have different variables for each coordinate), but how to communicate the grid from the MONC worker to the MONC IO server.
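The two views in this thread need not conflict: the grid could be kept explicit in the code and only packed into a single integer for transport. A minimal sketch of that idea follows, assuming the bit values from the grid_definition module posted earlier (USE_X = 32 down to Z_STAGGERED = 1). The module `grid_flags_sketch`, the type `grid_flags` and the function `pack_grid_flags` are hypothetical and not part of MONC.

```fortran
module grid_flags_sketch
   implicit none
   private
   public :: grid_flags, pack_grid_flags

   ! Hypothetical explicit description of one variable's grid; not MONC code.
   type :: grid_flags
      logical :: use_x = .false.
      logical :: use_y = .false.
      logical :: use_z = .false.
      logical :: x_staggered = .false.
      logical :: y_staggered = .false.
      logical :: z_staggered = .false.
   end type grid_flags

contains

   ! Pack the six flags into one integer for sending, e.g. as one extra
   ! integer field in an MPI datatype. Bit 5 is USE_X, bit 0 is Z_STAGGERED.
   pure function pack_grid_flags(f) result(code)
      type(grid_flags), intent(in) :: f
      integer :: code

      code = 0
      if (f%use_x)       code = ibset(code, 5)
      if (f%use_y)       code = ibset(code, 4)
      if (f%use_z)       code = ibset(code, 3)
      if (f%x_staggered) code = ibset(code, 2)
      if (f%y_staggered) code = ibset(code, 1)
      if (f%z_staggered) code = ibset(code, 0)
   end function pack_grid_flags

end module grid_flags_sketch
```

With this, `pack_grid_flags(grid_flags(use_x=.true., use_y=.true., y_staggered=.true.))` yields 50, the same value as `X_GRID_CENTERED + Y_GRID_STAGGERED` in the earlier example, while callers only ever deal with named logical fields.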
