NWM raster data in Earth Engine #17

Open
KMarkert opened this issue Oct 21, 2022 · 4 comments

Comments

@KMarkert

Starting a separate conversation from #15 regarding loading and using the terrain, forcing, and land data with Earth Engine. @jameshalgren @danames @ma-sada

This is very doable and much more constrained. We will need a data transformation to go from the NetCDF (.nc) file format to an intermediate GeoTIFF format, then ingest into Earth Engine using the CLI.
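As a rough sketch of that NetCDF to GeoTIFF to Earth Engine path, the snippet below only assembles the shell commands for one variable; it does not run them. It assumes GDAL, `gsutil`, and the `earthengine` CLI are installed and authenticated. The file name, variable name, bucket, and asset ID are placeholders, not values from this thread.

```python
import shlex


def build_commands(nc_path, variable, bucket, asset_id, time_start):
    """Return the three shell commands for one NetCDF variable as argument lists."""
    tif_name = f"{variable}.tif"
    gcs_uri = f"gs://{bucket}/{tif_name}"
    return [
        # 1. Extract one variable (GDAL NetCDF subdataset syntax) into a GeoTIFF.
        ["gdal_translate", "-of", "GTiff", f"NETCDF:{nc_path}:{variable}", tif_name],
        # 2. Stage the GeoTIFF on Google Cloud Storage.
        ["gsutil", "cp", tif_name, gcs_uri],
        # 3. Ask Earth Engine to ingest the staged file as an image asset.
        ["earthengine", "upload", "image",
         f"--asset_id={asset_id}", f"--time_start={time_start}", gcs_uri],
    ]


# Placeholder inputs for illustration only.
commands = build_commands(
    nc_path="forcing.nc",
    variable="RAINRATE",
    bucket="my-bucket",
    asset_id="projects/my-project/assets/nwm/forcing_RAINRATE",
    time_start="2022-10-23T00:00:00",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the commands as argument lists means they can later be passed to `subprocess.run` without shell quoting issues.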

Other LSM data has been uploaded to Earth Engine (e.g., GLDAS), so we can model how the data looks on EE after those collections.

@KMarkert
Author

KMarkert commented Oct 21, 2022

Do we have an estimate of the total data volume for each of the different collections, or at least an expected range? And do we have an expected increase in volume over time (per day, month, or year) so we can plan on the EE side?

Also, does terrain change in time or is this static for each run?

@imanmaghami

imanmaghami commented Oct 24, 2022

@KMarkert I did a quick ballpark calculation of the size of terrain, forcing, and land based on the day 20221023. You can find the details in this spreadsheet. I hope it helps answer the first part of your question! In short, the total size of these three collections is ~920 GB (say 1 TB) per day. The individual sizes for a day's worth of data for terrain, forcing, and land are 90, 440, and 390 GB, respectively.
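For planning purposes, the arithmetic behind these numbers can be sketched as below. It assumes the single sampled day (20221023) is representative and uses decimal units (1 TB = 1000 GB, 1 PB = 1,000,000 GB); both are assumptions, not measurements.

```python
# Per-day sizes in GB, taken from the estimate above.
DAILY_GB = {"terrain": 90, "forcing": 440, "land": 390}


def daily_total_gb(sizes=DAILY_GB):
    """Total volume of all collections for one day, in GB."""
    return sum(sizes.values())


def projected_tb(days, sizes=DAILY_GB):
    """Volume accumulated over `days` at a constant daily rate, in decimal TB."""
    return daily_total_gb(sizes) * days / 1000


def days_to_reach_pb(target_pb=1.0, sizes=DAILY_GB):
    """Days needed to accumulate `target_pb` petabytes at the daily rate."""
    return target_pb * 1_000_000 / daily_total_gb(sizes)


print(daily_total_gb())           # 920
print(projected_tb(365))          # 335.8
print(round(days_to_reach_pb()))  # 1087
```

At this rate the three collections grow by roughly a third of a petabyte per year.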

@jameshalgren
Contributor

jameshalgren commented Oct 24, 2022

> Also, does terrain change in time or is this static for each run?

Terrain is the result of passing the estimated precipitation excess (currently computed by the Noah-MP model) through an overland routing scheme. It's a rudimentary 2-D routing model with a lot of similarity to the older codes CASC2D and GSSHA, based on the same approach in its "explicit" formulation.

So, it is dynamic -- in its present form, it essentially represents the current inundation on the land surface. (There are a number of reasons why it should not be relied on for that at the moment).
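To illustrate what an "explicit" overland routing update looks like, here is a toy 1-D sketch. It is not NWM, CASC2D, or GSSHA code. The Manning-based flux, the roughness value, the closed boundaries, and the half-depth limiter are all assumptions chosen to keep the example short and stable.

```python
import math


def explicit_overland_step(depth, bed, excess_rate, dt, dx, manning_n=0.05):
    """Advance ponded water depth (m) by one explicit time step.

    All fluxes are computed from the depths at the start of the step,
    which is what makes the scheme explicit.
    """
    # Source term: precipitation excess (m/s) added to every cell.
    new_depth = [h + excess_rate * dt for h in depth]
    for i in range(len(depth) - 1):
        # Water-surface difference between neighbouring cells drives the flow.
        head_drop = (bed[i] + depth[i]) - (bed[i + 1] + depth[i + 1])
        if head_drop == 0.0:
            continue
        donor, receiver = (i, i + 1) if head_drop > 0 else (i + 1, i)
        slope = abs(head_drop) / dx
        # Manning's equation for unit-width discharge (m^2/s).
        unit_q = depth[donor] ** (5.0 / 3.0) * math.sqrt(slope) / manning_n
        # A cell may give at most half its starting depth to each neighbour,
        # so depths never go negative.
        transfer = min(unit_q * dt / dx, 0.5 * depth[donor])
        new_depth[donor] -= transfer
        new_depth[receiver] += transfer
    return new_depth


# Five cells on a uniform slope, initially dry, with steady excess rainfall.
bed = [1.0, 0.8, 0.6, 0.4, 0.2]
depth = [0.0] * len(bed)
for _ in range(100):
    depth = explicit_overland_step(depth, bed, excess_rate=1e-5, dt=10.0, dx=100.0)
```

Because the boundaries are closed, water only moves between cells, so the total ponded depth equals the total excess added. The depth field changes every step, which is why the output is dynamic rather than static.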

@KMarkert
Author

Understood. @jameshalgren thank you for clarifying the terrain file and thank you @imanmaghami for the estimates on storage.

Given that the NWM is very large (ballpark >1 PB), we should not ingest it directly into EE as assets. EE asset storage is a managed database, and at this collection's growth rate we would need to keep requesting limit increases, which would be logistically challenging. I suggest we use Cloud Optimized GeoTIFF (COG) backed assets with EE. This lets us store the data on GCS and grow the collection that EE points to without having to worry about storage limits. There is some overhead associated with reading the data from GCS, and we will have to manage another collection of files, but it is the best option from a sustainability perspective. Thoughts?
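A minimal sketch of registering one COG-backed asset might look like the following. It only builds the request; it does not send it. The endpoint and field names (`type`, `gcs_location.uris`, `startTime`) follow my recollection of the Earth Engine documentation for Cloud GeoTIFF backed assets and should be checked against the current docs. The project, asset ID, and bucket path are placeholders.

```python
import json

# Assumed REST endpoint for creating an asset; verify against the EE docs.
EE_ASSETS_URL = (
    "https://earthengine.googleapis.com/v1alpha/projects/{project}/assets"
    "?assetId={asset_id}"
)


def cog_asset_request(project, asset_id, gcs_uri, start_time, properties=None):
    """Build the URL and JSON body that point an EE image asset at a COG on GCS."""
    body = {
        "type": "IMAGE",
        # The pixels stay in the bucket; EE only stores this reference.
        "gcs_location": {"uris": [gcs_uri]},
        "startTime": start_time,
        "properties": properties or {},
    }
    url = EE_ASSETS_URL.format(project=project, asset_id=asset_id)
    return url, json.dumps(body)


# Placeholder values for illustration only.
url, payload = cog_asset_request(
    project="my-project",
    asset_id="nwm/forcing_RAINRATE_2022102300",
    gcs_uri="gs://my-bucket/forcing/RAINRATE_2022102300.tif",
    start_time="2022-10-23T00:00:00Z",
    properties={"variable": "RAINRATE"},
)
print(url)
print(payload)
```

Sending the request would require an authorized HTTP session with Earth Engine scope, which is left out here. Growing the collection then means adding files to the bucket and registering one small record per file.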
