CORDEX-CMIP6 storage requirements estimation #40

Open
jesusff opened this issue Oct 22, 2024 · 8 comments

@jesusff
Contributor

jesusff commented Oct 22, 2024

In order to secure the archival of CMIP6-driven simulations in ESGF, we first need to estimate the volume of data that we will be producing. An accurate estimate is difficult because compression rates depend on the variable and even on the model, since the effective model resolution affects compression.

A very rough estimate should, in principle, be easily extracted by combining the information in:

jesusff added a commit that referenced this issue Oct 22, 2024
@jesusff
Contributor Author

jesusff commented Oct 22, 2024

This is a first estimate, covering only CORE variables and assuming an average 60% compression rate relative to raw binary data at float precision. This value can be adjusted (e.g. by comparing with real processed output).
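A minimal sketch of how such an estimate can be computed, under assumptions not stated in the thread: each value is stored as a 4-byte float, the "60% compression rate" is read as compressed size = 0.6 × raw size, and 1 TB = 10^12 bytes. The helper `size_tb` and its parameters are hypothetical; this is not necessarily how the original script does it.

```python
# Minimal sketch of a compressed-size estimate for one domain/experiment.
# ASSUMPTIONS (not taken from the original script):
#   - each value is a 4-byte float (raw binary, single precision)
#   - "60% compression rate" read as: compressed size = 0.6 * raw size
#   - 1 TB = 1e12 bytes (decimal terabyte)

BYTES_PER_VALUE = 4
COMPRESSED_FRACTION = 0.6
BYTES_PER_TB = 1e12


def size_tb(n_simulations: int, n_years: int, records_per_yr: int,
            n_gridcells: int,
            compressed_fraction: float = COMPRESSED_FRACTION) -> float:
    """Hypothetical helper: estimated compressed storage in TB.

    One record is one variable at one time step on the domain grid, so
    raw size = simulations * years * records * cells * bytes per value.
    """
    raw_bytes = (n_simulations * n_years * records_per_yr
                 * n_gridcells * BYTES_PER_VALUE)
    return raw_bytes * compressed_fraction / BYTES_PER_TB


if __name__ == "__main__":
    # Illustrative inputs, all assumptions:
    #   424 * 412 cells -> CORDEX-CMIP5 EUR-11 rotated grid, stand-in for EUR-12
    #   86 years        -> a 2015-2100 scenario period
    #   119535          -> sum of the CORE column of variable_records_per_yr
    estimate = size_tb(n_simulations=58, n_years=86,
                       records_per_yr=119535, n_gridcells=424 * 412)
    print(f"{estimate:.1f} TB")  # prints "250.0 TB"
```

With these illustrative inputs the sketch gives about 250 TB for 58 simulations, in line with the EUR-12 `ssp126` entry in the output below. If the 60% is instead meant as a reduction, pass `compressed_fraction=0.4`.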

ic| simulation_count: experiment  evaluation  historical  ssp119  ssp126  ssp245  ssp370  ssp585
                      domain
                      AFR-25               0           0       0       0       0       0       0
                      ANT-12               3           4       0       0       0       4       0
                      ARC-12               4           6       0       0       0       6       0
                      AUS-20i              4          34       0      34      20      34       4
                      CAM-12               1           8       0       8       8       0       8
                      CAS-12               0           2       0       2       0       2       2
                      EAS-25               0           5       0       5       5       5       5
                      EUR-12              19          59       9      58      18      55      21
                      MED-12              10           9       0       4       1       6       3
                      MED-25               1           1       0       0       0       0       0
                      MENA-25              1           1       0       1       1       1       1
                      NAM-12               1           8       0       4       4       8       1
                      NAM-25               1          15       0       0       0      15       0
                      SAM-25               1           1       0       0       0       1       0
                      SEA-12               1           3       0       0       0       3       0
                      SEA-25               3          13      10      13       3      13       3
                      WAS-25               2           5       0       1       4       2       4
ic| variable_count: priority   CORE  TIER1  TIER2
                    frequency
                    1hr          13     30      7
                    6hr           0     71     51
                    day          15    105     63
                    fx            2      0      7
                    mon          15    105     64
ic| variable_records_per_yr: priority     CORE   TIER1  TIER2
                             frequency
                             1hr        113880  262800  61320
                             6hr             0  103660  74460
                             day          5475   38325  22995
                             fx              0       0      0
                             mon           180    1260    768
ic| size_TB: experiment  evaluation  historical  ssp119  ssp126  ssp245  ssp370  ssp585
             domain
             AFR-25             0.0         0.0     0.0     0.0     0.0     0.0     0.0
             ANT-12             6.8        12.0     0.0     0.0     0.0    19.1     0.0
             ARC-12            11.6        22.9     0.0     0.0     0.0    36.5     0.0
             AUS-20i            4.9        54.4     0.0    86.6    50.9    86.6    10.2
             CAM-12             4.5        47.1     0.0    74.9    74.9     0.0    74.9
             CAS-12             0.0         7.6     0.0    12.1     0.0    12.1    12.1
             EAS-25             0.0         7.7     0.0    12.3    12.3    12.3    12.3
             EUR-12            39.0       159.7    38.8   250.0    77.6   237.0    90.5
             MED-12            11.6        13.8     0.0     9.7     2.4    14.6     7.3
             MED-25             0.3         0.4     0.0     0.0     0.0     0.0     0.0
             MENA-25            1.3         1.7     0.0     2.7     2.7     2.7     2.7
             NAM-12             3.8        40.0     0.0    31.8    31.8    63.6     8.0
             NAM-25             0.9        18.7     0.0     0.0     0.0    29.8     0.0
             SAM-25             1.1         1.5     0.0     0.0     0.0     2.4     0.0
             SEA-12             2.4         9.5     0.0     0.0     0.0    15.2     0.0
             SEA-25             1.8        10.3    12.6    16.4     3.8    16.4     3.8
             WAS-25             2.4         7.8     0.0     2.5     9.9     5.0     9.9
ic| size_TB.T.sum(): domain
                     AFR-25       0.0
                     ANT-12      37.9
                     ARC-12      71.0
                     AUS-20i    293.6
                     CAM-12     276.3
                     CAS-12      43.9
                     EAS-25      56.9
                     EUR-12     892.6
                     MED-12      59.4
                     MED-25       0.7
                     MENA-25     13.8
                     NAM-12     179.0
                     NAM-25      49.4
                     SAM-25       5.0
                     SEA-12      27.1
                     SEA-25      65.1
                     WAS-25      37.5
                     dtype: float64
/!\ Considering just ['CORE'] vars.)
Total CORDEX-CMIP6 estimated size is: 2109 TB
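The `variable_records_per_yr` table above is consistent with multiplying each entry of `variable_count` by the number of time steps in a 365-day year (8760 hourly, 1460 six-hourly, 365 daily, 12 monthly) and counting `fx` fields as zero. A small sketch of that relation in plain Python; the dictionary layout and function names are hypothetical, chosen only for illustration:

```python
# Sketch: records per year = number of variables x time steps per year.
# Assumes a 365-day (no-leap) year, which matches the table above;
# fx (time-invariant) fields contribute zero records per year.

STEPS_PER_YEAR = {
    "1hr": 24 * 365,  # 8760
    "6hr": 4 * 365,   # 1460
    "day": 365,
    "mon": 12,
    "fx": 0,
}

# variable_count from the output above: frequency -> priority -> count
VARIABLE_COUNT = {
    "1hr": {"CORE": 13, "TIER1": 30, "TIER2": 7},
    "6hr": {"CORE": 0, "TIER1": 71, "TIER2": 51},
    "day": {"CORE": 15, "TIER1": 105, "TIER2": 63},
    "fx": {"CORE": 2, "TIER1": 0, "TIER2": 7},
    "mon": {"CORE": 15, "TIER1": 105, "TIER2": 64},
}


def records_per_yr(counts, steps=STEPS_PER_YEAR):
    """Return {frequency: {priority: records per year}}."""
    return {freq: {prio: n * steps[freq] for prio, n in by_prio.items()}
            for freq, by_prio in counts.items()}


def total_records_per_yr(counts, priorities):
    """Sum records per year over all frequencies for the chosen priorities."""
    table = records_per_yr(counts)
    return sum(table[freq][p] for freq in table for p in priorities)


if __name__ == "__main__":
    table = records_per_yr(VARIABLE_COUNT)
    print(table["1hr"])  # {'CORE': 113880, 'TIER1': 262800, 'TIER2': 61320}
    print(total_records_per_yr(VARIABLE_COUNT, ["CORE"]))  # 119535
```

Summing over frequencies gives 119535 records per year for CORE alone, the per-year record count that a size estimate of this kind scales with.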

@gnikulin
Contributor

I think such an approach should work as a basic estimate. RCMs with non-rotated grids usually provide larger domains, but some RCM groups may provide only a subset of the Tier 1 and Tier 2 variables.

Do all EUR-12 simulations together add up to only 892.6 TB?

@jesusff
Contributor Author

jesusff commented Oct 29, 2024

Yes, but this covers only CORE variables and only the simulations planned so far in this repo. Here is the summary including all tiers:

/!\ Considering just ['CORE', 'TIER1', 'TIER2'] vars.)

ic| size_TB: experiment  evaluation  historical  ssp119  ssp126  ssp245  ssp370  ssp585
             domain                                                                    
             AFR-25             0.0         0.0     0.0     0.0     0.0     0.0     0.0
             ANT-12            39.2        68.9     0.0     0.0     0.0   109.7     0.0
             ARC-12            66.6       131.5     0.0     0.0     0.0   209.4     0.0
             AUS-20i           27.8       311.6     0.0   496.2   291.9   496.2    58.4
             CAM-12            25.6       269.7     0.0   429.5   429.5     0.0   429.5
             CAS-12             0.0        43.5     0.0    69.2     0.0    69.2    69.2
             EAS-25             0.0        44.1     0.0    70.3    70.3    70.3    70.3
             EUR-12           223.8       915.1   222.3  1432.7   444.6  1358.6   518.8
             MED-12            66.6        78.9     0.0    55.9    14.0    83.8    41.9
             MED-25             1.7         2.2     0.0     0.0     0.0     0.0     0.0
             MENA-25            7.4         9.7     0.0    15.5    15.5    15.5    15.5
             NAM-12            21.7       229.0     0.0   182.4   182.4   364.7    45.6
             NAM-25             5.4       107.3     0.0     0.0     0.0   171.0     0.0
             SAM-25             6.6         8.7     0.0     0.0     0.0    13.8     0.0
             SEA-12            13.8        54.6     0.0     0.0     0.0    86.9     0.0
             SEA-25            10.4        59.1    72.4    94.2    21.7    94.2    21.7
             WAS-25            13.5        44.6     0.0    14.2    56.8    28.4    56.8
ic| size_TB.T.sum(): domain
                     AFR-25        0.0
                     ANT-12      217.8
                     ARC-12      407.5
                     AUS-20i    1682.1
                     CAM-12     1583.8
                     CAS-12      251.1
                     EAS-25      325.3
                     EUR-12     5115.9
                     MED-12      341.1
                     MED-25        3.9
                     MENA-25      79.1
                     NAM-12     1025.8
                     NAM-25      283.7
                     SAM-25       29.1
                     SEA-12      155.3
                     SEA-25      373.7
                     WAS-25      214.3
                     dtype: float64

Total CORDEX-CMIP6 estimated size is: 12090 TB

@gnikulin
Contributor

Now it looks reasonable :-), since we need an estimate for all CORE, Tier 1 and Tier 2 variables.

@gnikulin
Contributor

That said, I don't expect many Tier 2 variables to be delivered.

@jesusff
Contributor Author

jesusff commented Oct 29, 2024

OK, here is the remaining one 😉: CORE + Tier 1

/!\ Considering just ['CORE', 'TIER1'] vars.

ic| size_TB: experiment  evaluation  historical  ssp119  ssp126  ssp245  ssp370  ssp585
             domain                                                                    
             AFR-25             0.0         0.0     0.0     0.0     0.0     0.0     0.0
             ANT-12            30.1        52.9     0.0     0.0     0.0    84.2     0.0
             ARC-12            51.1       100.9     0.0     0.0     0.0   160.7     0.0
             AUS-20i           21.3       239.0     0.0   380.6   223.9   380.6    44.8
             CAM-12            19.6       206.9     0.0   329.5   329.5     0.0   329.5
             CAS-12             0.0        33.3     0.0    53.1     0.0    53.1    53.1
             EAS-25             0.0        33.9     0.0    53.9    53.9    53.9    53.9
             EUR-12           171.7       702.0   170.6  1099.1   341.1  1042.3   398.0
             MED-12            51.1        60.6     0.0    42.9    10.7    64.3    32.1
             MED-25             1.3         1.7     0.0     0.0     0.0     0.0     0.0
             MENA-25            5.7         7.5     0.0    11.9    11.9    11.9    11.9
             NAM-12            16.7       175.7     0.0   139.9   139.9   279.8    35.0
             NAM-25             4.2        82.4     0.0     0.0     0.0   131.2     0.0
             SAM-25             5.0         6.6     0.0     0.0     0.0    10.6     0.0
             SEA-12            10.6        41.9     0.0     0.0     0.0    66.7     0.0
             SEA-25             7.9        45.4    55.6    72.2    16.7    72.2    16.7
             WAS-25            10.4        34.2     0.0    10.9    43.5    21.8    43.5
ic| size_TB.T.sum(): domain
                     AFR-25        0.0
                     ANT-12      167.2
                     ARC-12      312.7
                     AUS-20i    1290.2
                     CAM-12     1215.0
                     CAS-12      192.6
                     EAS-25      249.5
                     EUR-12     3924.8
                     MED-12      261.7
                     MED-25        3.0
                     MENA-25      60.8
                     NAM-12      787.0
                     NAM-25      217.8
                     SAM-25       22.2
                     SEA-12      119.2
                     SEA-25      286.7
                     WAS-25      164.3
                     dtype: float64
Total CORDEX-CMIP6 estimated size is: 9275 TB
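Since every tier selection uses the same simulations, periods and grids, the totals should scale with the records per year alone. A quick consistency check, assuming sizes are linear in records per year (the names below are hypothetical), scales the 2109 TB CORE-only total by the ratio of summed records per year from the `variable_records_per_yr` table:

```python
# Per-priority records per year, summed over frequencies
# (1hr + 6hr + day + fx + mon) from the variable_records_per_yr table.
RECORDS_PER_YR = {
    "CORE": 113880 + 0 + 5475 + 0 + 180,          # 119535
    "TIER1": 262800 + 103660 + 38325 + 0 + 1260,  # 406045
    "TIER2": 61320 + 74460 + 22995 + 0 + 768,     # 159543
}

CORE_TOTAL_TB = 2109  # CORE-only total reported earlier in the thread


def scaled_total_tb(priorities, core_total_tb=CORE_TOTAL_TB):
    """Scale the CORE-only total by the ratio of records per year.

    Assumes size is linear in records per year, all else being equal.
    """
    selected = sum(RECORDS_PER_YR[p] for p in priorities)
    return core_total_tb * selected / RECORDS_PER_YR["CORE"]


if __name__ == "__main__":
    for prios in (["CORE", "TIER1"], ["CORE", "TIER1", "TIER2"]):
        print(prios, round(scaled_total_tb(prios)))
    # ['CORE', 'TIER1'] 9273
    # ['CORE', 'TIER1', 'TIER2'] 12088
```

This lands within a few TB of the reported 9275 TB and 12090 TB (the 2109 TB CORE total is itself rounded). The gap between the two selections, roughly 2.8 PB, is the Tier 2 share, so it would shrink if only a few Tier 2 variables are delivered.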

@jesusff
Contributor Author

jesusff commented Oct 29, 2024

Just for the record: until all domains are defined in https://github.com/WCRP-CORDEX/domain-tables/blob/main/CORDEX-CMIP5_rotated_grids.csv, some reasonable approximations are used for the missing ones:

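# Some fixes for domains missing from the domain table (approximate grid-cell counts)
ngridcells['AUS-20i'] = ngridcells['AUS-25'] #!! AUS-20i not yet defined: reuse the AUS-25 cell count
ngridcells['MENA-25'] = ngridcells['MNA-25'] # MNA is the CORDEX-CMIP5 code for this domain
ngridcells['MED-25'] = ngridcells['MED-12']/4 # half the resolution: 1/4 of the cells
ngridcells['SEA-12'] = ngridcells['SEA-25']*4 # double the resolution: 4x the cells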
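The `/4` and `*4` factors above follow from the grid spacing: for the same domain extent, halving the spacing doubles the number of cells along each axis, so the cell count scales with the square of the resolution ratio. A minimal sketch; the function name and the nominal 12.5 km and 25 km spacings are assumptions (only their ratio of 2 matters), and the cell counts are hypothetical placeholders:

```python
# Sketch of the resolution rule of thumb behind the MED-25 and SEA-12 fixes.
# ASSUMPTION: nominal spacings of 12.5 km ("-12") and 25 km ("-25");
# only their ratio of 2 affects the result.
RES_12_KM = 12.5
RES_25_KM = 25.0


def scale_gridcells(ncells: float, from_res_km: float, to_res_km: float) -> float:
    """Approximate cell count of the same domain at a different grid spacing."""
    return ncells * (from_res_km / to_res_km) ** 2


if __name__ == "__main__":
    # Hypothetical cell counts, for illustration only
    print(scale_gridcells(100_000, RES_12_KM, RES_25_KM))  # 25000.0, i.e. /4
    print(scale_gridcells(25_000, RES_25_KM, RES_12_KM))   # 100000.0, i.e. *4
```

The approximation ignores any change in domain extent or boundary zone between the two grids, which is why it is only a stand-in until the domain table is complete.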

@gnikulin
Contributor

I think 10 PB sounds like a reasonable estimate. In any case, it's impossible to provide an exact size :-)
