[bug] Sounding observation counts discrepancy between JEDI and GSI #233

Open
delippi opened this issue Nov 22, 2024 · 4 comments

delippi commented Nov 22, 2024

Current behavior (describe the bug)

When processing ADPUPA (120/220) sounding data in JEDI, the observation counts appear significantly lower compared to GSI. Specifically, in the RRFS FV3-JEDI ctest case (2022-05-26T19:00:00Z):

  • GSI observation count: 236
  • JEDI observation count: 96

With all QC filters turned off, the JEDI log has the following information:

 0: QC apdupa_airTemperature_120 airTemperature: 247 missing values.
 0: QC apdupa_airTemperature_120 airTemperature: 96 passed out of 343 observations.
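
As a quick sanity check, the two log lines are self-consistent: 247 missing + 96 passed = 343 locations, so with the filters off every location that did not pass is being counted as missing rather than rejected by QC. A minimal Python sketch of that check follows; `parse_qc_summary` is a hypothetical helper written for this issue, not part of JEDI.

```python
import re

# Log lines copied verbatim from the JEDI output above.
log = """\
 0: QC apdupa_airTemperature_120 airTemperature: 247 missing values.
 0: QC apdupa_airTemperature_120 airTemperature: 96 passed out of 343 observations.
"""

def parse_qc_summary(text):
    """Return (missing, passed, total) from the two QC summary lines.

    Hypothetical helper: it only understands the two line shapes shown above.
    """
    missing = int(re.search(r"(\d+) missing values", text).group(1))
    match = re.search(r"(\d+) passed out of (\d+) observations", text)
    return missing, int(match.group(1)), int(match.group(2))

missing, passed, total = parse_qc_summary(log)
# With all filters off, missing + passed should account for every location.
print(missing + passed == total)
```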

We are not sure if this is a problem during the bufr2ioda conversion, in the JEDI configuration, or in JEDI itself.

Steps to Reproduce (if applicable)

What computer are you running on?

Hera

Steps to reproduce the behavior

  1. Copy the phase 2 workspace: /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/jedi-assim-phase2. Only the following parts are needed:
    • run_all.sh
    • rrfs-data_fv3jedi_2022052619/
    • gsi_2022052619/
  2. Run run_all.sh:
    • RUN_GSI, RUN_JEDI, MAKE_PLOT, and use_offline_domain_check should all be "YES". After running GSI the first time, set RUN_GSI="NO".
    • Change the paths for jedi_dir and gsi_dir to wherever you copied the rrfs and gsi case directories in step 1.
    • obtype_configs="$obtype_configs adpupa_airTemperature_120.yaml" should be the only uncommented obtype_configs line.
    • Other paths should be okay.
    • gsi_2022052619/run_gsi.sh will use my GSI build (unless changed).
    • rrfs-data_fv3jedi_2022052619/run_fv3jedi.sh will use my JEDI build (unless changed).
    • bash ./run_all.sh
    • Check ./rrfs-data_fv3jedi_2022052619/conv.yaml (template yamls are in the valid_yamls path in ./run_all.sh).

Expected behavior

Observation counts between JEDI and GSI should be more similar.

Suggested Solution (if known)

Unknown at this point.

Acceptance Criteria (Definition of Done)

  • Link any relevant pull requests here:
    • PR # (will be added when a solution is found)

Dependencies

RDASApp Issue #232
HDASApp Issue NOAA-EMC/HDASApp#16

Additional information (optional)

IODA re-processing data is done: /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/ioda_processing
Relevant files include:

# The full obs file:
./ioda/ioda_adpupa.nc

# The offline domain check (dc) file:
./ioda/ioda_adpupa_dc.png
./ioda/ioda_adpupa_dc.nc

# yaml used for converter
./yaml/prepbufr_adpupa.yaml

delippi commented Nov 25, 2024

I think I've found the problem. GSI uses nhr_assimilation=3 and nhr_obsbin=3, which assumes one observation bin spanning the entire 3-h period. In my JEDI DA YAML I have a shorter time window:

cost function:
  cost type: 3D-Var
  time window:
      begin: 2022-05-26T18:00:00Z
      length: PT2H
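
To make the mismatch concrete, the window edges can be expressed as offsets from the analysis time. This is a rough sketch using only the values quoted in this comment; it assumes the GSI window is centered on the analysis time, and it says nothing about how either system treats observations exactly on a window edge.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the YAML above and the analysis time of this case.
begin = datetime(2022, 5, 26, 18, 0, tzinfo=timezone.utc)
length = timedelta(hours=2)  # PT2H
analysis = datetime(2022, 5, 26, 19, 0, tzinfo=timezone.utc)

end = begin + length
# JEDI window edges as offsets (seconds) from the analysis time.
min_offset = int((begin - analysis).total_seconds())
max_offset = int((end - analysis).total_seconds())
print(min_offset, max_offset)  # -3600 3600

# GSI: nhr_assimilation=3, assumed centered on the analysis time.
nhr_assimilation = 3
gsi_half_width = nhr_assimilation * 3600 // 2
print(gsi_half_width)  # 5400

# Epoch seconds of the analysis time, for comparison with dateTime values.
print(int(analysis.timestamp()))  # 1653591600
```

Under these assumptions JEDI accepts offsets within about one hour of the analysis time, while GSI accepts offsets within about one and a half hours.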

I believe that JEDI is tossing any observation whose dateTime (it does not use timeOffset) falls outside the shorter JEDI window. I tested this hypothesis by setting all the dateTime values equal to the analysis time (in this case dateTime=1653591600). I now get an observation count of 188. This matches the expected count, which I calculated by reading in the IODA observations and keeping only those that meet the following criteria:

  1. ObsType==120
  2. airTemperature values are valid and not already masked

I also checked this by counting the number of observations with a dateTime corresponding to timeOffset=-3600 or greater. The ob count was 96, matching the count I was seeing to start with.
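
The two counts described above can be sketched roughly as follows. This is not the script that was actually used: `count_obs` is a hypothetical helper, the arrays are synthetic stand-ins for the ObsType, airTemperature, and dateTime values read from the IODA file, and masked values are represented here as `None`.

```python
ANALYSIS_EPOCH = 1653591600  # 2022-05-26T19:00:00Z, as quoted above

def count_obs(obs_type, air_temperature, date_time, min_offset=None):
    """Count type-120 obs with a usable airTemperature.

    If min_offset (seconds relative to the analysis time) is given, also
    require dateTime - ANALYSIS_EPOCH >= min_offset.
    """
    count = 0
    for typ, temp, t in zip(obs_type, air_temperature, date_time):
        if typ != 120:
            continue  # criterion 1: ObsType == 120
        if temp is None or temp != temp:
            continue  # criterion 2: skip masked (None) or NaN values
        if min_offset is not None and t - ANALYSIS_EPOCH < min_offset:
            continue  # earlier than the start of the shorter window
        count += 1
    return count

# Synthetic example data, NOT the real ADPUPA file.
obs_type = [120, 120, 120, 220, 120]
air_temperature = [285.1, None, 270.4, 280.0, 265.3]
date_time = [ANALYSIS_EPOCH - 5000, ANALYSIS_EPOCH, ANALYSIS_EPOCH - 3600,
             ANALYSIS_EPOCH, ANALYSIS_EPOCH + 10]

print(count_obs(obs_type, air_temperature, date_time))         # 3
print(count_obs(obs_type, air_temperature, date_time, -3600))  # 2
```

On the real file, the first call corresponds to the 188 count and the second to the 96 count.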

@ShunLiu-NOAA @TingLei-NOAA @SamuelDegelia-NOAA @guoqing-noaa @JingCheng-NOAA @hu5970 We should discuss what the correct time window settings should be for the JEDI system.

@ShunLiu-NOAA

@delippi Thanks for this finding. With a proper dateTime or other YAML configuration, is it possible for JEDI to ingest the same number of observations as GSI?

delippi commented Nov 25, 2024

@ShunLiu-NOAA, I'm looking into this. I think there is still something else that I'm missing. I still expect them to be able to get the exact same ob counts... at least I don't see why they shouldn't!

delippi commented Nov 25, 2024

@ShunLiu-NOAA, I think we can get the same ob counts. I've just done a test where I change the convinfo time window value to 0.5 (instead of 1.5) and adjust the YAML time window filter to do:

         # Time window filter
         - filter: Domain Check
           apply at iterations: 0,1
           where:
             - variable:
                 name: MetaData/timeOffset # units: s
               minvalue: -1800
               maxvalue:  1800

With these settings I was able to get an exact match (36 obs). I also got an exact match when using convinfo time window = 0.9 and +/-3240 in JEDI (82 obs).
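
For reference, the pairing between the convinfo time window (hours) and the timeOffset bounds (seconds) used in these tests is a plain unit conversion. A small sketch, assuming the convinfo value is a half-width in hours applied symmetrically about the analysis time; `convinfo_window_to_seconds` is a hypothetical helper name.

```python
def convinfo_window_to_seconds(hours):
    """Convert a convinfo time window (hours) to a timeOffset bound (seconds).

    round() guards against floating-point noise in values such as 0.9.
    """
    return round(hours * 3600)

# Window values that appear in this thread.
for hours in (0.5, 0.9, 1.0, 1.5):
    bound = convinfo_window_to_seconds(hours)
    print(f"convinfo {hours} h -> minvalue: {-bound}, maxvalue: {bound}")
```

This gives +/-1800, +/-3240, +/-3600, and +/-5400 seconds respectively.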

I'm not sure why I get mismatching results again when I change to convinfo time window = 1.0 and +/-3600 in JEDI (96 vs 236 obs). Based on the previous results, 96 seems like the correct ob count for a time window of 1 hour.
