Update to generate DESI-Y3 Lya mocks #581

HiramHerrera · 2024-07-26T01:09:12Z

This pull request updates py/desisim/survey_release.py and the py/desisim/scripts/gen_qso_catalog.py script so that they can produce DESI-Y3 mocks.

Major updates:

survey_release.py no longer requires a HEALPix pixel map of the number of tiles covering a region (NPASS). Instead, the map is generated internally from a tiles-<release>.fits file, which is read from the DESI_SPECTRO_REDUX directory by default. A custom tiles file can be passed as input if a different footprint is desired.
survey_release.py now includes the option to select mock objects that mimic the characteristics of the non-QSO targets included in the observed data catalog.
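
The idea of deriving NPASS from a list of tiles, rather than from a precomputed map, can be illustrated with a minimal sketch. It counts how many tile centers lie within one tile radius of a sky position. The 1.6 degree radius, the function names and the toy tile list are assumptions made for illustration only; the PR itself relies on desimodel (load_tiles, tiles2pix) and works on HEALPix pixels.

```python
import math

TILE_RADIUS_DEG = 1.6  # assumed approximate DESI tile radius, for illustration only


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def count_passes(ra, dec, tile_centers, radius_deg=TILE_RADIUS_DEG):
    """Number of tiles whose center lies within radius_deg of (ra, dec)."""
    return sum(
        angular_separation_deg(ra, dec, tra, tdec) <= radius_deg
        for tra, tdec in tile_centers
    )


# Toy footprint: two overlapping tiles and one far away (hypothetical values).
tiles = [(150.0, 2.0), (151.0, 2.0), (200.0, 30.0)]
points = [(150.5, 2.0), (148.6, 2.0), (10.0, -5.0)]
npass = [count_passes(ra, dec, tiles) for ra, dec in points]
```

A position covered by no tile gets NPASS = 0, which is how a footprint cut falls out of the same calculation.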

Minor updates:

gen_qso_catalog.py: the --input-data flag now defaults to None instead of the iron release observed Lya catalog.
gen_qso_catalog.py: the default release is now "jura".
gen_qso_catalog.py: added two flags: --include-nonqso-targets to include non-QSO targets and --tiles-file to use a custom footprint tiles file.

Example to make a jura mock seed catalog:

gen_qso_catalog -i <input_master_catalog> --input-data <input_data_catalog> --seed <seed> --release jura --include-nonqso-targets -o <output_file_path>

In the above example:

<input_data_catalog> was set to: /global/cfs/cdirs/desi/users/martini/bal-catalogs/jura/QSO_cat_jura_main_dark_healpix_v1-bal.fits for the Y3 mocks.
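
To make the new defaults concrete, here is a hedged argparse sketch that mirrors only the flags named in this description. It is not the actual parser in gen_qso_catalog.py: the destination names, help strings, the integer type of --seed and the file names in the example call are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring only the flags named in this PR."""
    parser = argparse.ArgumentParser(prog="gen_qso_catalog")
    parser.add_argument("-i", dest="input_master", required=True,
                        help="input master catalog")
    parser.add_argument("-o", dest="output", required=True,
                        help="output file path")
    parser.add_argument("--input-data", default=None,
                        help="observed data catalog (None by default)")
    parser.add_argument("--release", default="jura",
                        help="survey release to mimic")
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--include-nonqso-targets", action="store_true",
                        help="also select mock objects mimicking non-QSO targets")
    parser.add_argument("--tiles-file", default=None,
                        help="custom footprint tiles file")
    return parser


# Placeholder file names; no files are opened by the parser itself.
args = build_parser().parse_args(
    ["-i", "master.fits", "--seed", "42", "--include-nonqso-targets",
     "-o", "seed_cat.fits"]
)
```

With this call, the release falls back to "jura" and no observed data catalog or custom tiles file is used.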

HiramHerrera · 2024-07-26T01:10:12Z

@andreicuceu could you review this too? I couldn't assign you as a reviewer.

coveralls · 2024-07-26T01:18:03Z

coverage: 44.333% (-0.3%) from 44.679%
when pulling 69ecffa on HiramHerrera:y3mocks
into 038252e on desihub:main.

andreufont · 2024-07-26T11:16:35Z

py/desisim/scripts/gen_qso_catalog.py

@HiramHerrera - I'm not super familiar with this script, but the changes look good.

A couple of years from now, when we want to make Y5 mocks, it will be confusing to have release="Y5" refer to the forecasted final DESI instead of the actual Y5 release, although I guess we could use either DR3 or whatever mountain is used to generate the Y5 release.

andreufont · 2024-07-26T11:21:08Z

py/desisim/survey_release.py

I'm not familiar with this script either, and the changes are quite long. Is there any particular part of the code that you'd like me to take a look at? Otherwise I'll let others review it.

andreicuceu · 2024-08-04T23:41:39Z

Hi @HiramHerrera, I've been trying to test this branch, but I ran into an issue: desi_zcatalog doesn't appear to work anymore for mocks because the --prefix option has been removed. I know this has to do with a different repository, but it affects us because it was part of the standard mock-making process. So I think we should address that problem in some way in this pull-request. My proposal would be to modify the QSO catalog generated by the gen_qso_catalog script here so picca can directly use it. This would also allow us to skip the extra step of re-generating the QSO catalog after running QQ. Is there any downside to this?

I had a look at the seed catalog generated with gen_qso_catalog and the one that used to be generated with desi_zcatalog, and as far as I can tell they have exactly the same quasars. Also, I think the only missing column that picca and QSOnic need is TARGETID. Strangely this appears in the Y1 seed catalogs, but not in the Y3 ones. So the solution may be as easy as making a copy of the current MOCKID column that is named TARGETID so picca can directly use this catalog.

It would also be useful to name the hdu ZCATALOG instead of the current CATALOG. Even though picca will accept both, it looks like passing anything different from ZCATALOG will raise an error in the future, and in QSOnic this already raises an error.
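
As a rough illustration of this proposal, the sketch below duplicates MOCKID into a new TARGETID column. It uses a NumPy structured array as a stand-in for the FITS table; the helper name and the toy values are hypothetical, and writing the result to an HDU named ZCATALOG would be a separate step with a FITS library. The replies below discuss whether the seed catalog redshifts are suitable for downstream use.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def add_targetid(catalog):
    """Return a copy of catalog with TARGETID duplicated from MOCKID."""
    if "TARGETID" in catalog.dtype.names:
        return catalog.copy()
    return rfn.append_fields(
        catalog, "TARGETID", catalog["MOCKID"], usemask=False
    )


# Toy seed catalog with hypothetical values.
seed_cat = np.array(
    [(1001, 2.31), (1002, 2.87)],
    dtype=[("MOCKID", "i8"), ("Z", "f8")],
)
picca_cat = add_targetid(seed_cat)
EXTNAME = "ZCATALOG"  # HDU name suggested in the comment above
```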

HiramHerrera · 2024-08-05T17:11:56Z

Hi @andreicuceu

In principle what you request should be doable. However, I'm concerned about the redshift in the catalog.

The catalog generated by desi_zcatalog and the seed catalog generated prior to running QQ would differ slightly in redshift, since QQ adds some random shifts: the FoG effect (400 km/s by default), a systematic shift (0 by default) and redshift errors (0 by default). These shifts are contained in the redshift column (Z) generated by desi_zcatalog. The seed catalog generated prior to running QQ, on the other hand, would have the redshift without any shift added (Z_QSO_RSD in the raw mocks master catalog).

This would slightly alter the resulting correlations. Apart from that, I can't think of any other issue for picca. Maybe @paulmartini and Ting can comment on the necessary columns/HDUs for the BAL and DLA finders to work.
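
To give a sense of the size of the effect, here is a minimal sketch of a Gaussian velocity shift applied to a redshift, using z_obs = z + (1 + z) dv / c. The functional form and the helper name are assumptions for illustration and are not taken from the QQ source; only the 400 km/s default comes from the comment above.

```python
import random

C_KMS = 299792.458  # speed of light in km/s


def add_fog_shift(z, sigma_kms=400.0, rng=random):
    """Apply a Gaussian velocity shift: z_obs = z + (1 + z) * dv / c."""
    dv = rng.gauss(0.0, sigma_kms)
    return z + (1.0 + z) * dv / C_KMS


rng = random.Random(1234)  # fixed seed so the toy example is repeatable
z_true = [2.1, 2.5, 3.0]
z_shifted = [add_fog_shift(z, rng=rng) for z in z_true]
```

At z = 3, one standard deviation of 400 km/s corresponds to a redshift shift of about 4 × 400 / 299792 ≈ 0.005.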

andreicuceu · 2024-08-05T17:16:55Z

That's a great point @HiramHerrera. So we should not use the seed catalog for downstream analysis. In that case do you have some suggestions on how to solve the issue? It looks like the problem is that desi_zcatalog only works with redrock files now, and we have zbest files. So I guess we could either write our own script that opens the zbest files and makes the post-QQ quasar catalog (I'd be happy to give this a go), or alternatively try to modify desi_zcatalog so it works with zbest files again. What do you think?

HiramHerrera · 2024-08-05T22:09:29Z

@andreicuceu I think the best option would be to make our own script (my personal preference) that we could add to lyatools. In this regard, I think that @p-slash already has working code (see desihub/desispec#2140).

Or simply use an older version of the desicode for making the redshift catalog only (I use 23.1).

What do you think?

alxogm · 2024-08-05T22:49:29Z

@HiramHerrera @andreicuceu adding to the question of whether to make desi_zcatalog compatible again or to have our own function: I wonder what people are currently using to generate the truth DLA and BAL full catalogs? I remember having a script for that, which I shared with various people, quite likely including Hiram. So I would say that the same script should generate the mock redshift catalog and the DLA and BAL catalogs at the same time, and we should host it either in lyatools or in desisim.

On the other hand, if someone has the time, it would be good to bring back the desi_zcatalog functionality and include a test to make sure it is not broken again in the future.

andreicuceu · 2024-08-06T00:26:58Z

Thanks for pointing me to that issue @HiramHerrera. So far I have indeed been using an older version of desi_zcatalog, but that is a hack, not a solution. So I think having our own script would make the most sense. @alxogm for DLA and BAL catalogs, I wrote separate scripts in lyatools (see this and this). So I would be happy to host a QSO catalog script in lyatools as well. I'll have a look at it over the next few days (maybe just using Naim's code).

p-slash · 2024-08-06T00:48:10Z

Apologies, I haven't had the time to go through this yet. Regarding the catalog creation: yes, I have a script:
https://github.com/p-slash/desi-y1-p1d/blob/main/src/desi_y1_p1d/qq_zcatalog.py
This generates the true QSO, DLA, and BAL catalogs. If BALs are present, the output zcat.fits will have BALs appended similarly to the data. It sounds like we have a few options. In my opinion, there's not a lot of reason to fix the desi_zcatalog script other than backward compatibility.

p-slash · 2024-08-09T16:48:09Z

py/desisim/survey_release.py

+            if 'TSNR2_LRG' in self.data.colnames:
+                log.info('Getting effective exposure time in data catalog by 12.15*TSNR2_LRG.')
+                exptime_data = 12.15*self.data['TSNR2_LRG']
+            elif 'TSNR2_LYA' in self.data.colnames:
+                log.info('Getting effective exposure time in data catalog by 11.8*TSNR2_LYA.')
+                exptime_data = 11.8*self.data['TSNR2_LYA']
+            elif 'TSNR2_QSO' in self.data.colnames:
+                log.info('Getting effective exposure time in data catalog by 33.61*TSNR2_QSO.')
+                exptime_data = 33.61*self.data['TSNR2_QSO']
+            else: 
+                raise ValueError("Can't compute effective exposure time. Data catalog columns should include TSNR2_LRG, TSNR2_LYA or TSNR2_QSO.")


What is the assumption for self.data here? TSNR2_LRG has the highest priority, so should you raise a big warning if that is not found? Or are you imagining that the absence of TSNR2_LRG column is deliberate in order to run some tests using other columns, so a warning is not required?

HiramHerrera · 2024-08-14

I added a warning if TSNR2_LRG is not in the data catalog.

We use TSNR2_LRG because this is what the spectroscopic pipeline uses to define the effective exposure time. However, I've found some catalogs without this column, so I added the option to use the template SNR of other targets.

Maybe @julienguy can provide more insight here.
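
The column priority and the warning discussed here can be sketched as a small standalone function. This is an illustration, not the code merged in survey_release.py: the function name, the table-like `columns` mapping and the exact warning text are assumptions, while the column names and scale factors are the ones in the diff above.

```python
import logging

log = logging.getLogger(__name__)

# (column, scale factor) pairs in priority order, taken from the diff above.
TSNR2_TO_EXPTIME = [
    ("TSNR2_LRG", 12.15),
    ("TSNR2_LYA", 11.8),
    ("TSNR2_QSO", 33.61),
]


def effective_exptime(columns):
    """Return effective exposure times from the first available TSNR2 column.

    `columns` is any mapping from column name to a sequence of values.
    """
    for name, factor in TSNR2_TO_EXPTIME:
        if name in columns:
            if name != "TSNR2_LRG":
                # Hypothetical wording for the warning mentioned in the reply.
                log.warning("TSNR2_LRG not found; falling back to %s.", name)
            log.info("Getting effective exposure time by %s*%s.", factor, name)
            return [factor * value for value in columns[name]]
    raise ValueError(
        "Can't compute effective exposure time. Data catalog columns "
        "should include TSNR2_LRG, TSNR2_LYA or TSNR2_QSO."
    )


# Toy catalog without TSNR2_LRG: the LYA column is used and a warning is logged.
exptime = effective_exptime({"TSNR2_LYA": [100.0, 50.0]})
```

Keeping the priority order in a single list makes it explicit which column wins when several are present.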

p-slash · 2024-08-09T16:53:24Z

py/desisim/survey_release.py

+from desimodel.io import load_tiles
+from desimodel.footprint import tiles2pix, is_point_in_desi, radec2pix


Small readability comment: You can import load_tiles as desimodel_load_tiles so that it is easier to understand in the code below. Not a big deal.

p-slash

I left small comments for readability and best practices. I did not review the expected behavior of the code and its accuracy etc, so I left it to pipeline tests. Better to merge after the code has been tested.
I approve it as it is if you want to go ahead and merge.

sbailey · 2024-08-13T17:53:21Z

Coming to the conversation late, but a few notes:

Support for desi_zcatalog --prefix was dropped last year in #2117. This wasn't directly purposeful; I wasn't aware that it was still needed and the bookkeeping was a pain, so I dropped it to keep the code less messy. If it would still be quite useful, I'd welcome a desispec update to re-add it.
In general, it would be better for new mocks to follow the current data model structure of real data processing. We haven't used "zbest" for many years (dropped sometime before fuji, our first public release). That would make it easier to keep our tools in sync for mocks vs. data.

HiramHerrera · 2024-08-14T02:12:06Z

@sbailey what would be the recommended name for the redrock-like files?

We used zbest because it was the baseline back then, but we could take advantage of this PR to update it so that our mocks follow the same format as observed data as much as possible.

I wonder if changing this name would fix the desi_zcatalog compatibility issue.

What do you think @alxogm, @julienguy?

alxogm · 2024-08-14T03:30:40Z

I wonder if changing this name would fix the desi_zcatalog compatibility issue.

Hi @HiramHerrera, it will not fix all the compatibility issues, but it will remove the need for a specific flag and will make it easier to fix the remaining ones.

I think it is a good idea to use the current name for the catalogs. I believe the current name is redrock-{survey}-{program}-{healpix}.fits, so we'll need to figure out whether something like redrock-{healpix}.fits would be enough for the mocks, or how we would have to adapt the name...
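
The two naming options mentioned here could be produced by one small helper, sketched below. The function name and its behavior are hypothetical; no naming scheme for the mocks was settled in this thread.

```python
def mock_redrock_filename(healpix, survey=None, program=None):
    """Build a redrock-style file name; survey/program are optional for mocks."""
    parts = ["redrock"]
    if survey is not None and program is not None:
        parts += [survey, program]
    parts.append(str(healpix))
    return "-".join(parts) + ".fits"


# Data-like name versus the shorter mock-only variant (toy healpix number).
full_name = mock_redrock_filename(1234, survey="main", program="dark")
short_name = mock_redrock_filename(1234)
```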

HiramHerrera added 2 commits July 25, 2024 17:40

Updated survey_release.py for DESI-Y3 mocks

29a988b

Updated gen_qso_catalog script to Y3 mocks by default

44b9847

HiramHerrera requested review from andreufont, julienguy, alxogm and p-slash July 26, 2024 01:09

andreufont reviewed Jul 26, 2024

p-slash reviewed Aug 6, 2024

p-slash reviewed Aug 9, 2024

p-slash reviewed Aug 9, 2024

p-slash self-requested a review August 9, 2024 16:53

p-slash approved these changes Aug 9, 2024

Addressed PR desihub#581 recommendations

8340561

HiramHerrera force-pushed the y3mocks branch from 483ee55 to 2f0ff51 Compare August 14, 2024 02:03

Added option for zmin and zmax

69ecffa

HiramHerrera force-pushed the y3mocks branch from 2f0ff51 to 69ecffa Compare August 14, 2024 15:09

Small modification to allow different master catalog format

9e8d238
