
constructing validation test for DC2 dN/dmag #21

Closed

rmandelb opened this issue Oct 25, 2017 · 42 comments

@rmandelb
Contributor

Question for @janewman-pitt-edu @damonge and anybody else who is interested:

There is interest in making sure that DC2 has a galaxy population that is complete beyond the flux limit of LSST, to ensure that realistic levels of blending are included. We would like to do this validation test on proto-DC2, comparing dN/dmag against some realistic reference using DESCQA. The question is what reference sample we should use.

Currently, if we were to compare the DC2 population with, e.g., the COSMOS catalog, we could only test to i~25.2. The purpose of this issue is to solicit suggestions for catalogs we could use to make a deeper validation test. The extreme case would be the Hubble UDF, but that is so small that it should have significant cosmic variance. There are also HST surveys of intermediate depth and area (deeper than COSMOS but smaller in area; shallower than the UDF but larger in area). We could explore those?

The CatSim page has a comparison against data going to 27th magnitude: https://www.lsst.org/scientists/simulations/catsim but the plot caption does not say where the validation data comes from.

@rmandelb rmandelb added the CS label Oct 25, 2017
@egawiser

CANDELS offers public catalogs with significantly increased area vs. HUDF. I haven't plotted HUDF number counts, but since we're effectively integrating along the entire line-of-sight and (I think) just looking for a bias in DC2 catalog number counts versus data in broad bins of object brightness, we should be relatively robust to cosmic variance.

@janewman-pitt-edu

CANDELS volume is still tiny, and sample/cosmic variance is substantial. It's enough to affect number counts (and even more so redshift distributions) probably at the ~10% level. Maybe we're fine with that for this test. How deep does HSC go?

@janewman-pitt-edu

Actually, I think you're probably better off validating against the existing much larger-area but a bit shallower surveys (e.g. HSC + DES deep fields) rather than the tiny area HST data. You'd just truncate the comparison where the existing data is fully complete (e.g. around the 10 sigma limit of the photometry). It's probably OK to assume that the number counts in actuality continue to rise geometrically a mag or so fainter than they are observed to do so in those datasets.

@egawiser egawiser reopened this Oct 25, 2017
@egawiser

We should do whatever is easiest that offers the needed precision. I will guess however that if we were to require our simulated number counts beyond LSST depth (e.g., r~28) to match reality at anything close to the 10% level, it might take a decade for the DC2 catalogs to pass QA. The goal is "realistic levels of blending", so I would interpret that as wanting to be within a factor of 2 at such number counts and hoping to be off by only 50% (in either direction). Can anyone quantify the requirement more precisely?

@rmandelb
Contributor Author

rmandelb commented Oct 25, 2017

Well, for starters, the total blend rate should scale like the square of the overall number density, shouldn't it? (at least in the limit that clustering is not a big contributor to the blend rate, which is true for LSST according to @dkirkby ) So if the overall number density is wrong by say 10%, the blend rate would be wrong by 20%.

But that's not precisely what we care about, in the sense that we care about the rate of blends as a function of flux ratio, so you could imagine thinking of the blend rate in a 2d plane defined by brighter vs. fainter galaxy mag. In a given bin in that plane, the blend rate depends linearly on the number densities on each axis. It seems like this becomes a requirement on the number density as a function of mag, which was why I was proposing looking at dN/dmag in the first place. But it's complicated because for example we might care more about similar flux ratio blends within the limit of the gold sample, and so getting dN/dmag right for something 1.5 mag fainter than that is less important (that becomes noise on the outskirts of the gold sample galaxies, which we do want to include in the sims, but it probably has a smaller impact than blends with something at the same mag). I would imagine we want dN/dmag to 10% down to r~26, but beyond that our requirements are quite a bit less stringent. 30%? I'm guessing at this point.
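As a toy illustration of the scalings above (a minimal sketch with purely illustrative numbers; `n_bins` is not real data):

```python
import numpy as np

# Quadratic scaling in the unclustered limit: a 10% error in the overall
# number density n gives roughly a 20% error in the two-object blend rate.
n_err = 0.10
blend_rate_err = (1 + n_err)**2 - 1   # = 0.21, i.e. ~20%

# Per-bin version: in the 2d plane of (brighter mag, fainter mag), the blend
# rate in each bin pair scales as the product of the two number densities,
# up to a constant factor set by the blending cross-section.
n_bins = np.array([1.0, 2.3, 5.2, 11.9, 27.2])  # illustrative counts per mag bin
pairwise_rate = np.outer(n_bins, n_bins)        # relative blend rate per bin pair
```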

I believe it may have been one of @mdschneider @joezuntz @rmjarvis @esheldon who articulated the need to have a realistic blending rate in the DC2 sims for WL, so perhaps one of them can weigh in either on how realistic is good enough, or on a good validation dataset. (Michael, Joe, Mike, Erin - please read the first post in this thread for an explanation of what this is about)

@rmjarvis

I don't know that we need to match it perfectly. The goal of DC2 is to test the ability of our code to handle the kinds of blends that are in real data. Not to precisely match the level of blending that we will see in some year of LSST data. In other words, it is a validation sim, not a calibration sim.

So I'd say a factor of two or better is good enough for our purposes. If it were really 10% squared = 20%, I'd be ecstatic.

@wadawson

We showed in Dawson et al. (2016) that the number density of ambiguously blended galaxies grows roughly as the ~3rd power of the galaxy number density (not the square; see Figure 1). In Section 5.1 we note why this exponent is greater than 2:

"Thus, when n is far from the confusion limit, the index α from Equation (19) will tend to be >2. It can be greater than 2 in part due to clustering, but to a larger degree it is because dP is a detect blend function of the deblending threshold that is usually defined with respect to the pixel level background. The number of low S/N galaxies increases as the limiting magnitude increases because the number density is observed to scale as n ~ 100.31(m-20) (Tyson 1988; Beckwith et al. 2006), where m is the limiting apparent z850 magnitude of the survey. Because it is more difficult to detect blends of low S/N objects, we should expect larger blend fractions with increasing survey limiting magnitude. Therefore α should be an increasing function of survey limiting magnitude until it levels off near the confusion limit."

Historically, in earlier versions of the paper and presentations to DESC we noted the impact of ambiguous blending on LSS number counts as an important systematic.

@rmandelb
Contributor Author

@wadawson - yep, I was using an approximation that may not be correct for LSST, probably for multiple reasons.

> Historically, in earlier versions of the paper and presentations to DESC we noted the impact of ambiguous blending on LSS number counts as an important systematic.

Absolutely - that isn't really in question here; I think that message has been pretty well internalized by the group. The only issue is how accurately we need to represent the blend rate in DC2 given what the goal of DC2 is (which is not the same as asking how well we want to be able to represent it, e.g., for dedicated blending studies).

@janewman-pitt-edu

janewman-pitt-edu commented Oct 26, 2017

I think people are overcomplicating things here... in all the datasets I've seen, number counts rise geometrically with only a very very slow change in power law slope as a function of magnitude, until incompleteness sets in (see, e.g., Fig. 15 of https://academic.oup.com/mnras/article/423/3/1992/2459951/Galaxy-properties-from-the-ultraviolet-to-the-far ). This is true to at least i~29 in HST ACS data.

This means that one can analytically extend the magnitude distribution from a shallower (but wider, so low sample/cosmic variance) survey by a magnitude or two with much less than 10% uncertainty (I would suggest measuring the logarithmic derivative of dN/dm and its second derivative as well to see if the latter is significant).
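A minimal sketch of that measurement (assuming `mag` and `dN_dm` come from a wide, complete survey; the values below are illustrative):

```python
import numpy as np

mag = np.arange(20.0, 25.5, 0.5)          # magnitude bin centers, illustrative
dN_dm = 40.0 * 10**(0.36 * (mag - 25.0))  # illustrative power-law counts

logN = np.log10(dN_dm)
slope = np.gradient(logN, mag)       # d log10(dN/dm)/dm; constant for a power law
curvature = np.gradient(slope, mag)  # second derivative; ~0 if extrapolation is safe
```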

@rmandelb
Contributor Author

@janewman-pitt-edu - I agree with you about analytically extending something from a shallower survey, but I think you may have misunderstood the 10% (or whatever %) discussion. That wasn't a discussion of the error we'd allow on the validation data, which I agree we can easily know well. It was a discussion of the error we'd allow in DC2, i.e., if the number counts and therefore blend rates don't quite match those expected in reality, how far off are we OK with them being.

@janewman-pitt-edu

Ah, thanks for clearing that up! (Oh, and I definitely agree with @dkirkby that the majority of blends at LSST depths are from very different redshifts, though I'm not certain what the fraction is exactly).

@rmandelb
Contributor Author

rmandelb commented Nov 1, 2017

Update on this issue. As part of our SRD work, I used the publicly released HSC Deep survey layer to measure dN/dmag (including absolute normalization and slope) in i-band. I also did power-law fits to dlogN/dmag and to the cumulative counts as a function of limiting mag, so we can extrapolate, given that these power laws are generally pretty accurate across a wide range of magnitudes, as Jeff said.

What I got from HSC Deep was a best-fitting number density (in units of arcmin^-2) of
N(<ilim) = 43 * 10^(0.36 (ilim - 25))
where ilim is the i-band magnitude limit of the sample. So, to be clear, these are cumulative number densities, not differential. If you want differential counts, it's just a multiplicative factor to convert.
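In code, that fit and the cumulative-to-differential conversion look like this (a sketch: for N(<m) = A·10^(a(m−m0)) the differential counts are dN/dm = A·a·ln(10)·10^(a(m−m0)), i.e., the multiplicative factor is a·ln 10):

```python
import numpy as np

A, a, m0 = 43.0, 0.36, 25.0   # arcmin^-2; parameters of the fit quoted above

def n_cum(ilim):
    """Cumulative number density N(<ilim), arcmin^-2."""
    return A * 10**(a * (ilim - m0))

def dn_dmag(m):
    """Differential counts dN/dm, arcmin^-2 mag^-1."""
    return A * a * np.log(10) * 10**(a * (m - m0))
```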

Just to throw out numbers, it would be nice to agree with this at the 20% level at ilim=25.3 and 40% at ilim=27.

Would this work for a validation test?

@yymao
Member

yymao commented Nov 2, 2017

@rmandelb I think that's a great validation test that we can do. We can keep track of the status of these tests in #30.

@rmandelb
Contributor Author

rmandelb commented Nov 2, 2017

OK, so in the interest of not having too many open issues that duplicate each other, do you want to close this issue (now that we've decided what the test should be) and, in #30, link to the final comment with the power-law fit on this page?

@yymao
Member

yymao commented Nov 2, 2017

@rmandelb sure. I don't have the permission to close issues in this repo though.

@katrinheitmann
Contributor

katrinheitmann commented Nov 2, 2017 via email

@evevkovacs
Contributor

evevkovacs commented Nov 2, 2017 via email

@katrinheitmann katrinheitmann added this to the DC2 CS Freeze milestone Nov 8, 2017
@duncandc

duncandc commented Dec 4, 2017

@rmandelb would you mind sharing the code you used to calculate N(<i_lim) with the HSC data, or pointing me to a usable catalog so I can calculate it myself?

@rmandelb
Contributor Author

rmandelb commented Dec 4, 2017

The code I used to calculate it is in another DESC repo, so I'm happy to share, but here's a question: do you want the actual N(<i_lim)? Or a power-law fit to it that describes the data well? I would think the latter might be better because then you don't have to worry about what is the histogram binning, etc.?

For reference, the fit is N(<i_lim) = 42.9 [10^(0.359 (i_lim-25))] per arcmin^2, shown as the black dashed curve on this plot (compared with real data in the blue solid curve):
[plot: cumulative i-band counts from HSC Deep (blue solid curve) with the power-law fit N(<i_lim) = 42.9 × 10^(0.359 (i_lim-25)) arcmin^-2 (black dashed curve)]

The deficit at bright magnitudes is due to selection criteria that remove some bright objects (i=22 is very bright for HSC), and the deficit at faint magnitudes is due to incompleteness. Obviously, beyond i~25.5 there is some major extrapolation happening.

@duncandc

duncandc commented Dec 4, 2017

What I would like is some self-contained code that does the following (sketched below):
1.) downloads the necessary HSC data
2.) calculates the cumulative number density of galaxies in a set of bands
3.) interpolates the above calculation to get the cumulative number density on regularly sampled intervals
4.) fits a power law over some range
5.) defines an upper mag limit beyond which the power-law fit is used to extrapolate
6.) prints out a simple ASCII lookup table with mag and N(<mag) for a set of bands that I can use for the test.

In principle, the above is something that someone could provide me, but I would feel better if I could follow it through the steps, make sure it is reproducible, and use code that is as similar as possible to create more lookup tables from different surveys.
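A minimal sketch of steps 2-6, assuming step 1 has already produced a plain list of per-object i-band magnitudes (the file names, survey area, fit range, and extrapolation limit below are all illustrative assumptions):

```python
import numpy as np

mags = np.loadtxt("hsc_deep_imag.txt")     # hypothetical per-object magnitudes
area_arcmin2 = 27.0 * 3600.0               # assumed effective survey area

# Step 3: cumulative number density on a regular grid of magnitude limits.
grid = np.arange(20.0, 28.0 + 1e-6, 0.1)
n_cum = np.array([(mags < m).sum() for m in grid]) / area_arcmin2

# Step 4: power-law fit to log10 N(<m) over a range where counts look complete.
fit = (grid >= 22.0) & (grid <= 25.0)
slope, intercept = np.polyfit(grid[fit], np.log10(n_cum[fit]), 1)

# Step 5: beyond the assumed completeness limit, replace the data with the fit.
extrap = grid > 25.0
n_out = np.where(extrap, 10**(slope * grid + intercept), n_cum)

# Step 6: write the lookup table.
np.savetxt("hsc_i_ncum.txt", np.column_stack([grid, n_out]),
           header="mag  N(<mag) [arcmin^-2]")
```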

@rmandelb
Contributor Author

rmandelb commented Dec 4, 2017

OK, didn't realize you wanted to download the data. Just a few comments:

  • Your item (1) is going to depend a lot on the survey (what cuts/flags to use, best photometry to use, etc.)
  • In practice, I recommend going from the differential rather than cumulative counts. The reason for this is that at the bright end, you will often miss some objects due to cuts on e.g. interpolated pixels or CCD edge cuts or whatever (big bright objects cover more pixels), and if you use the differential counts then you can pick the mag range that is less affected. Then you can fit a power law, and then convert the power-law to cumulative counts. Do you see what I mean?
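A minimal sketch of this differential-fit route, with illustrative bin values (not the actual Requirements-repo script; if dN/dm = B·10^(b·m), integrating gives N(<m) = B·10^(b·m)/(b·ln 10)):

```python
import numpy as np

mag_mid = np.arange(22.25, 25.0, 0.5)             # bin centers, illustrative
dn = np.array([4.1, 6.2, 9.4, 14.2, 21.5, 32.6])  # dN/dm values, illustrative

# Fit log10(dN/dm) = b*m + log10(B) over bins unaffected by bright-end cuts.
b, logB = np.polyfit(mag_mid, np.log10(dn), 1)

def n_cum(m):
    """Cumulative counts implied by the differential power-law fit."""
    return 10**logB * 10**(b * m) / (b * np.log(10))
```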

I'm going to separately e-mail you an SQL script for downloading the HSC data, and a python script that does the process I described above for HSC Deep i-band data. The latter is already in the Requirements repo which I believe you don't have access to. I'm happy for comments on any of it, or for the code to be put into this repo as needed.

@duncandc

duncandc commented Dec 4, 2017

My first-pass implementation idea is to simply use the cumulative number density at a single magnitude limit as the 'test', rather than the full apparent magnitude function in bins. I think this gets at the basic quantity of interest. It is a simple extension to test at a few different magnitude limits if the shape of the apparent magnitude function is important.

@evevkovacs
Contributor

@duncandc How about using the fit for a start? Ideally, there would be papers which do the fitting and to which you could refer, but I understand that these may not exist yet for LSST-like depths. But initially, if we just want to get an idea of how the catalogs look, using Rachel's fit would be quickest. It is not obvious to me that the WGs will need fancier cuts or options, so it might be a good idea to get feedback from them before putting a lot of work into a more general structure.

@duncandc

duncandc commented Dec 4, 2017

That's the path I am taking. The test itself is exceedingly simple; I am also interested in the data.

@rmandelb
Contributor Author

rmandelb commented Dec 4, 2017

We definitely care about getting the shape of the number counts roughly right.

I just sent you some stuff by e-mail so you can download the data with the same SQL query and process it with my script (or modify the script as needed). I should emphasize that I only did the i-band cumulative counts, so if we want to test >1 band, we need to at least rerun the script for r-band (which I can do myself if you want, because I did download the r-band magnitudes) and possibly others (which would involve re-downloading the data with more information).

@yymao
Member

yymao commented Dec 4, 2017

@duncandc are you thinking of generating the validation data on the fly every time the test is run? We'd need to consider computational resources.

@duncandc

duncandc commented Dec 4, 2017

I am not considering that. I am putting on my 'creating validation data' hat.

@janewman-pitt-edu

I agree we want to get the shape of the number counts right. However, we also want to be sure to do that test only where counts are highly complete (i.e., at least 0.5-1 mag brighter than where the counts turn over, and probably avoiding the bright end to minimize the impact from stars).

Best,

Jeff

@evevkovacs
Contributor

@duncandc @janewman-pitt-edu @rmandelb @yymao Duncan has worked on a test and example plots can be seen here. There are some normalization issues, but even so, we can now start to think about specifying validation criteria, such as defining the validation region to be used and the required level of agreement with the data.

@rmandelb
Contributor Author

@evevkovacs @duncandc - thanks for sharing these plots. Apologies if I am missing where to find this information, but what is each curve? (i.e., which is validation dataset vs. the simulation we are validating, and why does one plot have 3 curves but the other plots have 2 curves?)

Is there anything I can help with to try to track down normalization issues? For example, Duncan, if this comes from your own recalculation based on downloading HSC data, do you want to compare with the normalizations that I have calculated to see if we agree about the data?

Just to throw out an idea for validation criteria: I think we'd like to sample a few i-band magnitude values within the range where HSC is clearly complete. Let's say i=24, 24.5, 25. We'd like N(<i) in the data and simulations to agree to within ~20% at those values of i-band magnitude. It would also be nice to take the power-law extrapolation of the HSC density for 22<i<25, and require N(<i=27) in the sims and extrapolated power-law from data to agree to within ~40%.
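Those criteria could be encoded roughly like this (a sketch; `n_sim`, `n_hsc`, and `n_extrap` are assumed callables returning cumulative densities in arcmin^-2):

```python
def passes_validation(n_sim, n_hsc, n_extrap):
    """Check the proposed criteria: ~20% agreement with HSC at i = 24, 24.5, 25,
    and ~40% agreement with the power-law extrapolation at i = 27."""
    for i in (24.0, 24.5, 25.0):
        if abs(n_sim(i) / n_hsc(i) - 1.0) > 0.20:
            return False
    return abs(n_sim(27.0) / n_extrap(27.0) - 1.0) <= 0.40
```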

@yymao
Member

yymao commented Dec 12, 2017

@rmandelb the one with 2 curves + data points is the summary plot (i.e., overplotting all catalogs), and the other plots are for individual catalogs (i.e., without overplotting the other catalogs) --- the contents are the same, though.

Maybe you can help reviewing this notebook @duncandc put together for getting the HSC data? You can also find the corresponding PR at LSSTDESC/descqa#46

@duncandc

As for the normalization, I will check it against @rmandelb 's previous calculation.

Is there a better way to estimate "error bars" on N(<m) than back-of-the-envelope estimates? Asking for O(10%) agreement seems arbitrary to me.

@rmandelb
Contributor Author

I'm happy to review the notebook and PR. Some cosmetic updates that would be helpful for people who look at this test:

  • the plot should say which is validation data and which is the sim
  • the summary plot should say which sim is which
  • should have an axis label for the horizontal axis that says what band is being shown
  • it would be helpful to display number densities in number per square arcmin, rather than number per square degree (I know it's a minor point, but many observers have a rough rule of thumb for expected density per square arcmin in imaging data, so looking at the plot in those units without having to divide by 3600 makes it easier to do sanity checks).
  • having the vertical axis explicitly say this is a number density rather than a raw number -- e.g., by noting the units -- would be helpful.

I will try to review the notebook and PR in the next day or so.

@yymao
Member

yymao commented Dec 12, 2017

@rmandelb thanks for the suggestions. This validation test, as you can tell, is a sprint product and is still under review on a different PR LSSTDESC/descqa#47, so your suggestions are very useful and I believe @duncandc will implement them :)

@rmandelb
Contributor Author

@duncandc - thanks for checking against my previous calculation, that will be helpful.

> Is there a better way to estimate "error bars" on N(<m) than back-of-the-envelope estimates?

The statistical uncertainty on N(<m) due to number counts alone is very small given the area of the HSC survey and of the sims. The systematic uncertainty due to, e.g., whether we've defined the samples in totally consistent ways in data and sims is larger and hard to quantify, but roughly speaking it can give order 10% uncertainties (this is pretty much an empirical statement based on comparing sims and data many times and seeing how big the changes from that kind of thing can be). Maybe somebody else will have a better idea than I do...

@evevkovacs
Contributor

evevkovacs commented Dec 12, 2017

@duncandc @rmandelb @yymao So to summarize a first pass attempt at validation criteria, we have:

1) ~~Add a 10% systematic uncertainty to the HSC data points. (I would guess these are correlated, but we can ignore this for a first pass)~~
2) Check that the catalog data agree within 20% ~~(so that's 20% + 10% added in quadrature)~~ with the HSC data in the validation region 24 < i < 25
3) Check that the catalog data agree within 40% ~~(i.e. 40% + 10% in quadrature)~~ with the extrapolation of the power-law fit to the HSC density at i = 27

Items ~~1) and 2)~~ 2) and 3) are normalization checks. Initially just check if the catalog falls in the required range; we can come up with something fancier if need be.

Plot enhancements in addition to those mentioned by Rachel are:

  1. Show the HSC power-law fit on the plot (and add to legend)
  2. Show the validation regions as shaded vertical bands. Actually, I think we opted to shade the non-validation region for DESCQA-v1, but we'll have to see how it looks.

Does anyone have any other suggestions or improvements?

@rmandelb
Contributor Author

Not exactly. I wasn't proposing to do (1) in addition to the others. Rather, I was saying that (2) and (3) are meant to incorporate that 10% uncertainty.

@evevkovacs
Contributor

@rmandelb Thanks very much. I adjusted the comment above and struck through the incorrect steps.

@rmandelb
Contributor Author

@yymao @duncandc - I wanted to check something: is it OK if I wait to review the PR until after the changes I already suggested get implemented? I would prefer to just review it once, if that works for you. Ping me once the changes are made and it's ready for review?

@duncandc

@rmandelb, sure thing. I will ping you when I have made those changes.

@yymao
Member

yymao commented Dec 14, 2017

@rmandelb yes --- and JFYI, we (the DESCQA team) encourage early PRs so people can see the progress. Thus, it is totally OK to defer review until the PR submitter says it's ready for review! In the meantime, leaving comments and suggestions in the PR (i.e., not here) would be very helpful :)

@katrinheitmann
Contributor

Closing this issue now since it is taken care of in the descqa repo: LSSTDESC/descqa#7
