Dependency of time-delay error estimations on inadequate binning of the true time-delay distribution #17

Open
vbonvin opened this issue Oct 18, 2016 · 1 comment

Comments

@vbonvin
Contributor

vbonvin commented Oct 18, 2016

While playing around with the new covariance matrix function (#16), I noticed that the systematic and random errors of the pycs.sim.plot.measvstrue() function can depend quite strongly on the binning chosen for the true time-delay (truetds) distribution of the simulated light curves. They also depend on the plotting range (the r parameter), since the extrema of the binning range are currently set by (median of truetds) ± r.

I wonder if we should keep that dependency on the plotting range, since it's easy to get wrong: setting r too small means that not all the simulated light curves are considered, so the uncertainty may be underestimated. We could, for example, force that range to correspond to the full extent of the truetds distribution.
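To make the concern concrete, here is a minimal NumPy sketch of how many simulations fall outside (median of truetds) ± r, next to the alternative of using the full extent of truetds. The helper names `binning_range` and `fraction_excluded` are hypothetical and not part of PyCS, and the toy delays are made-up numbers.

```python
import numpy as np

def binning_range(truetds, r=None):
    """Return (low, high) of the binning range.

    With r given: median(truetds) +/- r, as described in this issue.
    With r=None: the full extent of truetds (the alternative suggested above).
    """
    truetds = np.asarray(truetds, dtype=float)
    if r is None:
        return float(truetds.min()), float(truetds.max())
    med = np.median(truetds)
    return float(med - r), float(med + r)

def fraction_excluded(truetds, r):
    """Fraction of simulations whose true delay lies outside median +/- r."""
    truetds = np.asarray(truetds, dtype=float)
    low, high = binning_range(truetds, r)
    outside = (truetds < low) | (truetds > high)
    return float(outside.mean())

# Toy data (assumption): true delays spread uniformly over 20 days.
rng = np.random.default_rng(0)
truetds = rng.uniform(-10.0, 10.0, size=1000)

for r in (2.0, 5.0, 10.0):
    print(f"r = {r:5.1f}: {100 * fraction_excluded(truetds, r):.1f}% of simulations ignored")
```

Printing this fraction before computing the errors would at least warn the user when r silently discards a large part of the simulations.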

Another possible source of error is the number of bins (the nbins parameter). If it is large, the bins at the smallest and largest truetds values, which already contain fewer estimates, might yield biased results because they do not hold enough estimates for robust statistics. The control plot (binned tderrs vs truetds) might help us see if this problem arises, but we could e.g. require a minimum number of estimates for a bin to be considered.
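A minimal sketch of the "minimum number of estimates per bin" idea, using NumPy only. The function `binned_errors` and its `min_count` argument are hypothetical (they do not exist in PyCS); `tderrs` is assumed to hold measured minus true delay for each simulation.

```python
import numpy as np

def binned_errors(truetds, tderrs, nbins=10, r=None, min_count=20):
    """Per-bin mean (bias) and std (scatter) of tderrs, binned in truetds.

    Bins with fewer than min_count estimates are left as NaN so that
    they cannot drive the final error estimate.
    """
    truetds = np.asarray(truetds, dtype=float)
    tderrs = np.asarray(tderrs, dtype=float)
    min_count = max(int(min_count), 1)  # never take statistics of an empty bin

    if r is None:
        low, high = truetds.min(), truetds.max()  # full extent of truetds
    else:
        med = np.median(truetds)
        low, high = med - r, med + r              # median +/- r

    edges = np.linspace(low, high, nbins + 1)
    # np.digitize gives 1..nbins inside [low, high); shift to 0-based indices.
    idx = np.digitize(truetds, edges) - 1
    idx[truetds == high] = nbins - 1              # put the upper edge in the last bin

    counts = np.zeros(nbins, dtype=int)
    means = np.full(nbins, np.nan)
    stds = np.full(nbins, np.nan)
    for b in range(nbins):
        sel = tderrs[idx == b]                    # out-of-range points match no bin
        counts[b] = sel.size
        if sel.size >= min_count:
            means[b] = sel.mean()
            stds[b] = sel.std()
    return edges, counts, means, stds
```

Returning the counts alongside the statistics lets the control plot show which bins were dropped, rather than hiding them.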

@mtewes, what do you think?

@mtewes mtewes changed the title from "Dependency of time-delay error estimations on the binning of the true time-delay distribution" to "Dependency of time-delay error estimations on inadequate binning of the true time-delay distribution" Oct 18, 2016
@mtewes
Member

mtewes commented Oct 18, 2016

I added "inadequate binning" to the issue title :)
It's really a problem more related to the user, not any "flaw" in PyCS.

The idea behind these parameters (r, nbins) is that the user is aware of what she/he is doing. To pick values, one should first look at the checkplots over wide ranges, see how the variance and bias depend on the true time delays, and then decide what range and binning make the most sense (also depending on how much CPU cost should be invested).
In that sense, if used "correctly", the values of r and nbins "do not matter". If moderately changing these parameter values within plausible ranges modifies the uncertainty estimate significantly, something is wrong!
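This check can be sketched as a small scan over plausible (r, nbins) values. Everything here is a NumPy-only toy: `total_error` and `sensitivity` are hypothetical helpers, and combining the worst per-bin bias and the worst per-bin scatter in quadrature is an assumption made for illustration, not a statement of what measvstrue() computes.

```python
import numpy as np

def total_error(truetds, tderrs, r, nbins, min_count=10):
    """Toy total error for one (r, nbins) choice.

    Assumption: worst per-bin |bias| and worst per-bin scatter, added in
    quadrature, over bins holding at least min_count estimates.
    """
    truetds = np.asarray(truetds, dtype=float)
    tderrs = np.asarray(tderrs, dtype=float)
    med = np.median(truetds)
    edges = np.linspace(med - r, med + r, nbins + 1)

    biases, scatters = [], []
    for low, high in zip(edges[:-1], edges[1:]):
        sel = tderrs[(truetds >= low) & (truetds < high)]
        if sel.size >= min_count:
            biases.append(abs(sel.mean()))
            scatters.append(sel.std())
    if not biases:
        return float("nan")  # no bin was populated well enough
    return float(np.hypot(max(biases), max(scatters)))

def sensitivity(truetds, tderrs, rs, nbins_list):
    """Relative spread (max - min) / median of the total error over the grid."""
    values = np.array([total_error(truetds, tderrs, r, n)
                       for r in rs for n in nbins_list])
    values = values[np.isfinite(values)]
    if values.size == 0:
        return float("nan")
    return float((values.max() - values.min()) / np.median(values))
```

A relative spread that is not small compared to the precision one is after would be the "something is wrong" signal described above. What counts as small is left to the user.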

This error computation is meant for cases in which the dependence of a measurement uncertainty on the range of considered true delays has been discovered to be small.
If light curves are short and of high quality, and this dependence gets very strong (because precious inflection points are in or out of the overlapping regions), deciding on the range to consider really amounts to deciding "is this feature real or not", something that PyCS cannot do.

Bottom line: I'm a bit in favour of leaving these options "manual" and improving the doc / educating the user, instead of deriving some automatic settings.
