
Substantial Python loop overhead for VBBL #104

Open · kmzzhang opened this issue Oct 2, 2023 · 13 comments

kmzzhang commented Oct 2, 2023

Currently, the binary lens light curve is calculated in a python loop where each point is evaluated separately.

https://github.com/rpoleski/MulensModel/blob/master/source/MulensModel/magnificationcurve.py#L421

When using VBBL as the finite source method, the python loop overhead can slow things down by up to ~7 times compared to VBBL's own python wrapper, where the loop occurs in C++. This is most apparent when VBBL is used for the full light curve (VBBL itself decides automatically whether to trigger the full FS calculation). Perhaps one could aggregate the points that use VBBL and move the loop into C++? I considered making a pull request, but I realize this may involve refactoring larger parts of the code.
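To illustrate the idea (a schematic sketch only, not MulensModel's actual code; the batched function in the second half is hypothetical):

```python
# Current pattern: one Python -> C++ call per epoch.
# BinaryMag2() is VBBL's single-point routine.
magnifications = [
    vbbl.BinaryMag2(s, q, y1, y2, rho)
    for y1, y2 in zip(trajectory_x, trajectory_y)
]

# Proposed pattern: one call for the whole trajectory, so the loop
# runs in C++ (BinaryMag2_array is a hypothetical batched function).
magnifications = vbbl.BinaryMag2_array(s, q, trajectory_x, trajectory_y, rho)
```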

jenniferyee (Collaborator) commented

@kmzzhang We are about to do a major refactor for Version 3, so we can add this to the list. In particular, we are talking about having subclasses for models, which could include creating MagnificationCurve subclasses such as MagnificationCurveVBBL().

rpoleski (Owner) commented Oct 3, 2023

We don't have to wait until v3; this seems like a relatively easy fix. I'm guessing we did it on an epoch-by-epoch basis because it's easier to pass a specific number of floats between Python and C++ than arrays of unknown sizes.
Thanks @kmzzhang for bringing that up!

kmzzhang (Author) commented Oct 3, 2023

You're welcome. I believe this also applies to the 2L1S point-source case when the SG12 solver in VBBL is used.

Since the finite source method is specified by time intervals in MulensModel, perhaps one way to refactor the code is to aggregate epochs by those intervals and calculate the magnifications per group, as sketched below?
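A rough sketch of that grouping, with a hypothetical helper name (`methods` is the alternating boundary/method-name list that MulensModel already accepts):

```python
import numpy as np

def group_epochs_by_method(times, methods):
    """Yield (method_name, epochs) groups; hypothetical helper.

    `methods` alternates interval boundaries and method names, e.g.
    [t_start, 'point_source', t_1, 'VBBL', t_2, 'point_source', t_end].
    """
    times = np.asarray(times, dtype=float)
    boundaries = np.asarray(methods[0::2], dtype=float)
    names = methods[1::2]
    # Index of the interval each epoch falls into:
    indices = np.searchsorted(boundaries, times, side='right') - 1
    for i, name in enumerate(names):
        mask = indices == i
        if mask.any():
            yield name, times[mask]

# Each ('VBBL', epochs) group could then go into a single batched C++ call.
```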

rpoleski (Owner) commented Oct 4, 2023

Yes, that's what I plan to do.

rpoleski (Owner) commented

@kmzzhang Can you please share how you pass pointers/vectors of floats between Python and C++? I see that there are different approaches to the PyArg_Parse*() functions and would prefer not to re-invent the wheel if you have already done so.

kmzzhang (Author) commented

@rpoleski Apologies for the late reply. I don't have a particular way of doing this, but perhaps you could use Valerio's way of making python wrappers: https://github.com/valboz/VBBinaryLensing/tree/master/VBBinaryLensing/lib. He has a python wrapper for the BinaryLightCurve function that takes an array of times (ts). One could easily modify this function (and its wrapper) to take arrays of source locations y1s and y2s instead of times; everything else would stay the same.
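Alternatively, if hand-written PyArg_Parse*() code is a pain, ctypes plus numpy.ctypeslib can pass double arrays directly. A sketch of the Python side, assuming a hypothetical C-linkage batched function compiled into the library (the function and library names are made up):

```python
import ctypes

import numpy as np
from numpy.ctypeslib import ndpointer

# Hypothetical C function in the compiled library:
#   void binary_mag_array(double s, double q, const double* y1,
#                         const double* y2, double rho, int n, double* out);
lib = ctypes.CDLL("./libvbbl_wrapper.so")  # hypothetical library name
double_array = ndpointer(dtype=np.float64, flags="C_CONTIGUOUS")
lib.binary_mag_array.restype = None
lib.binary_mag_array.argtypes = [
    ctypes.c_double, ctypes.c_double, double_array, double_array,
    ctypes.c_double, ctypes.c_int, double_array,
]

def binary_mag_array(s, q, y1, y2, rho):
    # Ensure contiguous float64 buffers before handing pointers to C++.
    y1 = np.ascontiguousarray(y1, dtype=np.float64)
    y2 = np.ascontiguousarray(y2, dtype=np.float64)
    out = np.empty_like(y1)
    lib.binary_mag_array(s, q, y1, y2, rho, y1.size, out)
    return out
```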

rpoleski removed the version3 label on Dec 30, 2023
rpoleski (Owner) commented

@kmzzhang Can you provide example code that shows significant differences in execution time?

kmzzhang (Author) commented Nov 4, 2024

@rpoleski I did some quick tests:
https://gist.github.com/kmzzhang/c174cb3586ebab69dde5388efd2868c4

If MM only uses VBBL for the points that need full finite source, it is around 0%–40% slower.
If MM uses VBBL for the full light curve, it is up to an order of magnitude slower, depending on how many points need the full FS calculation.

Note that in the "Point source" example, VBBL's native python wrapper is faster than MM's point-source method, even when VBBL automatically decides whether to do FS for each point (I kept a non-zero rho, but none of the points actually needed FS).

So once the loop is moved into C++, one doesn't need to specify the time interval that needs full FS, and it's still a lot faster.
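The timing is done roughly along these lines (a schematic sketch with placeholder parameter values, not the exact benchmark; see the gist for details):

```python
import time

import numpy as np
import MulensModel as mm

# Placeholder 2L1S parameters:
model = mm.Model({'t_0': 0., 'u_0': 0.1, 't_E': 30., 'rho': 1e-3,
                  's': 1.1, 'q': 0.2, 'alpha': 270.})
times = np.linspace(-50., 50., 10000)

# Case 1: VBBL only inside a window around the anomaly.
model.set_magnification_methods([-5., 'VBBL', 5.])
# Case 2: VBBL for the full light curve (far more Python-loop overhead):
# model.set_magnification_methods([times[0], 'VBBL', times[-1]])

start = time.perf_counter()
model.get_magnification(times)
print(f"{time.perf_counter() - start:.3f} s for {times.size} epochs")
```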

kmzzhang (Author) commented Nov 5, 2024

Also, if I recall correctly, VBBL initializes the lens-equation root finding for each epoch from the roots found at the previous epoch. But since MulensModel calls the VBBL python wrapper separately for each timestamp, VBBL has to initialize the roots from scratch every single time, making it much slower. So the cost is most likely more than just loop overhead.

rpoleski (Owner) commented Nov 8, 2024

There is one more aspect: MM still uses an older version of VBBL, not VBM. I don't know how much they differ in speed.

CoastEgo commented Dec 6, 2024

I guess one reason is that MulensModel doesn't have a keyword for relative tolerance control (right?), which significantly affects the speed.

rpoleski (Owner) commented Dec 9, 2024

I've started a new branch for that: vbbl_reltol

rpoleski (Owner) commented

Setting the relative tolerance is ready to be merged. The only remaining question is what default values we should set for the absolute and relative tolerance. I'd like the code to work well for Roman data, so I propose RelTol = 0.001 and Tol = 0 (i.e., the absolute tolerance is effectively ignored).
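For reference, in VBBL's own python wrapper these are plain attributes; as I understand it, the calculation stops refining once the estimated error drops below max(Tol, RelTol * magnification), so Tol = 0 makes the control purely relative:

```python
import VBBinaryLensing

VBBL = VBBinaryLensing.VBBinaryLensing()
VBBL.RelTol = 1e-3  # proposed default: 0.1% relative precision
VBBL.Tol = 0.       # absolute tolerance effectively disabled
# Placeholder (s, q, y1, y2, rho) values:
mag = VBBL.BinaryMag2(1.1, 0.2, 0.05, 0.02, 1e-3)
```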
@jenniferyee what do you think?
