-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add simple metalog implementation #1444
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
This implementation works like a charm, except that it is extremely slow! Performance has been coming up a lot lately. I'll see what I can do on that end. |
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## develop #1444 +/- ##
===========================================
- Coverage 71.58% 71.27% -0.31%
===========================================
Files 92 92
Lines 4656 4763 +107
Branches 853 883 +30
===========================================
+ Hits 3333 3395 +62
- Misses 1318 1363 +45
Partials 5 5 see 13 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
I could probably add fitting functionality to this as well. But this as it stands is a simple implementation. |
@NunoSempere can you review this, against other Metalog applications? I remember Eli/Misha had a different one they thought was good. We should make sure that it seems correct. |
Ok, here are my notes with this implementation:
|
@Hazelfire, can you: A. Add a README.md to https://github.com/Hazelfire/metalog/ with some documentation. In particular, I am curious about
B. Give me your thoughts on speed/accuracy tradeoff. In particular, why wouldn't it be acceptable to expose the iterations/accuracy threshold/etc as a parameter, which would allow for faster initial iteration, but also for slower computing when finalizing. Here: https://github.com/Hazelfire/metalog/blob/main/index.ts#L60 I'm seeing that you just bake the numbers in. C. Give me your thoughts about why we get a non-monotonic cdf, and consequences So an non-monotonic cdf is really bad, because it implies negative probability density for some regions. At first I thought: That's fine, you can take a non-monotonic cdf and convert it to an almost-monotonic cdf by
But then we no longer get regions in which the probability density is negative. But we can get stil regions in which the probability density is 0. But this is a bit weird, since I don't see how you could quite get this when just specifying different points in a cdf. Is this something we expect to get only when working with little accuracy? Or is this something that you expect would remain? D. As a minor nitpick, I would have named things such that their apparent meaning is a bit more clear. E.g., have arrays be |
Will do, though I've first left some comments above. |
Hey Nuno! Added a really basic README for you.
The array
It does! You can calculate the
As far as I can tell rn, there's no tradeoff, my implementation is both much more accurate and much faster. I can expose the accuracy parameters if you want, but I don't think we would have any need for this to be much faster or much more accurate. I got the speed by making the method much faster. There is a little bit of detail on that in the new README.
It's annoying, but the reason we get this is because we can only approximate the CDF of a metalog distribution by newton's method. Because of this, the approximations are going to be off slightly, in a way that would be very slightly inconsistent. I think this becomes a problem with Squiggle because it makes the x coordinates out of order (I think it assumes the function is monotonic to order the coordinates. I can't remember exactly how it works)
The pdf function is not calculated from just subtracting two cdfs, so the pdf, at least as implemented in the package, has no negative probability density anywhere (This hasn't been tested, but it should be correct)
I expect it to remain unless I work out a way to do some crazy floating point manipulation code.
I use |
Interestingly, the pdf is not always strictly positive. I don't like that. Let me look into that. |
I implemented fitting to cdf in @quri/metalog! |
Putting some ideas here for metalog fitting syntax: One possible consistent (but feels kind of off) is: metalog({p5: -1, p20: 4, p90: 6}, 2) Where the last number is the terms. I'm, however, leaning more towards: metalog([{x: -1, q: 0.05}, {x: 4, q: 0.2}, {x: 6, q: 0.9}], 2) Which can be more easily programmatically constructed |
Currently, I've opted towards this syntax:
I'm not trying to make the implementation more robust and give better error messages (particularly fixing Xs is not sorted, which shows up a lot) |
Hey @Hazelfire, coming back the previous points I raised:
Also, could you update https://github.com/Hazelfire/metalog/ so that it is up to date, if it isn't already?
Thanks for doing this, really appreciate it. |
Any updates here? |
Changing to draft - my impression is that there are remaining issues. (If you think it's done, Sam, please respond to Nuno's comments, and also say so) |
Hey! I've been busy doing other things. Just to chat: Hey @Hazelfire, coming back the previous points I raised:
Only thing that's missing here is documentation... I think? I can add documentation if you'd like. Thanks for doing this, really appreciate it. |
I'm still getting a lot of trouble getting this to work on a bunch of test cases.
For example. Would be curious to get a better understanding here - do we expect this to fail in ~30% of cases, or any idea? Is this something that's fixable with a better underlying algorithm? |
Adds a simple metalog implementation. Doesn't include any fitting yet. Based on https://www.npmjs.com/package/@quri/metalog.
Todo: