Implement group by #198
base: master
Conversation
if (uniqueByValue.size == seq + limit) return false
},
(early) => {
  if (early) return
I'm not sure why we need this with push.drain, but without it we get called twice when bailing out early.
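For context, here is a minimal, self-contained sketch of the pattern being guarded against. This is not the push-stream API itself, just a toy drain written for illustration: its completion callback fires once when the op aborts the stream and once more when the stream winds down, which is why the final handler checks the `early` flag.

```js
// Toy drain sink, for illustration only (not push-stream): `op` returning
// false aborts the stream, and the completion callback then fires twice —
// once for the abort itself and once when the stream winds down.
function toyDrain(items, op, done) {
  for (const item of items) {
    if (op(item) === false) {
      done(true) // first call: we bailed out early
      break
    }
  }
  done(false) // second call: the stream winds down anyway
}

toyDrain(
  [1, 2, 3, 4, 5],
  (x) => x < 3, // stop once we have what we need
  (early) => {
    if (early) return // the guard from the diff: skip the duplicate invocation
    console.log('final handler runs exactly once')
  }
)
```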
Benchmark results
I don't quite understand the need for this but trust you. to be clear, the way I get the 5 most recently updated eg podcasts is pull for type podcast (getting updates and roots). then run this through crut.read and throw out the duds. once I have 5 good ones kill the stream. oh, and we have a pull.unique in there too. |
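Roughly, that consumer-side approach looks like the sketch below. The `messagesByType` source and the `crut` instance are placeholders standing in for the pull of roots/updates described above; only pull-stream, pull.unique, crut.read, throwing out the duds, and stopping at 5 come from the description.

```js
const pull = require('pull-stream')

pull(
  messagesByType('podcast'),     // placeholder: roots + updates for the type
  pull.unique((msg) => msg.key), // the pull.unique mentioned above
  pull.asyncMap((msg, cb) =>
    crut.read(msg.key, (err, record) => cb(null, err ? null : record))
  ),
  pull.filter(Boolean), // throw out the duds
  pull.take(5),         // kill the stream once we have 5 good ones
  pull.collect((err, latestFive) => {
    if (err) throw err
    // latestFive: the 5 most recently updated podcasts
  })
)
```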
Now that you explain it, it is the same algorithm. So right now this is probably only marginally faster; it maybe improves the discoverability of this kind of thing. I do have an idea to use an index for this, which should make it faster. So challenge accepted :-) Changing this to a draft.
Background
@mixmix was working on adding an option to crut list where you would get results ordered by updateTime. We would like to avoid running through all the results (db1 has to do this) in order to display, say, the latest 5 buttcasts. What we really need for this is a group-by operator, similar to what you have in a normal database. This is quite general: you could also use the same functionality to show the latest post for each feed. I tried modelling it similarly to the seek functions. This means we still have to run through results to extract what we need, but since we are working directly on the bipf data, that's not really so bad. It can still be combined with regular where filtering, and when used with limit it will stop once enough results have been found.
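To make the shape of this concrete, here is a hedged sketch. The seek function follows the usual jitdb/bipf pattern; the query assumes an ssb-db2-style operator chain, and the `groupBy` and `limit` operator names (and `seekAuthor` as the grouping key) are illustrative assumptions, not necessarily the final API of this PR.

```js
const bipf = require('bipf')
const { where, type, descending, toCallback } = require('ssb-db2/operators')

const bValue = Buffer.from('value')
const bAuthor = Buffer.from('author')

// A seek function in the usual jitdb sense: given a raw bipf buffer, return a
// pointer to the field we want to group on (here value.author, i.e. the feed).
function seekAuthor(buffer) {
  let p = 0
  p = bipf.seekKey(buffer, p, bValue)
  if (p < 0) return -1
  return bipf.seekKey(buffer, p, bAuthor)
}

// Hypothetical usage, assuming `ssb` is an ssb-db2 instance: keep only the
// newest message per feed and stop once 5 groups have a representative.
// `groupBy` and `limit` are assumed operator names for illustration.
ssb.db.query(
  where(type('post')),
  descending(),
  groupBy(seekAuthor),
  limit(5),
  toCallback((err, latestPostPerFeed) => {
    // at most 5 messages, one per author
  })
)
```

The point of grouping on the raw bipf buffer is that it does not require fully decoding every message, which is what makes it cheap enough to combine with where filtering and an early stop via limit.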