Feature Request #44
Hi, it's an interesting idea. However, adding a new update mode seems too much to me. Instead, a method like the following could expose past values:

Status ProcessRecordHistory(std::string_view key, RecordProcessor* proc);

An example usage would define a processor class such as "class HistoryPrinter : public DBM::RecordProcessor", declare an instance "HistoryPrinter printer;", and pass it to the method. Methods of the given processor are called in the reverse chronological order of the update operations. I'm not sure how many people would use this kind of niche feature, though.
Hi,

Implementation A is far from ideal, as it requires RMW: with a growing number of items, the value block size constantly increases to accommodate the extra item/event on each RMW/append (I think the value block size would eventually saturate and stay at the longest produced event-list size). It would be ok-ish if the db were small and didn't require direct I/O (so it would be cached in memory and occasionally sync()-ed), but unfortunately in our case the TreeDBM db size can exceed available memory.

B is better, as events are just inserted into TreeDBM, but as with any tree it is log(n): it needs to be rebalanced, multiple pages are updated to insert/remove a key, and there is additional code we have to maintain to iterate, combine, copy to another table, and then remove from TreeDBM.

C is close to what we need performance-wise, but again it requires additional code to assemble events from the open group and copy them into another table, plus handling some interesting side effects.

What I'm proposing is a new HashDBM merge operation that would append without RMW, employing what update_mode=append already does internally: put the new value at the head of the linked list. Get(key) would return all appended values as a list. A bonus feature would be a full merge to materialize the linked list into a single block of data.

I will take a closer look at the tkrzw code to see whether there is a way to implement what we're looking for as-is (i.e. with RecordProcessor), or whether extending the tkrzw code would be a better solution. I appreciate your feedback.
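The proposed semantics can be made concrete with a small in-memory model. This is not tkrzw code: the class name MergeDBM and the methods Append, GetAll, and FullMerge are hypothetical stand-ins for the proposed "update_mode=merge" behavior, with a per-key vector standing in for the on-disk linked list.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model of the proposed "update_mode=merge" semantics: Append adds a
// value without read-modify-write, GetAll returns every appended value
// (newest first, mirroring a walk of the on-disk linked list whose head is
// the most recent entry), and FullMerge materializes the list into one
// block. All names here are hypothetical, not tkrzw API.
class MergeDBM {
 public:
  // No RMW: existing values are untouched; only a new entry is added
  // (tkrzw's append mode would link a new record at the list head).
  void Append(const std::string& key, const std::string& value) {
    map_[key].push_back(value);
  }
  // Return all values for the key, newest first (reverse chronological).
  std::vector<std::string> GetAll(const std::string& key) const {
    std::vector<std::string> out;
    auto it = map_.find(key);
    if (it != map_.end()) out.assign(it->second.rbegin(), it->second.rend());
    return out;
  }
  // "Full merge" bonus feature: collapse the list into a single delimited
  // block (oldest first) and replace the list with it.
  std::string FullMerge(const std::string& key, char sep) {
    std::string block;
    for (const auto& v : map_[key]) {
      if (!block.empty()) block += sep;
      block += v;
    }
    map_[key] = {block};
    return block;
  }
 private:
  std::unordered_map<std::string, std::vector<std::string>> map_;
};
```

The point of the sketch is the cost profile: Append never touches existing data, so it stays O(1) amortized, and Get is O(1) bucket lookup plus enumeration of the list.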
Hi, let me clarify the idea in my first reply. Suppose a HashDBM in appending mode has had the key "foo" set to "event01" and then to "event02". If a method like "ProcessRecordHistory" accepts arbitrary callback objects, you can get all past values of the same key in reverse order: passing a processor such as "class HistoryPrinter : public DBM::RecordProcessor" (declared as "HistoryPrinter printer;") would print "foo was set to event02" and then "foo was set to event01". I guess this could satisfy your requirement.
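The code snippets in this thread were truncated in extraction; the following is a self-contained mock of what the proposed interface might look like. ProcessRecordHistory, RecordProcessor, and HistoryPrinter are the names from the proposal; the in-memory storage and the ProcessFull callback name are assumptions made to keep the sketch runnable.

```cpp
#include <iostream>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Callback interface, as sketched in the proposal above.
class RecordProcessor {
 public:
  virtual ~RecordProcessor() = default;
  virtual void ProcessFull(std::string_view key, std::string_view value) = 0;
};

// Mock DBM: keeps every past value per key so the history can be replayed.
// A real HashDBM in appending mode keeps old records on disk, linked from
// the newest one, which is what makes this walk possible without RMW.
class MockHistoryDBM {
 public:
  void Set(const std::string& key, const std::string& value) {
    map_[key].push_back(value);
  }
  // Calls the processor once per past value, newest first
  // (reverse chronological order of the update operations).
  void ProcessRecordHistory(std::string_view key, RecordProcessor* proc) {
    auto it = map_.find(std::string(key));
    if (it == map_.end()) return;
    for (auto v = it->second.rbegin(); v != it->second.rend(); ++v) {
      proc->ProcessFull(key, *v);
    }
  }
 private:
  std::unordered_map<std::string, std::vector<std::string>> map_;
};

// Example processor from the discussion: prints each historical value.
class HistoryPrinter : public RecordProcessor {
 public:
  void ProcessFull(std::string_view key, std::string_view value) override {
    std::cout << key << " was set to " << value << std::endl;
    seen.emplace_back(value);
  }
  std::vector<std::string> seen;  // recorded for inspection in tests
};
```

With Set("foo", "event01") followed by Set("foo", "event02"), ProcessRecordHistory("foo", &printer) prints "foo was set to event02" and then "foo was set to event01", matching the behavior described above.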
Hi Mikio,

Apologies for going quiet. What we're planning to do is to modify the HashDBM code to implement an efficient append, in a similar fashion to how it is implemented in TinyDBM: instead of GET/SET, append at the value offset if the appended value fits in the current record space, or create a new resized record and copy+append. I also created C#/.NET Tkrzw bindings and would be happy to share them if you are interested.

If I haven't said it already, I'm really impressed by how the Tkrzw library is designed and implemented!

Regards, Milosz
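The in-place append strategy described above can be sketched with a toy record model. The names, the power-of-two growth policy, and the 16-byte minimum are illustrative assumptions, not tkrzw's actual record layout.

```cpp
#include <string>

// Toy model of the plan above: each record reserves some aligned capacity;
// an append writes at the end of the current value if it fits, otherwise
// the record is reallocated at the next aligned size and the old value is
// copied over once. This mirrors "append at the value offset if it fits,
// or create a new resized record and copy+append".
struct Record {
  std::string value;
  size_t capacity = 0;  // bytes reserved for the value block
  size_t copies = 0;    // bytes copied by reallocations (for inspection)

  static size_t AlignedSize(size_t n) {
    size_t cap = 16;           // assumed minimum slot (stand-in for align_pow)
    while (cap < n) cap *= 2;  // grow to the next power of two
    return cap;
  }

  void Append(const std::string& item) {
    const size_t needed = value.size() + item.size();
    if (needed > capacity) {
      // Doesn't fit: relocate to a bigger slot, copying the current value.
      copies += value.size();
      capacity = AlignedSize(needed);
    }
    value += item;  // fits now: write in place at the value offset
  }
};
```

Because the capacity grows geometrically, each byte is copied only a constant number of times on average, instead of being rewritten on every append as plain RMW would do.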
Did you read my previous response?
I guess that retrieving past values is the most efficient way to realize an
appendable list on HashDBM's appending mode.
dbm.Set("foo", "abc");
dbm.Set("foo", "def");
auto& trace = dbm.Trace("foo");
string value;
while (trace.Get(&value)) {
cout << value << endl;
}
If this code prints "abc" and "def", you can regard the record as a list,
can't you?
Hi Mikio, yes, I did. I also created a small prototype to test it, and there are two side effects to this approach:
Unfortunately, frequent rebuilds are not an option for us. To recap, essentially this is what we do (I'm going to use a non-financial example; in our case the measurements are price candles): Requirements:
Capacity stats based on above:
I created a prototype; the hash database params (from the top of my head) were num_buckets=30M, align_pow=15, cache_buckets=1, fbp_count=100. It works as expected: once the history saturates (each sensor reports measurements for at least 24h; the prototype simulates this by generating dummy data), the free block pool works as expected and the database behaves as a ring buffer and does not grow. The only spanner in the works is that RMW appends for each measurement result in N×N efficiency. Let us investigate that possibility and come back to you once we have something concrete. Kind Regards
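The N×N remark can be quantified: with RMW, every append rewrites the whole existing value, so the total bytes copied grow quadratically in the number of items, whereas in-place appends with geometric record growth copy each byte only a bounded number of times. The item size and measurement count below are illustrative, not from the prototype.

```cpp
#include <cstdint>

// Bytes copied to build one record of n fixed-size items, two ways.

// (a) RMW append: the i-th append rewrites the i-1 existing items, so the
//     total copied is item * (0 + 1 + ... + (n-1)) = item*n*(n-1)/2.
int64_t RmwCopyBytes(int64_t n, int64_t item) {
  return item * n * (n - 1) / 2;
}

// (b) In-place append with power-of-two record growth: a copy happens only
//     when the record outgrows its slot, so the total copied is bounded by
//     roughly 2*n*item (each byte moves O(1) times on average).
int64_t InPlaceCopyBytes(int64_t n, int64_t item) {
  int64_t copied = 0, size = 0, cap = 16;
  for (int64_t i = 0; i < n; ++i) {
    if (size + item > cap) {
      copied += size;  // relocate: copy the current value once
      while (cap < size + item) cap *= 2;
    }
    size += item;
  }
  return copied;
}
```

For example, at one 32-byte measurement per minute (1440 per day per sensor), RMW copies about 33 MB per sensor per day just to maintain one record, while geometric in-place growth stays under 100 KB, which is why the ring-buffer prototype is dominated by the append cost.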
Hi Mikio,
I really, really like your database and would like to ask if there is a remote chance of supporting a scenario where the append operation on HashDBM, instead of doing Read-Modify-Write, would just do what update_mode=append does internally (insert a new linked-list entry and update the HashRecord's head), and Rebuild would not erase/reclaim the linked-list items. HashDBM Get() operations in a new "update_mode=merge" mode would return the collection of all items from the linked list for a given key. This would be equivalent to the RocksDB merge operation, but on a hash database. Currently we have to use a combination of two databases, TreeDBM and HashDBM, to implement this scenario, but using TreeDBM has its drawbacks (log(n), the need to use range iterations, tree rebalancing, etc.) and we are keen to stay close to O(1) (well, in the proposed mode this would be O(1) plus enumeration of the linked list). I'm happy to provide more details and look forward to hearing your opinion.
Best Regards