[RFC] Interacting with DDlog programs #11
This is related to vmware/differential-datalog#372
Some more thoughts to add to this list:
DDlog is a streaming database, but by adding indexes and their lookup APIs, and primary keys, we convert it into something closer to a traditional DB. But it is not really a traditional DB, so people who expect it to behave like one may be surprised.
This won't work, as insert->insert->delete behaves like an insert, whereas the "expected" behavior (assuming the user expects set semantics) is a no-op.
From experience, many users struggle with this, which is why the current semantics was introduced in the first place. This is the same issue that Frank talks about in the upsert blog.
This preprocessing module will end up maintaining a private snapshot of input state, as we do now. Upserts avoid this overhead.
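A minimal sketch of why this happens, assuming weighted-multiset semantics as in differential dataflow (hypothetical types, not the actual DDlog API):

```rust
use std::collections::HashMap;

/// A relation as a multiset of records with signed weights -- the
/// representation DDlog inherits from differential dataflow.
/// Hypothetical sketch, not the DDlog API.
#[derive(Default)]
struct Multiset {
    weights: HashMap<String, isize>,
}

impl Multiset {
    fn insert(&mut self, rec: &str) {
        *self.weights.entry(rec.to_string()).or_insert(0) += 1;
    }
    fn delete(&mut self, rec: &str) {
        *self.weights.entry(rec.to_string()).or_insert(0) -= 1;
    }
    fn contains(&self, rec: &str) -> bool {
        self.weights.get(rec).copied().unwrap_or(0) > 0
    }
}

fn main() {
    // insert -> insert -> delete under multiset semantics:
    let mut rel = Multiset::default();
    rel.insert("x"); // weight 1
    rel.insert("x"); // weight 2 -- not an error in a multiset
    rel.delete("x"); // weight 1 -- the record is still present
    assert!(rel.contains("x")); // behaves like a single insert
    // Under set semantics, the second insert would be a no-op, so the
    // delete would leave the set empty -- a no-op overall.
}
```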
If you have the apply(delta) API then the distinct will work fine (but will not handle the primary keys).
It is up to the user to roll back a transaction after a failed update.
If all failed transactions are supposed to be rolled back, why not do it automatically?
Not sure I understand. How will apply(delta) solve the insert->insert->delete problem? Maybe I don't understand what
And if they are not supposed to be rolled back, the state of the DB after a failed transaction should be clearly defined.
They are not. It's up to the client. And yes, the state needs to be clearly defined.
By essentially defining the semantics of an update in this way: take a delta, add it to the input table, and apply a distinct. It is not a traditional DB view, but it is clear. |
I see. This still doesn't solve the insert->insert->delete problem though if each operation happens in a separate transaction. |
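To see why separate transactions still break this, here is a sketch of the "apply a delta to the input table, then distinct" semantics described above (hypothetical types, not the DDlog API):

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of the proposed semantics: the input table keeps
/// multiset weights, and the output view is its distinct (set) projection.
#[derive(Default)]
struct InputTable {
    weights: HashMap<String, isize>,
}

impl InputTable {
    /// Apply a delta: a batch of (record, weight-change) pairs,
    /// e.g. one batch per transaction.
    fn apply(&mut self, delta: &[(&str, isize)]) {
        for (rec, w) in delta {
            *self.weights.entry(rec.to_string()).or_insert(0) += w;
        }
    }
    /// distinct: every record with positive weight appears exactly once.
    fn distinct(&self) -> HashSet<String> {
        self.weights
            .iter()
            .filter(|&(_, &w)| w > 0)
            .map(|(r, _)| r.clone())
            .collect()
    }
}

fn main() {
    let mut input = InputTable::default();
    // Three separate transactions: insert, insert, delete.
    input.apply(&[("x", 1)]);
    input.apply(&[("x", 1)]);
    input.apply(&[("x", -1)]);
    // The weight is 1 + 1 - 1 = 1, so distinct still reports "x":
    // the insert->insert->delete anomaly survives.
    assert!(input.distinct().contains("x"));
}
```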
We could draw inspiration from Materialize, the way that it handles internal or user-produced errors is by producing parallel error tables (read: relations) for outputs, allowing it to incrementally process errors (and to incrementally fix them as well). The basic structure is that one relation is filled with
Yeah, status tables are a nice way to report errors incrementally, especially if we want to support a larger class of consistency constraints. The problem with insert->insert->delete though is that we don't want to maintain weights in the normal way. Clients that rely on the
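A simplified, non-incremental sketch of the "parallel error relation" idea: each output relation is paired with an error relation, so a constraint violation produces an error record instead of aborting the transaction. All names here are hypothetical, taken neither from DDlog nor from Materialize:

```rust
/// Each output relation is paired with an error relation.
/// Hypothetical sketch; a real implementation would maintain both
/// relations incrementally rather than recomputing them per batch.
struct Outputs {
    ok: Vec<(String, u32)>, // well-formed records
    errors: Vec<String>,    // one message per violated constraint
}

fn validate(records: &[(String, u32)]) -> Outputs {
    let mut out = Outputs { ok: Vec::new(), errors: Vec::new() };
    for (name, age) in records {
        // Example consistency constraint: age must be plausible.
        if *age > 150 {
            out.errors.push(format!("implausible age {} for {}", age, name));
        } else {
            out.ok.push((name.clone(), *age));
        }
    }
    out
}

fn main() {
    let recs = vec![("alice".to_string(), 30), ("bob".to_string(), 200)];
    let out = validate(&recs);
    // "alice" flows to the output; "bob" flows to the error relation.
    assert_eq!(out.ok.len(), 1);
    assert_eq!(out.errors.len(), 1);
}
```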
Reading this paper:
suggests an interesting solution: shared arrangements for all inputs. This could be a compilation option.
In this model a DDlog computation is really a two-stage process: input arrangements followed by the actual dataflow graph.
That's pretty much what the upsert stuff that Leon's talking about is.
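For contrast with the multiset behavior discussed above, here is a sketch of upsert semantics over a relation with a primary key, in the spirit of the upsert approach mentioned in this thread (hypothetical types, not the DDlog API):

```rust
use std::collections::HashMap;

/// A relation keyed by primary key with upsert semantics: the latest
/// operation per key wins, so weights never accumulate.
/// Hypothetical sketch, not the DDlog API.
#[derive(Default)]
struct KeyedRelation {
    rows: HashMap<String, String>, // primary key -> value
}

impl KeyedRelation {
    fn upsert(&mut self, key: &str, val: &str) {
        // A second insert for the same key overwrites instead of
        // accumulating weight.
        self.rows.insert(key.to_string(), val.to_string());
    }
    fn delete(&mut self, key: &str) {
        self.rows.remove(key);
    }
    fn get(&self, key: &str) -> Option<&String> {
        self.rows.get(key)
    }
}

fn main() {
    let mut rel = KeyedRelation::default();
    rel.upsert("k", "v1");
    rel.upsert("k", "v2"); // overwrite: no weight build-up
    rel.delete("k");       // the key is gone
    // insert -> insert -> delete is a genuine no-op under upserts.
    assert!(rel.get("k").is_none());
}
```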
I thought about this more and I hope I have a design. It's not final, but I hope it clarifies some dimensions. I will write a document about it.
Interacting with DDlog programs
A DDlog program is compiled into a library. The task of sending and receiving data from DDlog programs is left to applications built on top.
Interaction with DDlog programs can be made either through the CLI or through API calls. The CLI operations are in fact interpreted by the CLI tool and converted into API calls. The semantics of these API calls is currently not well specified.
Here are things that should be clearly specified:
The order of updates between start and end within a transaction matters. The spec indicates that updates are buffered and applied at commit time. Are API calls such as ddlog_clear_relation also buffered?
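The buffering question can be made concrete with a sketch. In the hypothetical model below (not the DDlog API), all operations issued inside a transaction, including a relation-clearing operation, join the same buffer and are applied in order at commit time; whether that is the right choice for ddlog_clear_relation is exactly what the spec should pin down:

```rust
use std::collections::HashSet;

/// Hypothetical transaction model: operations are buffered between
/// start and commit, then applied in order. Not the DDlog API.
enum Op {
    Insert(String),
    Delete(String),
    ClearRelation,
}

#[derive(Default)]
struct Db {
    relation: HashSet<String>,
    buffer: Vec<Op>,
    in_txn: bool,
}

impl Db {
    fn start(&mut self) {
        self.in_txn = true;
        self.buffer.clear();
    }
    fn push(&mut self, op: Op) {
        assert!(self.in_txn, "updates are only legal inside a transaction");
        self.buffer.push(op);
    }
    fn commit(&mut self) {
        // Buffered operations are applied in issue order at commit time.
        let ops = std::mem::take(&mut self.buffer);
        for op in ops {
            match op {
                Op::Insert(r) => { self.relation.insert(r); }
                Op::Delete(r) => { self.relation.remove(&r); }
                Op::ClearRelation => self.relation.clear(),
            }
        }
        self.in_txn = false;
    }
}

fn main() {
    let mut db = Db::default();
    db.start();
    db.push(Op::Insert("x".into()));
    db.push(Op::ClearRelation); // buffered too, in this sketch
    db.push(Op::Insert("y".into()));
    db.commit();
    // Order matters: the clear wiped "x" but not the later "y".
    assert!(!db.relation.contains("x"));
    assert!(db.relation.contains("y"));
}
```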