refactor(storage): add table_id parameter in state store write and sync method for future partial checkpoint support #1812
Conversation
Codecov Report
@@ Coverage Diff @@
##             main    #1812      +/-   ##
==========================================
+ Coverage   70.76%   70.85%   +0.08%
==========================================
  Files         607      609       +2
  Lines       79619    79943     +324
==========================================
+ Hits        56344    56644     +300
- Misses      23275    23299     +24
Why don't we directly store the "table id" in the …? Also, could you please explain the meaning of `table_id`?
In the best case, we will put this `table_id` into the key encoding. However, we still need much extra work to reach that best case, since all code related to key encoding in executors would have to change, and this is not strongly relevant to the topic of this PR. Besides, in this PR, …
I've been struggling with the naming for days. In general, such … In the first place, I introduced this … Do you have any naming suggestions for this concept?
Why not simply use the key prefix as the table id, e.g. the first 4 bytes? All states are now prefixed by either table id or executor id. And this `table_id` looks somewhat similar to a hummock group?
Okay, I see the comments above.
This is exactly the same as executor id? Do you want to know which actor the data are written from?
I mean putting the …
How about "group" or "checkpoint group"?
Yes, I'd agree with that. We already have such a keyspace to determine the prefix; we can make it also determine the so-called "checkpoint group". Life is already hard for stateful executors, as they are now required to store an epoch. This PR requires them to store a group id, and there will only be more and more in the future. Executors themselves don't care about such information, so we can make it part of the keyspace. 🤣
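The idea above — letting the keyspace own both the key prefix and the checkpoint-group id, so executors never juggle either directly — could be sketched roughly like this. All names here (`Keyspace`, `group_id`, `prefixed_key`) are hypothetical illustrations, not the actual codebase API:

```rust
// Sketch: the keyspace owns both the key prefix and the checkpoint-group id,
// so stateful executors never handle either value directly.
struct Keyspace {
    prefix: Vec<u8>, // e.g. an executor-specific key prefix, fixed at creation
    group_id: u32,   // checkpoint group, also determined once at creation
}

impl Keyspace {
    fn new(prefix: Vec<u8>, group_id: u32) -> Self {
        Self { prefix, group_id }
    }

    /// Build a full storage key; the executor only supplies the suffix.
    fn prefixed_key(&self, key: &[u8]) -> Vec<u8> {
        let mut out = self.prefix.clone();
        out.extend_from_slice(key);
        out
    }
}

fn main() {
    let ks = Keyspace::new(vec![0x01, 0, 0, 0, 7], 7);
    let k = ks.prefixed_key(b"row1");
    assert_eq!(&k[..5], &[0x01, 0, 0, 0, 7]); // prefix applied transparently
    assert_eq!(ks.group_id, 7);
    println!("ok");
}
```

Since the prefix and the group id never change for a given executor, constructing them once inside the keyspace keeps the executor-facing interface unchanged.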
@BugenZhao Besides, if we want to put …
I think there's even no need to change the signature of the initialization function of `Keyspace`s.
For test cases, simply have something like `new_without_id`. For production code, I think there are no more than 20 occurrences of creating a keyspace throughout the codebase, which should be easy to work through.
Also we can have `write_batch` and `write_batch_with_group_id`. Even having the `group_id` in the `write_batch` struct sounds okay to me.
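The "group_id inside the write batch struct" option mentioned above might look like the following minimal sketch. The struct shape and method names are assumptions for illustration, not the real `WriteBatch`:

```rust
// Sketch: a write batch that carries its checkpoint-group id, so the
// state store can tell which group the buffered kvs belong to.
struct WriteBatch {
    group_id: u32,                        // checkpoint group this batch belongs to
    kvs: Vec<(Vec<u8>, Option<Vec<u8>>)>, // key -> value; None means a delete
}

impl WriteBatch {
    fn new(group_id: u32) -> Self {
        Self { group_id, kvs: Vec::new() }
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.kvs.push((key.to_vec(), Some(value.to_vec())));
    }

    fn delete(&mut self, key: &[u8]) {
        self.kvs.push((key.to_vec(), None));
    }
}

fn main() {
    let mut batch = WriteBatch::new(1);
    batch.put(b"a", b"1");
    batch.delete(b"b");
    assert_eq!(batch.group_id, 1);
    assert_eq!(batch.kvs.len(), 2);
    println!("ok");
}
```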
@skyzh Yes, I agree on avoiding having the executor store this `table_id`. As for where to store the table id: in the near future when the relational table is implemented, the table itself will definitely store its own id, and then it can pass that id when it wants to write to the state store.
... and if `table_id` conveys the same meaning as described in the PR body (table id or operator id), it's simply the first 5 bytes of the key.
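A minimal sketch of what that 5-byte prefix could look like, assuming (hypothetically) a 1-byte root marker followed by a 4-byte big-endian table id — the layout, constant, and function names are illustrations, not the real key encoding:

```rust
// Hypothetical key layout: [1-byte root marker][4-byte big-endian table id][user key].
const TABLE_ROOT: u8 = b't';

/// Encode a full storage key from a table id and a user key.
fn encode_key(table_id: u32, user_key: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(5 + user_key.len());
    key.push(TABLE_ROOT);
    key.extend_from_slice(&table_id.to_be_bytes());
    key.extend_from_slice(user_key);
    key
}

/// Recover the table id from the first 5 bytes of an encoded key.
fn decode_table_id(key: &[u8]) -> Option<u32> {
    if key.len() < 5 || key[0] != TABLE_ROOT {
        return None;
    }
    Some(u32::from_be_bytes(key[1..5].try_into().unwrap()))
}

fn main() {
    let key = encode_key(42, b"pk");
    assert_eq!(decode_table_id(&key), Some(42));
    println!("ok");
}
```

Under such a layout, the state store could recover the table id from any key without it being passed explicitly.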
It is of course part of the key encoding -- every keyspace begins with either `table_root`, `executor_root` or `shared_executor_root`, and therefore the prefix generated for a keyspace's write batch begins with one of these prefixes.
The design of `Keyspace` is for both reading and writing. We extracted the `WriteBatch` just to make things easier for the caller, and you can find a reference to `Keyspace` inside it as well. My point is that since it never changes from the view of the executor, there's no need to pass it everywhere. We should minimize the impact on the codebase as much as possible.
I mean the … My development plan is that at the very beginning, the `table_id` passed to state store writes is independent of the relational table id or operator id, and all storage tables share the same `GLOBAL_STORAGE_TABLE_ID`.
I think I'd do things like the following. The signature for …
I would +1 for that. Also, I think there's one problem to solve before we can actually work on partial checkpoint: how do you plan to update `table_id` (or `checkpoint_group_id`) across the graph when there's a scale-in or scale-out?
Also, how can we know that a checkpoint group's data have been updated by all required executors for a given epoch?
... furthermore, why do we even need …
Time to go off work 😇 let's come back tomorrow |
My first thought was also to pass … Another benefit of passing a finer-grained `table_id` is …
I think scaling in/out is in the physical execution layer, and this …
I think when we have the relational table in the future, there will be a …
Off, off. Stop grinding.
+1 for this. :)
So what about this comment? Currently, every write batch only contains data from one executor's state, so the first 5 bytes of the keys in a write batch will always be the same (we can add a sanity check for this). Therefore, we can store shared buffer memtables grouped by such a prefix, and when `sync` is called from the meta side, we specify the "prefix to sync".
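The grouping described above could be sketched like this: shared-buffer entries are bucketed by the first 5 bytes of their keys, so a sync can flush one prefix at a time. The struct and method names are assumptions for illustration, not the real shared buffer:

```rust
use std::collections::HashMap;

// Sketch: buffer kvs grouped by their 5-byte key prefix, so that `sync`
// can flush only the tables that a (partial) checkpoint asks for.
struct SharedBuffer {
    // 5-byte key prefix -> buffered kv pairs
    memtables: HashMap<[u8; 5], Vec<(Vec<u8>, Vec<u8>)>>,
}

impl SharedBuffer {
    fn new() -> Self {
        Self { memtables: HashMap::new() }
    }

    fn write(&mut self, key: Vec<u8>, value: Vec<u8>) {
        // Sanity check mentioned in the comment: every key carries a 5-byte prefix.
        assert!(key.len() >= 5, "keys must carry a 5-byte prefix");
        let prefix: [u8; 5] = key[..5].try_into().unwrap();
        self.memtables.entry(prefix).or_default().push((key, value));
    }

    /// Flush (here: just drain) only the entries under the given prefix.
    fn sync(&mut self, prefix: [u8; 5]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.memtables.remove(&prefix).unwrap_or_default()
    }
}

fn main() {
    let mut buf = SharedBuffer::new();
    buf.write(b"t\x00\x00\x00\x01k1".to_vec(), b"v1".to_vec());
    buf.write(b"t\x00\x00\x00\x02k2".to_vec(), b"v2".to_vec());
    let flushed = buf.sync(*b"t\x00\x00\x00\x01");
    assert_eq!(flushed.len(), 1);       // only table 1 is synced
    assert_eq!(buf.memtables.len(), 1); // table 2 stays buffered
    println!("ok");
}
```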
The checkpoint group -> state id mapping can be stored on the meta side. I'd prefer not to make too many changes or add too many fancy functionalities to the compute-node-side interfaces.
What's changed and what's your intention?
This PR is for supporting the partial checkpoint proposed in #1157.
In this PR, we add a `table_id` parameter in the `ingest_batch`, `sync` and `wait_epoch` methods of trait `StateStore` so that we are able to keep track of where the kvs are written from. Besides, we group the write batches by their `table_id` in `shared_buffer_uploader` and enable syncing a subset of tables in an epoch. Such a concept of table is introduced for finer-grained tracking of where the kvs are written from, so that we are able to support partial checkpoint, vertical grouping and shared arrangement in the future.
A storage table id is usually the operator id, or the table id of the relational table. Currently the `table_id` is not assigned yet, and all `table_id`s are `GLOBAL_STORAGE_TABLE_ID` for the time being. In future development we will assign `table_id` according to the operator id.
Some unit tests and related structs are modified accordingly.
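A rough, simplified sketch of the described signature change. The real trait is async and far richer; the trait shape, the toy implementation, and the placeholder value of `GLOBAL_STORAGE_TABLE_ID` are all assumptions for illustration only:

```rust
use std::collections::HashMap;

type TableId = u32;
const GLOBAL_STORAGE_TABLE_ID: TableId = 0; // placeholder value, not the real constant

trait StateStore {
    /// Write a batch of kv pairs at `epoch` on behalf of `table_id`.
    fn ingest_batch(&mut self, kvs: Vec<(Vec<u8>, Vec<u8>)>, epoch: u64, table_id: TableId);

    /// Sync buffered data for one table at `epoch`; returns how many kvs were flushed.
    fn sync(&mut self, epoch: u64, table_id: TableId) -> usize;
}

/// Toy in-memory store that just counts buffered kvs per (table, epoch).
struct MemStore {
    buffered: HashMap<(TableId, u64), usize>,
}

impl StateStore for MemStore {
    fn ingest_batch(&mut self, kvs: Vec<(Vec<u8>, Vec<u8>)>, epoch: u64, table_id: TableId) {
        *self.buffered.entry((table_id, epoch)).or_insert(0) += kvs.len();
    }

    fn sync(&mut self, epoch: u64, table_id: TableId) -> usize {
        self.buffered.remove(&(table_id, epoch)).unwrap_or(0)
    }
}

fn main() {
    let mut store = MemStore { buffered: HashMap::new() };
    store.ingest_batch(vec![(b"k".to_vec(), b"v".to_vec())], 100, GLOBAL_STORAGE_TABLE_ID);
    assert_eq!(store.sync(100, GLOBAL_STORAGE_TABLE_ID), 1);
    assert_eq!(store.sync(100, GLOBAL_STORAGE_TABLE_ID), 0); // already flushed
    println!("ok");
}
```

With per-table bookkeeping like this, syncing a subset of tables in an epoch becomes a matter of calling `sync` for just the table ids a partial checkpoint needs.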
Checklist
Refer to a related PR or issue link (optional)
#1157