Discussion: use Rocksdb as a StateStore
#5684
If we are looking for a low-effort approach to try RocksDB and see whether it can help with streaming executor optimization, how about porting the deleted RocksDB state store code to an independent branch and trying it out first? Note that we can mimic both one RocksDB instance per CN and one RocksDB instance per executor.
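For concreteness, a rough sketch of the two layouts, assuming the rust-rocksdb (`rocksdb`) crate; the paths and executor ids are made up for illustration:

```rust
// Hypothetical sketch of the two instance layouts; not the actual
// RisingWave code. Assumes the `rocksdb` crate.
use rocksdb::DB;

/// One RocksDB instance per executor: each executor gets its own
/// directory, write-ahead log, and compaction.
fn open_per_executor(executor_id: u64) -> Result<DB, rocksdb::Error> {
    DB::open_default(format!("/tmp/rocksdb-state/executor-{executor_id}"))
}

/// One RocksDB instance per compute node: executors share the instance
/// and distinguish their state by key prefix (or by column family).
fn open_per_compute_node() -> Result<DB, rocksdb::Error> {
    DB::open_default("/tmp/rocksdb-state/shared")
}

fn main() -> Result<(), rocksdb::Error> {
    let per_exec = open_per_executor(1)?;
    per_exec.put(b"some_key", b"some_value")?;

    let shared = open_per_compute_node()?;
    shared.put(b"executor-1/some_key", b"some_value")?;
    Ok(())
}
```

The per-executor layout gives isolation at the cost of many open instances; the per-CN layout shares one write-ahead log and block cache across executors, which is what the column-family discussion below builds on.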
Summary

My suggested path forward:
Thus, my main tasks now are to update the previous implementation to match the current interface, merge it behind a feature flag to main, and talk to the CI team about how to schedule a periodic CI job (it should not interfere with the release build; it should only alert us to update the code behind the feature flag when necessary). It would also be nice if we could trigger this CI run for a particular PR. Thereafter, we can perform some benchmarks using EBS storage as part of our performance and benchmarking efforts, and see where we go from there. Updated tasks:
Details
Is this a good idea, @hzxa21? Or is one instance per executor overkill? Well, we can test it anyway, since the state does not overlap between executors. If we use a shared instance instead, we can even use column families, which may provide better performance (similar to our compaction groups); a rough sketch of this follows below. Anyway, we do not need to over-optimize for now: at least the old implementation uses a single instance for the entire compute node, and it seems Flink does too (e.g. see https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a).
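An illustration of the shared-instance-plus-column-families idea (a sketch only, assuming the `rocksdb` crate; the CF names and path are hypothetical):

```rust
// Sketch: one shared RocksDB instance with one column family per
// executor, loosely analogous to our compaction groups. Names are
// illustrative; assumes the `rocksdb` crate.
use std::sync::Arc;

use rocksdb::{ColumnFamilyDescriptor, Options, DB};

fn open_shared_with_cfs() -> Result<Arc<DB>, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);

    // One column family per executor's state.
    let cfs = vec![
        ColumnFamilyDescriptor::new("executor_1", Options::default()),
        ColumnFamilyDescriptor::new("executor_2", Options::default()),
    ];
    Ok(Arc::new(DB::open_cf_descriptors(
        &opts,
        "/tmp/rocksdb-state/shared",
        cfs,
    )?))
}

fn main() -> Result<(), rocksdb::Error> {
    let db = open_shared_with_cfs()?;
    let cf = db.cf_handle("executor_1").expect("cf was created above");
    db.put_cf(cf, b"some_key", b"some_value")?;
    assert_eq!(db.get_cf(cf, b"some_key")?, Some(b"some_value".to_vec()));
    Ok(())
}
```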
Scaling/Recovery

I guess there is no path to scaling or recovery for this RocksDB local store at the moment, so it will behave similarly to the in-memory store except that we persist to disk. In Flink's case, they actually periodically persist snapshots (incremental or full) to remote storage. I'm not sure we will do the same, but as a first iteration we will ignore this (see: RocksDB backend in Flink). Our objective is just to benchmark the performance at a steady state, assuming no faults. Not creating and uploading snapshots will be a key difference from Flink's implementation and will affect performance, but we will ignore that for now.

Build

As for the approach, I would rather maintain it as a feature on main; we may or may not run a basic CI on it (so it doesn't get forgotten). I don't think the maintenance burden will be too high, and a feature flag can also be used to conditionally include dependencies. Regarding lazily updating, I am on board with this: we can keep it outside of CI but maintain it on main behind feature flags. We probably want to sync it at least every month or so, so perhaps we can run CI with this feature flag every time we do a release (just to have a periodic check)? Suggestions welcome. As for dynamic linking, the previously used Rust library rust-rocksdb actually vendors the RocksDB sources and builds them with clang, so the crate would contain the RocksDB source code if I'm not wrong. According to tikv/tikv#4338, I don't think it's worth looking too much into dynamic linking; it may not even produce better compile times.

RocksDB + Async

It seems RocksDB will block the thread, as it does not have async APIs. The old implementation used the blocking thread pool in tokio, which I think is acceptable for our use case (a minimal sketch follows at the end of this comment).

Using the RocksDB backend with EBS vs. locally-attached SSD

We should probably try out both. Some people advise against network-attached storage such as EBS due to the performance (see: https://stackoverflow.com/questions/61655220/how-to-store-checkpoint-into-remote-rocksdb-in-apache-flink). However, Ververica itself seems to use EBS in some cases (https://www.ververica.com/blog/the-impact-of-disks-on-rocksdb-state-backend-in-flink-a-case-study). Further, EBS has higher fault tolerance than local disk, so we may be able to more justifiably ignore the lack of fault tolerance in our implementation of this state backend. Some additional info about configuring EBS in TiKV's case: https://www.pingcap.com/blog/best-practices-for-tidb-on-aws-cloud/. Anyway, this is on the operational/user side of things and has no actual bearing on the RocksDB backend.
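Since the exact old code is deleted, here is a minimal sketch of what running RocksDB reads on tokio's blocking thread pool could look like (assuming the `rocksdb` and `tokio` crates; not the actual old implementation):

```rust
// Sketch: bridging RocksDB's blocking API into async code with
// tokio::task::spawn_blocking. Illustrative only.
use std::sync::Arc;

use rocksdb::DB;

/// Run the blocking `get` on tokio's blocking thread pool so that the
/// async executor threads are not stalled by RocksDB I/O.
async fn get_async(db: Arc<DB>, key: Vec<u8>) -> Result<Option<Vec<u8>>, rocksdb::Error> {
    tokio::task::spawn_blocking(move || db.get(key))
        .await
        .expect("blocking task panicked")
}

#[tokio::main]
async fn main() -> Result<(), rocksdb::Error> {
    let db = Arc::new(DB::open_default("/tmp/rocksdb-state/async-demo")?);
    db.put(b"some_key", b"some_value")?;

    let value = get_async(db.clone(), b"some_key".to_vec()).await?;
    assert_eq!(value, Some(b"some_value".to_vec()));
    Ok(())
}
```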
+1. Basically I think we should keep the implementation as simple as possible so there is no need to consider checkpointing/failover.
To keep things simple, we can start with one shared rocksdb instance per CN process (consistent with what Flink does) without using column families to see how things go.
Maintaining a rocksdb state store in the main branch is okay as long as we don't compile and run it very often. But I think including the rocksdb state store in CI will increase CI running time and cause unnecessary distraction during code development, which hurts developer experience, so it doesn't sound like a good idea. I would prefer to only compile it when we really need to run tests on it, and fix things on demand if it is broken.
This is deployment related and won't affect the implementation, so I am okay with both.
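To make "only compile it when we really need it" concrete, Cargo-feature gating could look roughly like this (feature and type names are hypothetical, not the actual RisingWave code):

```rust
// Sketch: the RocksDB backend only compiles when built with
//   cargo build --features rocksdb-backend
// The feature would be declared in Cargo.toml as something like:
//   [features]
//   rocksdb-backend = ["dep:rocksdb"]
//   [dependencies]
//   rocksdb = { version = "0.21", optional = true }
// All names here are illustrative.

#[cfg(feature = "rocksdb-backend")]
mod rocksdb_backend {
    /// Placeholder for the ported RocksDB `StateStore` implementation.
    pub struct RocksDbStateStore;
}

/// Which state store backend to instantiate at runtime.
pub enum StateStoreImpl {
    InMemory,
    #[cfg(feature = "rocksdb-backend")]
    RocksDb(rocksdb_backend::RocksDbStateStore),
}

fn main() {
    // Without the feature, the RocksDb variant does not even exist,
    // and the rocksdb crate is not compiled at all.
    let _store = StateStoreImpl::InMemory;
}
```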
Agree, an independent branch sounds good. We can set up a daily/weekly cron GitHub workflow that automatically opens a pull request that tries to merge the commits from main.
The goal of having RocksDB integrated is that it may reveal some additional insights when it comes to optimizing stream executor + storage, but we are not 100% sure.
So we are looking for a low-effort approach to integrate RocksDB into our system, as we currently don't know whether it would indeed help a lot.
We used to have Rocksdb as one of the StateStore implementations, but it needed to keep up with StateStore changes. That concern now seems much less critical, as our interface is relatively stable.
Approach: maintain the implementation in an independent branch and merge main weekly, bi-weekly, or on demand.

Reference: https://github.com/BugenZhao/humrock