
Performance: prefetch data from S3 #2860

Closed · hzxa21 opened this issue May 27, 2022 · 9 comments
Labels: type/enhancement (Improvements to existing implementation)

Comments

hzxa21 (Collaborator) commented May 27, 2022

To improve performance and reduce cost (S3 charges per GET request), it is time to think about a better prefetch strategy instead of reading only one block at a time from S3.

For the compactor, I think we can always prefetch the whole SST, since we will read all of its data anyway. #2630 has implemented a simple prefetch for compaction.

For the compute node, I think we need some heuristic to decide how much to prefetch (e.g. size-based with an exponential growth factor). Can the prefetch strategy adapt to the workload? A sketch of such a heuristic is below.
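
As a strawman, a size-based heuristic with an exponential growth factor could look something like this (all names and constants here are hypothetical, not existing Hummock APIs):

```rust
/// Hypothetical sketch of a size-based prefetch heuristic with exponential
/// growth: each sequential hit doubles the next prefetch window, and any
/// non-sequential access resets it. Names and constants are illustrative.
struct PrefetchWindow {
    /// Next prefetch size in bytes.
    next_size: usize,
}

impl PrefetchWindow {
    const INITIAL: usize = 64 * 1024;   // start with one 64KB block
    const MAX: usize = 4 * 1024 * 1024; // cap the window at 4MB

    fn new() -> Self {
        Self { next_size: Self::INITIAL }
    }

    /// Called on each read; returns how many bytes to prefetch after it.
    fn on_read(&mut self, sequential: bool) -> usize {
        if sequential {
            let size = self.next_size;
            // Exponential growth: double the window on each sequential hit.
            self.next_size = (self.next_size * 2).min(Self::MAX);
            size
        } else {
            // Random access: shrink back to a single block, prefetch nothing.
            self.next_size = Self::INITIAL;
            0
        }
    }
}
```

A sequential scan would quickly ramp up to the cap (fewer billed GETs), while point queries would keep paying for single blocks only.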

hzxa21 added the type/enhancement label on May 27, 2022
fuyufjh (Member) commented May 27, 2022

For small point queries, I think LRU is enough, and it may be hard to predict the incoming data or the user's query 🤔

So maybe we should start with big range queries?

Little-Wallace (Contributor) commented

We cannot prefetch the whole SST, because that may hold all the data of the compaction task in memory.

Little-Wallace (Contributor) commented

For big range queries, it is hard for Hummock to judge how much data it should prefetch.
If the executor knows how large a range it will read, it can pass a prefetch hint to Hummock, and Hummock can read that data ahead of time. A sketch of such a hint is below.
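
One possible shape for such a hint, sketched with hypothetical types (the actual Hummock read path may look quite different): the executor attaches an optional size estimate to its read options, and the store uses it to size the first S3 range GET.

```rust
/// Hypothetical read options carrying an executor-provided prefetch hint.
/// None of these types exist in Hummock today; this only illustrates the idea.
#[derive(Default)]
struct ReadOptions {
    /// Executor's estimate of how many bytes the range scan will touch,
    /// if it knows (e.g. from plan statistics). `None` means "no hint".
    prefetch_hint_bytes: Option<u64>,
}

fn first_fetch_size(opts: &ReadOptions, block_size: u64) -> u64 {
    match opts.prefetch_hint_bytes {
        // Executor told us the range size: fetch it in one GET (up to a cap),
        // trading a larger response for fewer billed requests.
        Some(hint) => hint.min(8 * 1024 * 1024),
        // No hint: fall back to a single block and let a growth
        // heuristic take over from there.
        None => block_size,
    }
}
```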

lmatz (Contributor) commented May 27, 2022

Another form of prefetch:
if an SST/block X that is going to be compacted is already in the cache, then after compaction, the new SSTs/blocks that overlap X can directly replace X in the cache, with some heuristics.

hzxa21 (Collaborator, Author) commented May 27, 2022

> Another form of prefetch: if an SST/block X that is going to be compacted is already in the cache, then after compaction, the new SSTs/blocks that overlap X can directly replace X in the cache, with some heuristics.

Exactly. What I am thinking of is: after compaction, the block cache (and the disk-based secondary cache in the future) is refilled according to some heuristics, while the blocks of the pre-compaction SSTs remain usable until the refill finishes. In this way, we can reduce the cache misses caused by compaction. A sketch of the refill flow is below.
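
An illustrative-only sketch of that refill flow, using a toy cache rather than real Hummock types: the new blocks are admitted first, and the pre-compaction blocks are evicted only afterwards, so readers never observe a cold cache in between.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Toy block cache keyed by (sst_id, block_idx), with each entry remembering
/// its key range so we can find old cached blocks that new blocks overlap.
/// Purely illustrative; not Hummock's actual cache structure.
struct BlockCache {
    entries: HashMap<(u64, u32), (Range<u64>, Vec<u8>)>,
}

impl BlockCache {
    /// After compaction: admit a new block if it overlaps any cached block
    /// of a pre-compaction SST, then evict the old SSTs' blocks. Readers
    /// keep hitting the old blocks until the eviction at the end.
    fn refill_after_compaction(
        &mut self,
        old_ssts: &[u64],
        new_blocks: Vec<(u64, u32, Range<u64>, Vec<u8>)>,
    ) {
        for (sst, idx, key_range, data) in new_blocks {
            let overlaps_cached_old = self.entries.iter().any(|(&(s, _), (r, _))| {
                old_ssts.contains(&s)
                    && r.start < key_range.end
                    && key_range.start < r.end
            });
            if overlaps_cached_old {
                self.entries.insert((sst, idx), (key_range, data));
            }
        }
        // Only now drop the pre-compaction blocks.
        self.entries.retain(|&(s, _), _| !old_ssts.contains(&s));
    }
}
```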

hzxa21 (Collaborator, Author) commented May 27, 2022

> We cannot prefetch the whole SST, because that may hold all the data of the compaction task in memory.

Yes, we should bound the working set of the compactor to avoid OOM, but prefetch as much as possible within that bound. A sketch of such a bound is below.
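
For illustration, the bound could be a simple byte quota that prefetch tasks acquire and release (a real implementation would more likely use an async semaphore such as tokio's `Semaphore`; this sketch is hypothetical, not compactor code):

```rust
/// Illustrative sketch: bound the compactor's prefetch working set with a
/// byte quota. Prefetch eagerly while under budget; once a block has been
/// consumed, release its bytes so the next prefetch can proceed.
struct PrefetchBudget {
    in_flight_bytes: usize,
    limit_bytes: usize,
}

impl PrefetchBudget {
    fn new(limit_bytes: usize) -> Self {
        Self { in_flight_bytes: 0, limit_bytes }
    }

    /// Try to reserve room for one more prefetched block.
    fn try_acquire(&mut self, block_bytes: usize) -> bool {
        if self.in_flight_bytes + block_bytes <= self.limit_bytes {
            self.in_flight_bytes += block_bytes;
            true
        } else {
            false // over budget: read this block on demand instead
        }
    }

    /// Called once the compactor has consumed a prefetched block.
    fn release(&mut self, block_bytes: usize) {
        self.in_flight_bytes -= block_bytes;
    }
}
```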

Little-Wallace (Contributor) commented

> Another form of prefetch: if an SST/block X that is going to be compacted is already in the cache, then after compaction, the new SSTs/blocks that overlap X can directly replace X in the cache, with some heuristics.

We can refill this cache right after we pin a new version.

jon-chuang (Contributor) commented Jun 1, 2022

> If the executor knows how large a range it will read, it can pass a prefetch hint to Hummock, and Hummock can read that data ahead of time.

There is actually the count-min sketch for this, which can do it space-efficiently (it tracks approximate access frequencies without storing every key).

This feels related to the prefix bloom filter as well, and we would need e.g. the table schema to determine the variable-length prefix here too. A minimal count-min sketch is below.
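
For reference, a count-min sketch is just a few rows of hash-indexed counters: an increment touches one counter per row, and the estimate is the row-wise minimum, which may overestimate a key's frequency but never underestimates it. A minimal self-contained version (hashing and sizing choices are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal count-min sketch: `depth` rows of `width` counters.
struct CountMinSketch {
    width: usize,
    counters: Vec<Vec<u32>>, // depth rows, each `width` wide
}

impl CountMinSketch {
    fn new(width: usize, depth: usize) -> Self {
        Self { width, counters: vec![vec![0; width]; depth] }
    }

    /// Row-specific counter index: seed the hash with the row number so
    /// each row acts as an independent hash function.
    fn index<K: Hash>(&self, key: &K, row: usize) -> usize {
        let mut h = DefaultHasher::new();
        row.hash(&mut h);
        key.hash(&mut h);
        (h.finish() as usize) % self.width
    }

    fn increment<K: Hash>(&mut self, key: &K) {
        for row in 0..self.counters.len() {
            let i = self.index(key, row);
            self.counters[row][i] += 1;
        }
    }

    /// Estimate = minimum across rows: hash collisions only inflate
    /// counters, so the true count is never underestimated.
    fn estimate<K: Hash>(&self, key: &K) -> u32 {
        (0..self.counters.len())
            .map(|row| self.counters[row][self.index(key, row)])
            .min()
            .unwrap_or(0)
    }
}
```

Feeding it the key prefixes of recent reads could give Hummock a cheap signal for which ranges are hot enough to deserve aggressive prefetch.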

skyzh (Contributor) commented Jul 7, 2022

If we can enable prefetch, we can set block size back to 64KB. Related PR: #3463

xxchan closed this as completed on May 19, 2024