Explore the causes of anomalous meta misses #17473

Closed
Li0k opened this issue Jun 26, 2024 · 1 comment

Li0k commented Jun 26, 2024

https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=P2453400D1763B4D9&from=1719327583739&to=1719373395832&var-namespace=reglngvty-20240625-150224&var-instance=&var-pod=All&var-component=All&var-table=All

In a recent test (running all nexmark queries at 10k), I found that the system was experiencing anomalous meta misses. The extra latency caused by these misses drastically reduces system throughput, which in turn results in higher barrier latency.

  • latency
image
  • There are several characteristics that can indicate a meta miss
image image
  • object num and sst_meta_size
image image
  • Most meta cache refills are successful
image

In this test the meta cache is only 1.2 GB, so we believe meta cache misses are possible even when refills do not fail. Consider the following scenario:

  1. An operator holds a pinned version and uses it to access Hummock.
  2. After compaction, the pinned version on the CN is updated through a version delta.
  3. The newly arrived version delta triggers a meta cache refill.
  4. The refill triggers eviction because the cache capacity is insufficient, and SST meta belonging to the old version is evicted.
  5. A meta miss occurs when the evicted meta is read again.
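
The scenario above can be reproduced with a toy model. This is only a sketch under stated assumptions: `MetaCache` and `simulate` are hypothetical names, the cache is a plain byte-bounded LRU (the real meta cache may use a different eviction policy), and the object ids and sizes are made up.

```python
from collections import OrderedDict


class MetaCache:
    """Toy byte-bounded LRU keyed by object_id (hypothetical, not Hummock's cache)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # object_id -> meta size in bytes
        self.misses = 0

    def insert(self, object_id, meta_size):
        if object_id in self.entries:
            self.used_bytes -= self.entries.pop(object_id)
        self.entries[object_id] = meta_size
        self.used_bytes += meta_size
        # Evict least recently used entries until the cache fits again.
        while self.used_bytes > self.capacity_bytes and len(self.entries) > 1:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size

    def get(self, object_id):
        if object_id in self.entries:
            self.entries.move_to_end(object_id)
            return True
        self.misses += 1
        return False


def simulate():
    cache = MetaCache(capacity_bytes=4)
    # Step 1: the operator pins a version that references objects 1..4.
    pinned_version = [1, 2, 3, 4]
    for object_id in pinned_version:
        cache.insert(object_id, meta_size=1)
    # Steps 2-3: compaction outputs objects 5 and 6; the delta triggers a refill.
    for object_id in [5, 6]:
        cache.insert(object_id, meta_size=1)
    # Step 4 happened inside insert(): objects 1 and 2 were evicted.
    # Step 5: a read that still uses the old pinned version misses on them.
    hits = [cache.get(object_id) for object_id in pinned_version]
    return hits, cache.misses
```

In this model `simulate()` reports two misses, both on objects that every refill handled successfully, which is the "miss without refill failure" case described above.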

Currently, the meta cache is populated on all write paths:

  1. When a CN finishes uploading an sstable, the writer inserts <object_id, meta> into the meta cache.
  2. After compaction, the refiller performs a meta cache refill based on the version delta, inserting <object_id, meta> into the meta cache.
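
The two write paths can be sketched as follows. All names here (`on_upload_completed`, `on_version_delta`, `uncached_objects`, `fetch_meta`) are hypothetical and a plain dict stands in for the meta cache; the point is only that, if both paths work as described, every object referenced by the new version is cached before it is first read.

```python
def on_upload_completed(meta_cache, object_id, meta):
    """Write path 1: the uploader inserts the meta it has just built."""
    meta_cache[object_id] = meta


def on_version_delta(meta_cache, inserted_object_ids, fetch_meta):
    """Write path 2: the refiller loads meta for every SST the delta adds."""
    for object_id in inserted_object_ids:
        if object_id not in meta_cache:
            meta_cache[object_id] = fetch_meta(object_id)


def uncached_objects(meta_cache, version_object_ids):
    """Objects of a version whose first read would be a meta miss."""
    return sorted(oid for oid in version_object_ids if oid not in meta_cache)


meta_cache = {}
on_upload_completed(meta_cache, 101, "meta-101")
on_version_delta(meta_cache, [201, 202], lambda oid: f"meta-{oid}")
remaining = uncached_objects(meta_cache, [101, 201, 202])
```

With both paths in place `remaining` is empty, so in this model any miss has to come from eviction or from one of the paths not covering an object.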

Apart from the eviction-induced misses described above, the system should not have any other source of meta misses. However, I have found that meta misses occur before the memory cache is full, and that they increase over time.

image

Possible causes:

  1. Part of the information is missing when sst_delta_info is built from the version delta (bias).
  2. A wrong object_id / meta is inserted into the meta cache.
  3. Data written to the cache is not visible.
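
A toy model of the first hypothesis shows why it would match the observation. This is a sketch under assumptions, not the actual refill logic: `count_misses` and its parameters are hypothetical, and "dropped" stands for SSTs that a version delta adds but that never reach sst_delta_info, so the refiller never sees them.

```python
def count_misses(rounds, ssts_per_round, dropped_per_round):
    """Return (cumulative misses after each round, final cache size)."""
    cache = set()
    misses = 0
    history = []
    next_id = 0
    for _ in range(rounds):
        new_ids = list(range(next_id, next_id + ssts_per_round))
        next_id += ssts_per_round
        # Assumed bug: the last `dropped_per_round` ids are absent from the
        # delta info, so the refill skips them.
        refilled = new_ids[: ssts_per_round - dropped_per_round]
        cache.update(refilled)
        # Reads touch every SST of the new version.
        for object_id in new_ids:
            if object_id not in cache:
                misses += 1
                cache.add(object_id)  # a miss fetches the meta and caches it
        history.append(misses)
    return history, len(cache)
```

With `count_misses(3, 4, 1)` the cumulative misses grow by one per round while the cache holds only 12 entries, so misses appear long before any capacity limit matters and keep increasing. With `dropped_per_round=0` the count stays at zero.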
@Li0k Li0k self-assigned this Jun 26, 2024
@github-actions github-actions bot added this to the release-1.10 milestone Jun 26, 2024
@Li0k Li0k added type/bug Something isn't working and removed type/feature labels Jun 26, 2024

Li0k commented Jun 26, 2024

fix the lack of building sst delta info #17459
