In a recent test (run all nexmark 10k), I found that the system experienced anomalous meta misses. The extra latency caused by these misses drastically reduced the system's throughput, which in turn raised barrier latency.
Several characteristics indicate a meta miss:
- latency
- object num and sst_meta_size
- most meta cache refills are successful
In the test, the meta cache is only 1.2 GB, so we believe a meta cache miss is possible even if no refill fails. Consider the following scenario:
1. An operator holds a pinned version and uses it to access Hummock.
2. After compaction, the CN's pinned version is updated through a version delta.
3. The newly arrived version delta triggers a meta cache refill.
4. Cache capacity is insufficient, so eviction is triggered and the SST metas of the old version are evicted.
5. A meta miss occurs.
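The scenario above can be reproduced with a toy model. This is a minimal sketch, assuming a size-bounded LRU keyed by object_id; `MetaCache`, `SstMeta`, and `run_scenario` are hypothetical names, not RisingWave's actual meta cache API. It shows how refilling metas for a new version can evict the metas that a still-pinned old version needs, so reads through the old version miss.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for an SST meta; only its size matters here.
#[derive(Clone, Debug)]
struct SstMeta {
    size: usize,
}

/// Toy LRU keyed by object_id, bounded by total meta size in bytes.
struct MetaCache {
    capacity: usize,
    used: usize,
    map: HashMap<u64, SstMeta>,
    order: VecDeque<u64>, // front = least recently used
    misses: u64,
}

impl MetaCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, used: 0, map: HashMap::new(), order: VecDeque::new(), misses: 0 }
    }

    /// Move `id` to the most-recently-used position.
    fn touch(&mut self, id: u64) {
        if let Some(pos) = self.order.iter().position(|&x| x == id) {
            self.order.remove(pos);
        }
        self.order.push_back(id);
    }

    /// Insert a meta, then evict least-recently-used entries until within capacity.
    fn insert(&mut self, id: u64, meta: SstMeta) {
        if let Some(old) = self.map.insert(id, meta.clone()) {
            self.used -= old.size;
        }
        self.used += meta.size;
        self.touch(id);
        while self.used > self.capacity {
            match self.order.pop_front() {
                Some(victim) => {
                    if let Some(m) = self.map.remove(&victim) {
                        self.used -= m.size;
                    }
                }
                None => break,
            }
        }
    }

    /// Look up a meta; a miss is counted but (unlike a real cache) not refetched.
    fn get(&mut self, id: u64) -> Option<SstMeta> {
        match self.map.get(&id).cloned() {
            Some(m) => {
                self.touch(id);
                Some(m)
            }
            None => {
                self.misses += 1;
                None
            }
        }
    }
}

/// Runs the five-step scenario and returns the number of misses.
fn run_scenario() -> u64 {
    let mut cache = MetaCache::new(4 * 100); // room for four 100-byte metas
    // 1. The operator pins a version referencing objects 1..=4; their metas are cached.
    let pinned: Vec<u64> = (1..=4).collect();
    for &id in &pinned {
        cache.insert(id, SstMeta { size: 100 });
    }
    // 2-3. Compaction produces objects 5..=8; the version delta triggers a refill.
    for id in 5..=8 {
        cache.insert(id, SstMeta { size: 100 });
    }
    // 4. Capacity was exceeded, so the old-version metas (1..=4) were evicted.
    // 5. Reads through the still-pinned old version now miss.
    for &id in &pinned {
        let _ = cache.get(id);
    }
    cache.misses
}

fn main() {
    println!("misses via pinned old version: {}", run_scenario());
}
```

In this model every read through the old pinned version misses, even though every refill succeeded, which matches the expectation that eviction alone can explain some misses when the cache is small.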
At this point, we populate the meta cache on all write paths:
- When a CN finishes uploading an SST, the writer inserts `<object_id, meta>` into the meta cache.
- After compaction, the refiller performs a meta cache refill based on the version delta, inserting `<object_id, meta>` into the meta cache.
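The two write paths above can be sketched as follows. This is a simplified model under stated assumptions: `VersionDelta` carries only a hypothetical `inserted_objects` list, `on_upload_complete` and `refill_from_delta` are illustrative names rather than RisingWave functions, and `load_meta` stands in for fetching a meta from object storage. The cache is unbounded here so that only population, not eviction, is shown.

```rust
use std::collections::HashMap;

type ObjectId = u64;

/// Hypothetical, simplified SST meta.
#[derive(Clone, Debug, PartialEq)]
struct SstMeta {
    object_id: ObjectId,
    block_count: usize,
}

/// Hypothetical, simplified version delta: only the newly added objects matter here.
struct VersionDelta {
    inserted_objects: Vec<ObjectId>,
}

/// Unbounded toy cache; the key is always taken from the meta itself.
struct MetaCache {
    entries: HashMap<ObjectId, SstMeta>,
}

impl MetaCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }
    fn insert(&mut self, meta: SstMeta) {
        self.entries.insert(meta.object_id, meta);
    }
    fn contains(&self, id: ObjectId) -> bool {
        self.entries.contains_key(&id)
    }
}

/// Write path 1: the CN writer inserts the meta it just built after upload.
fn on_upload_complete(cache: &mut MetaCache, meta: SstMeta) {
    cache.insert(meta);
}

/// Write path 2: the refiller loads metas for every object the delta adds.
fn refill_from_delta<F>(cache: &mut MetaCache, delta: &VersionDelta, load_meta: F)
where
    F: Fn(ObjectId) -> SstMeta,
{
    for &id in &delta.inserted_objects {
        if !cache.contains(id) {
            cache.insert(load_meta(id));
        }
    }
}

fn main() {
    let mut cache = MetaCache::new();
    on_upload_complete(&mut cache, SstMeta { object_id: 1, block_count: 8 });
    let delta = VersionDelta { inserted_objects: vec![2, 3] };
    refill_from_delta(&mut cache, &delta, |id| SstMeta { object_id: id, block_count: 4 });
    println!("cached objects: {}, blocks in 1: {}", cache.entries.len(), cache.entries[&1].block_count);
}
```

Deriving the key from the meta itself, as this sketch does, is one way to rule out inserting a meta under the wrong object_id on these paths.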
Apart from the eviction-induced misses described above, the system should not encounter any other meta misses. However, I found that meta misses occur before the memory cache is full, and that they increase over time.
The hypothesized causes are:
- part of the information is missing when the version delta builds sst_delta_info (bias)
- a wrong object_id or meta is inserted into the meta cache
- data written to the cache is not visible
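One way to test the first hypothesis is to look for objects that appear in a new version but were never populated by either write path. The following is a hypothetical diagnostic sketch, assuming the sets of object IDs for the previous version, the new version, the delta's inserted SSTs, and the uploaded SSTs can be collected; `unpopulated_objects` is an illustrative name, not an existing RisingWave function.

```rust
use std::collections::BTreeSet;

type ObjectId = u64;

/// Returns objects that are new in `new_version` but were populated by neither
/// write path. If the version delta's sst_delta_info were incomplete, such objects
/// would never be refilled, so their first read would always miss.
fn unpopulated_objects(
    prev_version: &BTreeSet<ObjectId>,
    new_version: &BTreeSet<ObjectId>,
    delta_inserted: &BTreeSet<ObjectId>,
    uploaded: &BTreeSet<ObjectId>,
) -> Vec<ObjectId> {
    new_version
        .iter()
        // Only objects added by this version can lack a cache entry from an earlier path.
        .filter(|&&id| !prev_version.contains(&id))
        // Keep objects covered by neither the refiller nor the upload path.
        .filter(|&&id| !delta_inserted.contains(&id) && !uploaded.contains(&id))
        .copied()
        .collect()
}

fn main() {
    let set = |v: &[ObjectId]| v.iter().copied().collect::<BTreeSet<ObjectId>>();
    let missing = unpopulated_objects(&set(&[1, 2]), &set(&[2, 3, 4, 5]), &set(&[3]), &set(&[4]));
    println!("objects never populated: {:?}", missing);
}
```

A non-empty result would point at the first hypothesis; an empty result while misses still grow would shift suspicion toward the other two.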
https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=P2453400D1763B4D9&from=1719327583739&to=1719373395832&var-namespace=reglngvty-20240625-150224&var-instance=&var-pod=All&var-component=All&var-table=All