
[BUG] Incorrectly reported used_memory_dataset and used_memory_dataset_perc #1373

Open
martinrvisser opened this issue Nov 28, 2024 · 6 comments


@martinrvisser

Describe the bug

Incorrect values are reported by INFO for used_memory_dataset and used_memory_dataset_perc, on the replica only.

Two instances, valkey_version:8.0.0

Primary, port 8001, role:master:

valkey-cli -p 8001 dbsize
(integer) 1

valkey-cli -p 8001 info memory
# Memory
used_memory:12214912
used_memory_human:11.65M
used_memory_rss:15548416
used_memory_rss_human:14.83M
used_memory_peak:81638856
used_memory_peak_human:77.86M
used_memory_peak_perc:14.96%
used_memory_overhead:11623176
used_memory_startup:1009280
used_memory_dataset:591736
used_memory_dataset_perc:5.28%
allocator_allocated:12371384
allocator_active:12693504
allocator_resident:19349504
allocator_muzzy:0
total_system_memory:24964886528
total_system_memory_human:23.25G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:65536
used_memory_vm_total_human:64.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.03
allocator_frag_bytes:322120
allocator_rss_ratio:1.52
allocator_rss_bytes:6656000
rss_overhead_ratio:0.80
rss_overhead_bytes:-3801088
mem_fragmentation_ratio:1.27
mem_fragmentation_bytes:3351432
mem_not_counted_for_evict:12280
mem_replication_backlog:10485904
mem_total_replication_buffers:10539056
mem_clients_slaves:53296
mem_clients_normal:74152
mem_cluster_links:0
mem_aof_buffer:8
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

Replica, port 8010:

valkey-cli -p 8010 info memory
# Memory
used_memory:12191720
used_memory_human:11.63M
used_memory_rss:16850944
used_memory_rss_human:16.07M
used_memory_peak:39387208
used_memory_peak_human:37.56M
used_memory_peak_perc:30.95%
used_memory_overhead:12268439
used_memory_startup:1009256
used_memory_dataset:18446744073709474897
used_memory_dataset_perc:164961355563008.00%

allocator_allocated:12369896
allocator_active:12759040
allocator_resident:17674240
allocator_muzzy:0
total_system_memory:24964886528
total_system_memory_human:23.25G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:65536
used_memory_vm_total_human:64.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.03
allocator_frag_bytes:389144
allocator_rss_ratio:1.39
allocator_rss_bytes:4915200
rss_overhead_ratio:0.95
rss_overhead_bytes:-823296
mem_fragmentation_ratio:1.38
mem_fragmentation_bytes:4677128
mem_not_counted_for_evict:12280
mem_replication_backlog:10539200
mem_total_replication_buffers:10539056
mem_clients_slaves:0
mem_clients_normal:719439
mem_cluster_links:0
mem_aof_buffer:8
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:1375772

To reproduce
Unknown

Expected behavior

Correct values reported for used_memory_dataset and used_memory_dataset_perc.


@zvi-code
Contributor

My guess is that this is due to the non-atomic nature of memory sampling. zmalloc_used_memory walks a per-thread counter array and sums the values. When an allocation and its matching free happen on different threads, one thread's counter ends up positive and the other's negative. Depending on the order in which the counters are read, the overall returned value can be lower or higher than the actual usage (it can even be negative).

Specifically here: we see there were lazyfreed_objects, so those objects were freed in the context of the bio lazy-free thread (not the main thread).
We also see that used_memory_overhead:12268439 is higher than used_memory:12191720, which indicates this is what happened here (the overhead cannot actually be higher than the total).
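As a minimal sketch of that pattern (hypothetical code, not the actual zmalloc implementation): the reader walks the per-thread counters one by one, so an alloc/free pair split across threads can be only half-visible in the sum, making it transiently too high, too low, or even negative.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16

/* One signed counter per thread: allocations add, frees subtract,
 * possibly on a different thread than the one that allocated. */
static _Atomic long long thread_used_memory[MAX_THREADS];

/* Each load is atomic, but the loop as a whole is not a consistent
 * snapshot: counters already read can change while later ones are
 * still being read, so the sum may not match any real point in time. */
static long long used_memory_snapshot(void) {
    long long sum = 0;
    for (size_t i = 0; i < MAX_THREADS; i++)
        sum += atomic_load_explicit(&thread_used_memory[i], memory_order_relaxed);
    return sum;
}
```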

This is also the reason for the value you get, used_memory_dataset:18446744073709474897, because

used_memory - used_memory_overhead = dataset
=>
12191720 - 12268439 = -76719, which in unsigned 64-bit arithmetic wraps around to 2^64 - 76719 = 18446744073709474897
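For illustration, the same wraparound in a few lines of standard C (just the arithmetic, not Valkey code):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t used_memory = 12191720;   /* replica's used_memory */
    uint64_t overhead    = 12268439;   /* transiently larger than used_memory */
    /* Unsigned subtraction wraps modulo 2^64 instead of going negative. */
    uint64_t dataset = used_memory - overhead;
    printf("%llu\n", (unsigned long long)dataset);  /* prints 18446744073709474897 */
    return 0;
}
```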

I would expect this to be a transient issue; can you confirm?

Regarding a fix: we could possibly retry internally if the result is clearly incorrect.
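For instance, a minimal sketch of such a guard (hypothetical helper, not an actual Valkey patch) could clamp the subtraction so it never wraps, and let the caller re-sample when the snapshot is obviously inconsistent:

```c
#include <stddef.h>

/* Hypothetical helper: compute the dataset size from one sample of
 * total used memory and the computed overhead. If the snapshot is
 * inconsistent (overhead > used), return 0 instead of letting the
 * unsigned subtraction wrap; the caller could then retry the sample. */
static size_t dataset_bytes(size_t used_memory, size_t overhead) {
    return overhead > used_memory ? 0 : used_memory - overhead;
}
```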

@martinrvisser
Author

When you say transient, do you mean you would expect this to resolve itself? It is also showing a similarly odd number for bytes-per-key.

127.0.0.1:8010> MEMORY STATS
 1) "peak.allocated"
 2) (integer) 39387208
 3) "total.allocated"
 4) (integer) 12224920
 5) "startup.allocated"
 6) (integer) 1009256
 7) "replication.backlog"
 8) (integer) 10539216
 9) "clients.slaves"
10) (integer) 0
11) "clients.normal"
12) (integer) 720335
13) "cluster.links"
14) (integer) 0
15) "aof.buffer"
16) (integer) 8
17) "lua.caches"
18) (integer) 0
19) "functions.caches"
20) (integer) 184
21) "db.0"
22) 1) "overhead.hashtable.main"
    2) (integer) 248
    3) "overhead.hashtable.expires"
    4) (integer) 104
23) "overhead.db.hashtable.lut"
24) (integer) 32
25) "overhead.db.hashtable.rehashing"
26) (integer) 0
27) "overhead.total"
28) (integer) 12269351
29) "db.dict.rehashing.count"
30) (integer) 0
31) "keys.count"
32) (integer) 1
33) "keys.bytes-per-key"
34) (integer) -44431
35) "dataset.bytes"
36) (integer) -44431
37) "dataset.percentage"
38) "164473037914112"
39) "peak.percentage"
40) "31.03779411315918"
41) "allocator.allocated"
42) (integer) 12426040
43) "allocator.active"
44) (integer) 12812288
45) "allocator.resident"
46) (integer) 17993728
47) "allocator.muzzy"
48) (integer) 0
49) "allocator-fragmentation.ratio"
50) "1.0310837030410767"
51) "allocator-fragmentation.bytes"
52) (integer) 386248
53) "allocator-rss.ratio"
54) "1.404411792755127"
55) "allocator-rss.bytes"
56) (integer) 5181440
57) "rss-overhead.ratio"
58) "0.9410425424575806"
59) "rss-overhead.bytes"
60) (integer) -1060864
61) "fragmentation"
62) "1.3851194381713867"
63) "fragmentation.bytes"
64) (integer) 4708024

@zvi-code
Contributor

Yes, I expect a repeated call to show correct numbers. Does it?

@martinrvisser
Author

Confirmed, eventually it came back down:

used_memory_dataset:146145
used_memory_dataset_perc:1.28%

@zvi-code
Contributor

zvi-code commented Dec 3, 2024

The transient state I was referring to should resolve itself immediately, usually on the next invocation; was that the case? If so, it's by design and the bug can be closed, IMO.

It's a valid ask, though, to propose/request a feature that would handle (retry?) the case of a temporarily unexpected state.

@hpatro
Contributor

hpatro commented Dec 3, 2024

> The transient state I was referring to should resolve itself immediately, usually on the next invocation; was that the case? If so, it's by design and the bug can be closed, IMO.
>
> It's a valid ask, though, to propose/request a feature that would handle (retry?) the case of a temporarily unexpected state.

@zvi-code Shouldn't we try to fix it rather than letting it underflow?
