
Free memory is wrong on Linux with cgroups v2 #1219

Closed

gi0baro opened this issue Mar 5, 2024 · 10 comments · Fixed by #1220

Comments

@gi0baro commented Mar 5, 2024

system: Linux container based on debian:bookworm-slim
sysinfo version: 0.30.6

The current cgroup v2 code deducts every value in /sys/fs/cgroup/memory.stat from the available memory in these lines: https://github.com/GuillaumeGomez/sysinfo/blob/master/src/unix/linux/system.rs#L560-L562

This causes incorrect reporting of the available memory, since memory.stat includes values that are already counted in memory.current, plus values that have nothing to do with available memory, e.g.:

root@testpod:/work# cat /sys/fs/cgroup/memory.max
7864320000
root@testpod:/work# cat /sys/fs/cgroup/memory.current
2481475584
root@testpod:/work# cat /sys/fs/cgroup/memory.stat
anon 2472849408
file 57344
kernel_stack 458752
pagetables 7061504
percpu 0
sock 4096
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 2472730624
active_anon 12288
inactive_file 4096
active_file 53248
unevictable 0
slab_reclaimable 371120
slab_unreclaimable 435552
slab 806672
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 2503991
pgmajfault 0
pgrefill 0
pgscan 0
pgsteal 0
pgactivate 3
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 0
thp_collapse_alloc 0

The free memory reported by sysinfo in this case is 425495718, which is clearly wrong.
I'm not sure why the entire contents of memory.stat get subtracted from the available memory.
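
A quick sanity check: the reported number matches exactly if every single value in memory.stat above, including pure event counters such as pgfault and pgactivate (which are counts, not bytes), is subtracted from memory.max minus memory.current:

      7864320000  (memory.max)
    - 2481475584  (memory.current)
    - 4957348698  (sum of every value in memory.stat above)
    = 425495718   (the free memory sysinfo reports)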

@GuillaumeGomez (Owner)

Is it the free memory of the system in which your container is running by any chance? Also, please show the code you're using. If you want to get cgroup information, you need to use cgroup_limits.

@Dav1dde (Contributor) commented Mar 5, 2024

@GuillaumeGomez the code we're using is here: https://github.com/getsentry/relay/blob/be87f135374b2c3ea941a7ec1bd236071d13db13/relay-server/src/services/health_check.rs#L273-L292

        // Use the cgroup if available in case Relay is running in a container.
        if let Some(cgroup) = self.system.cgroup_limits() {
            Memory {
                used: cgroup.total_memory.saturating_sub(cgroup.free_memory),
                total: cgroup.total_memory,
            }
        } else {
            Memory {
                used: self.system.used_memory(),
                total: self.system.total_memory(),
            }
        }

@GuillaumeGomez (Owner)

Ok so you were using the correct code already and even already wrote a patch. Awesome! :)

@Dav1dde (Contributor) commented Mar 5, 2024

> Ok so you were using the correct code already and even already wrote a patch. Awesome! :)

Do you know why that code (reading from memory.stat) was there in the first place? From what @gi0baro found, it seems wrong, but I assume there was a reason it was added (I tried git blame but couldn't find anything).

@GuillaumeGomez (Owner)

It was originally added in #1024, modified a little in #1058, and the incorrect update comes from #1119. Do you think we should still subtract slab_reclaimable, file and shmem?

@Dav1dde (Contributor) commented Mar 5, 2024

I really don't know. I tried digging through the cgroup man pages and I have no clue what is correct. I wanna say no, since these are all system values and not necessarily relevant for the cgroup?

Maybe you have an idea @gi0baro ?

What we can do is test the fork and see if it reports similar or the same values as we get from Kubernetes.

@GuillaumeGomez (Owner)

It'd be much appreciated. 👍

@gi0baro (Author) commented Mar 5, 2024

@GuillaumeGomez to my knowledge, and based on the information available at https://docs.kernel.org/admin-guide/cgroup-v2.html, memory.current should already include page cache, in-kernel data structures such as inodes, and network buffers.

I'm not a kernel dev nor an expert, but IMHO the only value in memory.stat you may still want to subtract is file. slab_reclaimable should already be counted in memory.current, and shmem should rely on swapped contents, so I wouldn't count that.
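
For illustration, a minimal sketch of that computation (hypothetical helper code, not the actual sysinfo or #1220 implementation): free memory within the cgroup is the limit minus current usage, with only the file page-cache counter additionally subtracted. It assumes memory.max holds a numeric limit; for unlimited cgroups the file contains the literal string "max", in which case this returns None.

    use std::fs;

    fn read_u64(path: &str) -> Option<u64> {
        fs::read_to_string(path).ok()?.trim().parse().ok()
    }

    // Look up a single counter in memory.stat, e.g. "file".
    fn stat_value(stat: &str, key: &str) -> u64 {
        stat.lines()
            .find_map(|line| {
                let (k, v) = line.split_once(' ')?;
                if k == key { v.trim().parse::<u64>().ok() } else { None }
            })
            .unwrap_or(0)
    }

    fn cgroup_v2_free_memory() -> Option<u64> {
        // Returns None when memory.max is the literal "max" (no limit set).
        let max = read_u64("/sys/fs/cgroup/memory.max")?;
        let current = read_u64("/sys/fs/cgroup/memory.current")?;
        let stat = fs::read_to_string("/sys/fs/cgroup/memory.stat").ok()?;
        // memory.current already accounts for anon pages, kernel data
        // structures, socket buffers, etc., so only the page cache
        // ("file") is subtracted on top of it, per the suggestion above.
        let file = stat_value(&stat, "file");
        Some(max.saturating_sub(current).saturating_sub(file))
    }

The saturating_sub calls keep the result at zero if usage momentarily exceeds the limit.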

@GuillaumeGomez (Owner)

Thanks for the information! Let's wait for @Dav1dde's test results; then they'll update their PR (or not, depending on the results), then I'll merge it and make a new release.

Dav1dde added a commit to getsentry/relay that referenced this issue Mar 6, 2024
…3220)

The memory health check is broken for cgroup v2, see
GuillaumeGomez/sysinfo#1219

Switches to a fork until the upstream issue is fixed.

@Dav1dde (Contributor) commented Mar 6, 2024

@GuillaumeGomez the results are in, and a picture speaks more than a thousand words:

[image: chart of maximum pod memory usage, as reported by Kubernetes (top) and by the patched sysinfo crate (bottom)]

This is the max memory usage of all our pods: the top is reported by Kubernetes, the bottom via the sysinfo crate (with the patch from #1220 applied). It matches pretty much perfectly (purely empirical evidence).

Also, big thanks to @gi0baro: he actually did the hard work, I just set up the metrics.
