System freezes after loading kvm module #1
Comments
Interesting. What guest? (Or does it hang without any guest at all?) Do you have a dump? And can you do this on the running system:
No guests running, just a regular boot. It doesn't generate a dump, and I cannot drop to kmdb; I tried "dtrace -wn 'tick-1m { panic(); }'". If I boot with -B disable-kvm=true, things are stable. However, when I run 'rem_drv kvm; add_drv kvm' it freezes shortly thereafter (just like when I boot the BE normally), and I cannot drop to kmdb (this is also a DEBUG kernel). Because of all that, I set a breakpoint in setup_vmcs_config; the output is from immediately before it returns (hopefully this is sufficient; if not, let me know another point where it would be more useful to capture the value): {
Additional data points: set breakpoints on kvm
.. and it appears that during boot something is trying to unload the kvm module: a breakpoint set on kvm_detach gets triggered. I stepped over each instruction, and after kvm_arch_hardware_unsetup is called (or perhaps during it), kmdb reports 'single-step stop on miscellaneous trap' and the pc is within xc_serv. ::stack shows it was called as xc_serv(0, 0). Doing :c drops it back into xc_serv with the same message; after doing this several times, it drops back into the OS. At this point, the system no longer locks up. (Uneducated guess) Is the lockup perhaps a nasty interrupt deadlock triggered by kvm_arch_hardware_unsetup?
We finally have a box on hand to test this against. Our investigation shows that while the kvm driver is inducing it, there is a problem much deeper in the system. Basically, the act of taking a spin lock in cross-call context can lead to the behavior you're seeing. As a workaround, on a Sandy Bridge system, consider setting apix_enable=0 in /etc/system or via mdb -kd. The issue is likely in the apix module, which was taken in a not-quite-refined state when the source closed. We're going to be doing further work to determine what's going on there, but it'll be some time before we get there.
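A minimal sketch of the /etc/system form of that workaround follows. The tunable name and value are the ones given in the comment above; the comment line is illustrative, and it assumes the usual /etc/system behavior of taking effect only at the next boot.

```
* Workaround sketch: disable the apix module on a Sandy Bridge system
set apix_enable=0
```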
This has been resolved in illumos-joyent. See TritonDataCenter/illumos-joyent@4d86fb7 for the fix.
CPU is a Core i5-2400 (Sandy Bridge).
One time prior to a freeze, I did see 'kvm: NOTICE: unhanded wrmsr: 0x0 data 3000000018' on the console. However, I have not seen that since. I tried setting a bp in kvm_set_msr_common, and it appears not to be reached in subsequent lockups.
Disabling kvm leaves the system stable; doing a 'rem_drv kvm; add_drv kvm' causes it to lock up shortly thereafter.
This is on a stock illumos debug build (source as of 8/26).
I also experienced similar issues with SmartOS live (though I was never able to narrow it down).