Rework breakpoint mechanism #140
Comments
Hey @rageagainsthepc, We always get one of the following two errors:
Can you outline what needs to be done to solve this? What is the quick and dirty solution? What is the solid solution? We might be able to work on this and contribute. Thanks, |
I am not entirely sure whether those errors are linked to a multiple vCPU setup, but it's possible (and in the case of the second one, very likely). Your assumptions are correct. Right now we only support a single vCPU. The problem is that we modify pages directly with INT3 writes. When multiple vCPUs hit the same breakpoint more or less simultaneously, our breakpoint mechanism will inevitably break because multiple callbacks are trying to modify the same memory location. Usually the best way to solve this is by leveraging EPT:
If I am not mistaken, DRAKVUF does this in a similar way, and there might also be a LibVMI example if you are looking for a reference on how to approach this. If you are looking for a quick and dirty solution, I recommend using a single vCPU ;) |
Hey @rageagainsthepc, A breakpoint hit is supposed to be handled as follows:
This works flawlessly with only one vCPU (nothing else is running, the only vCPU is stopped or single-stepping during the whole process), and it works sometimes with other vCPUs active. However, it goes wrong when a context switch happens on another vCPU during the execution of the SW interrupt handler.

The context switch handler (triggered by a CR3 write) goes through all breakpoints and checks for each one whether it is currently active and whether it is supposed to be active after the context switch (there are process-specific breakpoints). If there is a discrepancy between the two, the breakpoint state is toggled accordingly. Now, when the SW interrupt handler restores the original instruction, it also sets the breakpoint state to disabled. In our minimal SmartVMI example (SmartVMI without plugins, connected to a Windows guest), there are two global (i.e., not process-specific) breakpoints, which should always be active. So the context switch handler, which runs directly after the SW interrupt handler (and, problematically, sometimes before the single step has been executed), sees that the relevant breakpoint should be active but currently is not and, in consequence, re-enables the breakpoint. This also re-inserts the INT3 instruction. The single step now executes the INT3, so the SW interrupt handler runs again and tries to register a single-step handler, which causes an exception when the check determines that another handler is already registered.

The other exception, "Breakpoint originalValue @ xxx is already an INT3 breakpoint", has nothing to do with the multicore setup. It occurred when restarting SmartVMI after a crash (due to the second callback in our case), because the INT3 instructions were not cleaned up and were still present.

We improved upon the existing approach by implementing a "TempDisable" state for the breakpoint and also extended the logic for determining whether a process-specific breakpoint needs to be active so that it works with multiple vCPUs.
This fixes the existing problems and allows SmartVMI to run with multiple vCPUs. However, it is of course still limited by the approach itself, and it is possible that another vCPU flies by a breakpoint while it is temporarily disabled for single-stepping. Does it make sense to try and upstream this "does not crash but is also not 100% reliable for multiple vCPUs" solution, or should we rather hold off on upstream contributions until an EPT/altp2m solution is available? |
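The "TempDisable" idea from the comment above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code that was actually contributed: the names `State`, `Breakpoint`, `on_context_switch`, `on_breakpoint_hit`, and `on_single_step_done` are hypothetical, guest memory is modelled as a plain `bytearray`, and "should be active" is assumed to mean "global, or the target process is running on at least one vCPU".

```python
from enum import Enum, auto

INT3 = 0xCC  # x86 opcode of the one-byte software breakpoint


class State(Enum):
    DISABLED = auto()
    ENABLED = auto()
    TEMP_DISABLED = auto()  # original byte restored only for a pending single step


class Breakpoint:
    """Hypothetical model of an INT3 breakpoint in guest memory."""

    def __init__(self, memory, address, target_pid=None):
        self.memory = memory            # bytearray standing in for guest memory
        self.address = address
        self.target_pid = target_pid    # None means a global breakpoint
        self.original = memory[address]
        self.state = State.DISABLED

    def enable(self):
        self.memory[self.address] = INT3
        self.state = State.ENABLED

    def disable(self):
        self.memory[self.address] = self.original
        self.state = State.DISABLED

    def temp_disable(self):
        self.memory[self.address] = self.original
        self.state = State.TEMP_DISABLED

    def should_be_active(self, running_pids):
        # Assumption: running_pids holds the processes scheduled on all vCPUs.
        return self.target_pid is None or self.target_pid in running_pids


def on_context_switch(breakpoints, running_pids):
    """Reconcile breakpoint states, but never touch a pending single step."""
    for bp in breakpoints:
        if bp.state is State.TEMP_DISABLED:
            continue  # re-inserting INT3 here would re-trigger the breakpoint
        wanted = bp.should_be_active(running_pids)
        if wanted and bp.state is State.DISABLED:
            bp.enable()
        elif not wanted and bp.state is State.ENABLED:
            bp.disable()


def on_breakpoint_hit(bp):
    bp.temp_disable()


def on_single_step_done(bp, running_pids):
    # Decide the final state only after the original instruction has executed.
    if bp.should_be_active(running_pids):
        bp.enable()
    else:
        bp.disable()
```

In this model, a context switch on another vCPU that arrives between `on_breakpoint_hit` and `on_single_step_done` leaves the original byte in place, which is the behaviour the comment describes. The window in which other vCPUs can pass the breakpoint unnoticed still exists, as noted above.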
Hi @lbeierlieb, Thanks for the thorough explanation. At the moment our team is not actively developing this software any further. This might change in the future, but as of right now no work is being done besides the occasional housekeeping. Therefore I would say that it makes perfect sense to upstream these improvements, especially since we would probably try to make it possible to switch between breakpoint mechanisms, because some hypervisors (e.g. KVMi) still have trouble handling multiple sets of EPTs. Thanks in advance for your effort. 🙏 |
Thank you for the quick response and the information! |
Now that we have added the required functionality to KVMi, we can rework the breakpoint API in order to incorporate SLAT switches. This would enable us to use guests with more than one vCPU. It would also increase performance in situations where the memory page that contains a breakpoint is read by the guest.
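
Why per-vCPU SLAT switches help with multiple vCPUs can be shown with a toy model. This is a generic sketch of the view-switching idea, not SmartVMI's design or any hypervisor API: `Guest`, `ORIGINAL_VIEW`, `BREAKPOINT_VIEW`, and the handler names are hypothetical, and each SLAT view is modelled as a separate copy of the code bytes.

```python
INT3 = 0xCC          # x86 opcode of the one-byte software breakpoint
ORIGINAL_VIEW = 0    # hypothetical view id: unmodified guest code
BREAKPOINT_VIEW = 1  # hypothetical view id: code with INT3 bytes inserted


class Guest:
    """Toy guest in which every vCPU selects its own memory view."""

    def __init__(self, code, vcpu_count):
        self.views = {
            ORIGINAL_VIEW: bytearray(code),
            BREAKPOINT_VIEW: bytearray(code),
        }
        # All vCPUs normally execute from the view that contains breakpoints.
        self.active_view = [BREAKPOINT_VIEW] * vcpu_count

    def set_breakpoint(self, address):
        # Only the breakpoint view is modified; the original view stays clean.
        self.views[BREAKPOINT_VIEW][address] = INT3

    def fetch(self, vcpu, address):
        return self.views[self.active_view[vcpu]][address]

    def on_breakpoint_hit(self, vcpu):
        # Switch just this vCPU to the clean view for the single step.
        self.active_view[vcpu] = ORIGINAL_VIEW

    def on_single_step_done(self, vcpu):
        self.active_view[vcpu] = BREAKPOINT_VIEW


guest = Guest(b"\x90\x55\xc3", vcpu_count=2)
guest.set_breakpoint(1)
guest.on_breakpoint_hit(vcpu=0)
# vCPU 0 now fetches the original byte while vCPU 1 still fetches INT3.
```

Because no shared memory is rewritten while one vCPU single-steps, the other vCPUs keep trapping on the breakpoint, which is the property the INT3-rewrite approach lacks.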