additional hooks #4

Open
lunixbochs opened this issue May 30, 2022 · 1 comment

Comments

@lunixbochs

Cool project! With a bit of tooling on top, I'll probably be able to replace many of my use cases for usercorn with a tool that works on more complex targets.

There are a few hooks I've found valuable to get a complete picture with this kind of tracing:

  • syscall (number + argument registers): you can just emit a trace event in do_syscall()
  • mmap / munmap / mprotect: if a file is mmapped, I'd like enough information to mirror the mapping into a tracing tool on a best-effort basis. Filename + offset may be sufficient for most cases. I'd also likely want to know about the initial mappings of the interpreter and executable.
  • simple register change (e.g. r0, eax, etc)
  • special register change (e.g. MSR, SIMD)
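As a rough illustration of the first request, a syscall event emitted from do_syscall() could be a fixed-size record holding the syscall number and the argument registers. This is a hypothetical sketch: the record layout, the `EVENT_SYSCALL` tag, and the six-argument width are assumptions for illustration, not cannoli's actual trace format.

```python
import struct

# Hypothetical record layout (little-endian, no padding):
#   u8       event tag
#   u64      syscall number
#   u64 x 6  argument registers
EVENT_SYSCALL = 1
SYSCALL_RECORD = struct.Struct("<B7Q")


def encode_syscall(nr, args):
    """Pack a syscall number and up to six argument registers into one record."""
    if len(args) > 6:
        raise ValueError("at most six argument registers")
    padded = list(args) + [0] * (6 - len(args))
    return SYSCALL_RECORD.pack(EVENT_SYSCALL, nr, *padded)


def decode_syscall(buf, offset=0):
    """Unpack one syscall record; return (nr, args, offset of the next record)."""
    tag, nr, *args = SYSCALL_RECORD.unpack_from(buf, offset)
    if tag != EVENT_SYSCALL:
        raise ValueError(f"unexpected event tag {tag}")
    return nr, args, offset + SYSCALL_RECORD.size


# Two events appended to one trace buffer (an anonymous mmap, then a munmap).
trace = encode_syscall(9, [0, 0x3000, 3, 0x22, 2**64 - 1, 0])
trace += encode_syscall(11, [0x7F0000000000, 0x1000])
```

Fixed-size records keep the emitting side trivial (one store per field) and let the consumer walk the buffer without any framing logic.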

Register change tracking is the reason I've wanted something more like cannoli for a long time: it would be so much faster to copy individual register writes to a buffer within the JIT than what I was doing before (repeatedly diffing the register file from a C helper).
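To make the difference concrete, here is a small sketch contrasting the two approaches: diffing a full register-file snapshot on every check versus appending each register write to a log as it happens. The register file is modeled as a plain list of integers, and `RegWriteLog` is a hypothetical name; neither reflects QEMU's actual CPU state layout.

```python
def diff_regs(prev, cur):
    """Old approach: compare every register on every check.

    Costs O(number of registers) per check, however few were written,
    and only sees the net change since the previous snapshot.
    """
    return [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v]


class RegWriteLog:
    """Alternative: record (register index, new value) at the point of each write.

    Costs O(1) per write and preserves every write, in order.
    """

    def __init__(self):
        self.entries = []

    def write(self, regs, index, value):
        regs[index] = value
        self.entries.append((index, value))
```

Besides the cost, the diff loses information: a register written twice between checks shows up once with its final value, and a write that stores the same value does not show up at all.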

@gamozolabs
Collaborator

gamozolabs commented May 31, 2022

Syscall should be easy to add. Mmap as well, although arguably we could do that on the consumer side by parsing syscalls. That would mean having to "recover" some information, though, so it probably wouldn't be great.
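A sketch of the consumer-side option, assuming syscall events arrive as hypothetical `(nr, args, ret)` tuples and using the x86-64 Linux syscall numbers (9 = mmap, 10 = mprotect, 11 = munmap). It illustrates what has to be "recovered": only the fd is known here, so mapping it back to a filename would additionally require tracking open/openat results. Any mappings created before the first traced syscall, such as the initial executable and interpreter, would not appear at all.

```python
# x86-64 Linux syscall numbers.
SYS_MMAP, SYS_MPROTECT, SYS_MUNMAP = 9, 10, 11
PAGE = 0x1000


def page_align(n):
    return (n + PAGE - 1) & ~(PAGE - 1)


class MemoryMap:
    """Best-effort mapping table rebuilt from (nr, args, ret) syscall events."""

    def __init__(self):
        # Each region is a dict: start, end, prot, fd, offset.
        self.regions = []

    def _carve(self, start, end):
        """Remove [start, end) from the table and return the removed pieces."""
        kept, removed = [], []
        for r in self.regions:
            if r["end"] <= start or r["start"] >= end:
                kept.append(r)
                continue
            if r["start"] < start:  # keep the part below the range
                kept.append({**r, "end": start})
            if r["end"] > end:  # keep the part above the range
                kept.append({**r, "start": end, "offset": r["offset"] + (end - r["start"])})
            s, e = max(r["start"], start), min(r["end"], end)
            removed.append({**r, "start": s, "end": e, "offset": r["offset"] + (s - r["start"])})
        self.regions = kept
        return removed

    def on_syscall(self, nr, args, ret):
        # On failure Linux returns -errno, i.e. a u64 in [2**64 - 4095, 2**64 - 1].
        if ret >= 2**64 - 4095:
            return
        if nr == SYS_MMAP:
            end = ret + page_align(args[1])
            self._carve(ret, end)  # MAP_FIXED may replace existing mappings
            fd = args[4] - 2**64 if args[4] >= 2**63 else args[4]
            self.regions.append(
                {"start": ret, "end": end, "prot": args[2], "fd": fd, "offset": args[5]}
            )
        elif nr == SYS_MUNMAP:
            self._carve(args[0], args[0] + page_align(args[1]))
        elif nr == SYS_MPROTECT:
            for piece in self._carve(args[0], args[0] + page_align(args[1])):
                self.regions.append({**piece, "prot": args[2]})

    def snapshot(self):
        return sorted(self.regions, key=lambda r: r["start"])
```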

Registers are planned, although they're a bit of a pain, as they require per-architecture code for every single QEMU target. Unfortunately there isn't a great way of tracing registers: they do not have a defined location when running a JIT. Rather, each target defines its own structure for register layouts (the env pointer in AREG0). I haven't really figured out the design I would do here, as it's just really messy. Also, do things like register changes in a syscall handler count as register changes? If so, then it becomes nearly impossible (thousands of lines of code changes that would have to be permanently maintained alongside QEMU). I think my current plan is just to make it so you can instrument instructions to "dump register state" rather than tracking diffs, e.g. for recovering the args to a syscall, libc function, or whatever.
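The "dump register state" approach moves the per-architecture knowledge to the consumer. A minimal sketch, assuming a hypothetical dump format that maps register names to values at an instrumented syscall instruction; the table itself is the standard Linux syscall convention for each listed architecture (which register holds the number, then the six argument registers).

```python
# Linux syscall conventions: (number register, argument registers 1..6).
SYSCALL_ABI = {
    "x86_64": ("rax", ["rdi", "rsi", "rdx", "r10", "r8", "r9"]),
    "aarch64": ("x8", ["x0", "x1", "x2", "x3", "x4", "x5"]),
    "arm": ("r7", ["r0", "r1", "r2", "r3", "r4", "r5"]),  # EABI
    "riscv64": ("a7", ["a0", "a1", "a2", "a3", "a4", "a5"]),
}


def syscall_from_dump(arch, regs):
    """Return (number, args) from a register dump (name -> value) taken
    at a syscall instruction, using that architecture's convention."""
    nr_reg, arg_regs = SYSCALL_ABI[arch]
    return regs[nr_reg], [regs[r] for r in arg_regs]
```

Supporting a new target then means adding one table entry on the processing side rather than changing the QEMU patches.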

Special registers take the above issues to a whole new level, and I just can't really see that being a thing. Every architecture has completely different ways of doing "special registers", and even just what a "special register" is.

I think if I really wanted to kinda hit everything with one change, it could be an unsafe hook that takes a user-defined number of bytes to literally memcpy() from AREG0 (the pointer to env) into the trace. This would be pretty gross, but in theory it would allow you to save "all of the CPU state" on a hook. Maybe I could even use a bitmap of which bytes to copy (which then gets packed when saved). It would be pretty easy to auto-generate good, relatively performant code for that.
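A rough sketch of the bitmap idea under stated assumptions: `env` is modeled as a byte string standing in for the CPU state that AREG0 points to, and the per-byte mask is supplied by the user. Collapsing the bitmap into contiguous runs is one plausible way to keep the generated code fast, since each run becomes a single copy; the function names are hypothetical.

```python
def bitmap_to_runs(mask):
    """Turn a per-byte copy mask (list of bools over env) into (offset, length)
    runs, i.e. the memcpy() calls that generated code would perform."""
    runs, start = [], None
    for i, keep in enumerate(mask):
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(mask) - start))
    return runs


def pack(env, runs):
    """Copy only the selected bytes of env, back to back, into a trace record."""
    return b"".join(env[off:off + n] for off, n in runs)


def unpack(packed, runs, env_size):
    """Rebuild a sparse env image on the processing side; unselected bytes are zero."""
    env = bytearray(env_size)
    pos = 0
    for off, n in runs:
        env[off:off + n] = packed[pos:pos + n]
        pos += n
    return bytes(env)
```

Because the runs are fixed when the hook is set up, the trace record has a constant size, and the consumer only needs the same mask to interpret it.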

The main goal of this project is largely just to be a high-performance data stream out of QEMU. I really want to keep it that way, so that it stays a stable code base that isn't tracking a bunch of target-specific hooks in QEMU at all times. I'd rather outsource that to a library that processes traces. I'm always open to getting out information that is otherwise unobtainable, but if it's obtainable in a "generic" sense, I'd rather reconstruct it on the processing end.
