Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uhyve is very slow in Githubs CI #790

Open
jounathaen opened this issue Nov 12, 2024 · 7 comments
Open

Uhyve is very slow in Githubs CI #790

jounathaen opened this issue Nov 12, 2024 · 7 comments

Comments

@jounathaen
Copy link
Member

Example run:

https://github.com/hermit-os/kernel/actions/runs/11801739625/job/32875789502

  • Takes ~2:30 Min with 1 core
  • Takes ~11:00 Min with 4 cores
@n0toose
Copy link
Member

n0toose commented Nov 20, 2024

Uploaded the job-longs.txt for posterity and because GitHub requires you to login to view the logs:

logs_30798294207.zip
job-logs.txt

@n0toose
Copy link
Member

n0toose commented Nov 21, 2024

After removing the following line on my system (with its fair share of environment variables), I managed to gain a performance boost of 10 seconds (before, it was around ~14) with 20k-27k less hypercalls (depending on the log level).

This probably warrants some further investigation:

uhyve/src/vm.rs

Line 323 in d4bc7da

.envs(env::vars())

@mkroening
Copy link
Member

In release mode? This would be most surprising. 🤔

I'll have a look at the kernel to see if anything jumps on me.

@mkroening
Copy link
Member

mkroening commented Nov 22, 2024

Statistics from running a release-profile rusty_demo on hermit-os/kernel@430da84:

With env in FDT:

VM exits:                       total     cpu0 
  Hypercall(Exit):                  1        1 
  Hypercall(FileWrite):           385      385 
  Hypercall(Uart):              21692    21692 
  PCIRead:                       1043     1043 
  PCIWrite:                      1067     1067 
CPU runtimes:
  cpu 0: 1.127073035s

Without env in FDT:

Run statistics:
VM exits:                       total     cpu0 
  Hypercall(Exit):                  1        1 
  Hypercall(FileWrite):           150      150 
  Hypercall(Uart):               3252     3252 
  PCIRead:                       1043     1043 
  PCIWrite:                      1067     1067 
CPU runtimes:
  cpu 0: 534.278001ms

(@jounathaen it would be nice if the statistics would be sorted automatically, also there is a trailing space)

These numbers do make sense, since the output with the environment is a 21,000 characters longer than without, and we currently do one hypercall per character if I am not mistaken.

This does not explain the slowdown in CI, though, since the CI spends most of the time with number crunching and not with UART output.

Edit: for comparison, QEMU with KVM for rusty_demo without environment variables takes this long:

________________________________________________________
Executed in    1.33 secs      fish           external
   usr time    0.92 millis  916.00 micros    0.00 millis
   sys time  117.51 millis  424.00 micros  117.09 millis

@jounathaen
Copy link
Member Author

We definitely have to implement the buffered serial hypercall in Hermit.

(@jounathaen it would be nice if the statistics would be sorted automatically, also there is a trailing space)

By which measure do you want to sort them? I think alphabetically makes most sense to have the best comparability.

@mkroening
Copy link
Member

By which measure do you want to sort them? I think alphabetically makes most sense to have the best comparability.

Yeah, alphabetically is what I was thinking of. I had to sort them manually, the order was non-deterministic.

@mkroening
Copy link
Member

Firecracker is much faster on the same GitHub-hosted runner (dev profile):

Uhyve:

Measured time for 100ms sleep: 184.914ms

Calculating Pi (Sequential): 469.843ms

Calculating Pi (Parallel):   2.292218s

Matrix multiplication
seq row-major:	64x64 matrix: 6.1972733 s
seq z-order:	64x64 matrix: 5.092721 s
par z-order:	64x64 matrix: 5.624933 s
par strassen:	64x64 matrix: 5.5266623 s
speedup: 0.91x

Laplace iterations
1000 iterations: 170.188916s (residual: 0.0000000000000000000044790090308916814)

Firecracker:

Measured time for 100ms sleep: 100.147ms

Calculating Pi (Sequential): 836µs

Calculating Pi (Parallel):   3.06ms

Matrix multiplication
seq row-major:	64x64 matrix: 0.011902 s
seq z-order:	64x64 matrix: 0.009583 s
par z-order:	64x64 matrix: 0.009726 s
par strassen:	64x64 matrix: 0.009326 s
speedup: 0.99x

Laplace iterations
1000 iterations: 117.193ms (residual: 0.0000000000000000000044790090308916814)

Uhyve SMP:

Measured time for 100ms sleep: 206.062ms

Calculating Pi (Sequential): 534.138ms

Calculating Pi (Parallel):   5.323512s

Matrix multiplication
seq row-major:	64x64 matrix: 6.868419 s
seq z-order:	64x64 matrix: 5.602098 s
par z-order:	64x64 matrix: 3.658734 s
par strassen:	64x64 matrix: 4.7791476 s
speedup: 1.53x

Laplace iterations
1000 iterations: 863.561143s (residual: 0.0000000000000000000044790090308916814)

Firecracker SMP:

Measured time for 100ms sleep: 100.176ms

Calculating Pi (Sequential): 836µs

Calculating Pi (Parallel):   6.463ms

Matrix multiplication
seq row-major:	64x64 matrix: 0.011897 s
seq z-order:	64x64 matrix: 0.009555 s
par z-order:	64x64 matrix: 0.007463 s
par strassen:	64x64 matrix: 0.007992 s
speedup: 1.28x

Laplace iterations
1000 iterations: 1.385644s (residual: 0.0000000000000000000044790090308916814)

Maybe our interrupt handling is way slower?

For the record, these are the KVM capabilities, but I have not looked into whether they might be relevant here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants