Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hotfix run each request in bound thread #4080

Closed
wants to merge 4 commits into from

Conversation

jberthold
Copy link
Member

@jberthold jberthold commented Dec 18, 2024

Fixes a problem with recently-introduced thread-local storage in the LLVM backend.
We have to make sure that all foreign calls relating to an LLVM-based term evaluation use the same OS thread, which can only be achieved via bound threads .

This PR

  • adds a flag to the generic RPC server in kore-rpc-types to request processing requests in bound threads
  • kore-rpc-booster uses this flag for bound threads when an LLVM backend library is used

This should protect us against problems related to using thread-local storage in the LLVM backend. Needs to be thoroughly tested before merging, because issues only materialised in proofs with substantial workload and parallel exploration.

Branch HOTFIX-use-bound-threads-for-request-workers is a simple take on this, running the whole worker thread in the server as a bound thread. This version here only runs the single request in a bound thread, and also declares the LLVM calls unsafe because they won't read HS heap data and never call back into Haskell.

When using foreign libraries that use thread-local state, a program must ensure that foreign calls are always made from the same OS thread, by way of making these calls in _bound threads_, see https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.20.0.0-1f57/Control-Concurrent.html#g:8

This PR
* adds a flag to the generic RPC server in `kore-rpc-types` to request running workers in bound threads. The server will use `forkOS` instead of `forkIO` in this case.
* requests running with bound threads when an LLVM backend library is used.

This _should_ protect us against problems related to using thread-local storage in the LLVM backend. Needs to be thoroughly tested before merging, because issues only materialised in proofs with substantial workload and parallel exploration.
@jberthold
Copy link
Member Author

kevm performance test (using v0.1.760, with newer LLVM backend) showed no difference.
kontrol performance tests (noisy)

Test HOTFIX-run-each-request-in-bound-thread time master-64ad8705f time (HOTFIX-run-each-request-in-bound-thread/master-64ad8705f) time
BlockParamsTest.testWarp(uint256)-trace_options2 17.82 23.67 0.752851711026616
ConstructorTest.run_constructor() 11.61 13.42 0.8651266766020864
StoreTest.testLoadNonExistent() 12.41 14.33 0.8660153524075367
StoreTest.testGasLoadColdVM() 14.11 16.26 0.8677736777367773
BytesTypeTest.test_bytes32(bytes32) 11.93 13.6 0.8772058823529412
BlockParamsTest.testCoinBase() 12.26 13.64 0.8988269794721407
kontrol/src/tests/integration/test_foundry_prove.py::test_constructor_with_symbolic_args 59.68 65.15 0.9160399079048349
PlainPrankTest.testFail_startPrank_internalCall() 12.94 14.08 0.9190340909090908
PlainPrankTest.test_startPrank_true() 18.05 19.21 0.939614783966684
SymbolicStorageTest.testFail_SymbolicStorage(uint256) 36.42 38.73 0.9403563129357089
PlainPrankTest.test_startPrank_zeroAddress_true() 18.3 19.45 0.9408740359897173
AssumeTest.test_multi_assume(address,address) 22.1 23.29 0.9489051094890512
PrankTestOrigin.test_origin_setup() 27.58 29.0 0.9510344827586207
SetUpDeployTest.test_extcodesize() 40.57 42.32 0.958648393194707
kontrol/src/tests/integration/test_foundry_prove.py::test_foundry_refute_node 8.02 8.35 0.9604790419161676
AssertTest.test_assert_true_branch(uint256) 26.18 27.14 0.9646278555637435
AssertTest.prove_assert_true() 12.41 11.98 1.0358931552587647
HevmTests.proveFail_require_assert 29.85 28.75 1.0382608695652173
kontrol/src/tests/integration/test_foundry_prove.py::test_foundry_show_with_hex_encoding 5.34 5.12 1.04296875
PrankTestMsgSender.test_msgsender_setup() 29.23 27.8 1.0514388489208633
kontrol/src/tests/integration/test_foundry_prove.py::test_foundry_auto_abstraction 32.98 31.13 1.0594282043045293
PlainPrankTest.test_prank_zeroAddress_true() 20.97 19.78 1.0601617795753284
PlainPrankTest.test_startPrankWithOrigin_true() 19.17 18.08 1.0602876106194692
PlainPrankTest.test_prank_expectRevert() 18.61 17.41 1.0689259046524986
AssertTest.checkFail_assert_false() 17.65 16.48 1.0709951456310678
ChainIdTest.test_chainid_setup() 27.31 25.41 1.0747737111373474
],bytes32))) 14.11 13.12 1.0754573170731707
RollTest.test_roll_setup() 27.39 25.44 1.0766509433962264
CoinBaseTest.test_coinbase_setup() 25.36 23.49 1.0796083439761601
StartPrankTestOrigin.test_startprank_origin_setup() 30.09 27.59 1.0906125407756433
BlockParamsTest.testRoll(uint256) 14.17 12.79 1.1078967943706022
BlockParamsTest.testChainId(uint256) 14.22 12.68 1.1214511041009465
SymbolicStorageTest.testEmptyInitialStorage(uint256) 18.28 15.83 1.1547694251421352
StoreTest.testGasStoreColdVM() 16.56 14.28 1.1596638655462184
AddrTest.test_addr_true()-trace_options1 26.48 21.29 1.2437764208548614
TOTAL 750.16 750.09 1.000093322134677

…ll block"

This reverts commit 207d6b1.

The branch crashed in a proof with 16x parallelism, trying with `Safe` calls again to narrow down why.
@jberthold
Copy link
Member Author

This change unfortunately does not avoid the crash that we observed with the latest LLVM backend. Closing in favour of #4081 which seems to avoid it.

@jberthold jberthold closed this Dec 19, 2024
automergerpr-permission-manager bot pushed a commit that referenced this pull request Dec 20, 2024
Fixes a problem with recently-introduced thread-local storage in the
LLVM backend.
We have to make sure that all foreign calls relating to an LLVM-based
term evaluation use the same OS thread, which can only be achieved [via
bound
threads](https://downloads.haskell.org/ghc/latest/docs/libraries/base-4.20.0.0-1f57/Control-Concurrent.html#g:8)
.

This PR
* adds a flag to the generic RPC server in `kore-rpc-types` to run
request worker threads in bound threads (using `forkOS`)
* `kore-rpc-booster` uses this flag for bound threads when an LLVM
backend library is used
* declares the LLVM backend calls `unsafe` to make executing OS threads
block instead of having new OS threads created for concurrent Haskell
execution (they won't read HS heap data and never call back into
Haskell).

This _should_ protect us against problems related to using thread-local
storage in the LLVM backend. Needs to be thoroughly tested before
merging, because issues only materialised in proofs with substantial
workload and parallel exploration.

PR #4080 uses `runInBoundThread` on the individual request processing
calls instead of running the whole worker thread in the server as a
bound thread. This was not solving the problem.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant