CI: `blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20` occationally fails #650

evanlinjin · 2022-07-02T19:05:21Z

Describe the bug

CI: blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20 occationally fails.

The command that fails

$ cargo test --no-default-features --features test-esplora,use-esplora-ureq,verify esplora::bdk_blockchain_tests

Occurences

https://github.com/bitcoindevkit/bdk/runs/7163681328?check_suite_focus=true

The text was updated successfully, but these errors were encountered:

rajarshimaitra · 2022-07-03T10:33:14Z

I have been witnessing few occasional failures in blockchain tests in my local too.. Seems like some kind of resource blocking thingy happening somewhere?? I was seeing "resource unavailable" errors few days ago when running blockchian tests all together.. Which I can't seem to reproduce anymore.. Could be something related??

From the error message it seems something got timed out while waiting for some stuffs..

evanlinjin · 2022-07-03T16:27:35Z

I have been witnessing few occasional failures in blockchain tests in my local too.. Seems like some kind of resource blocking thingy happening somewhere?? I was seeing "resource unavailable" errors few days ago when running blockchian tests all together.. Which I can't seem to reproduce anymore.. Could be something related??

From the error message it seems something got timed out while waiting for some stuffs..

Same here. Would it make sense to implement some retry logic/exponential backoff?

We can keep track of the errors and where they occur here.

rajarshimaitra · 2022-07-04T09:05:31Z

Same here. Would it make sense to implement some retry logic/exponential backoff?

We do have an exponential pole at electrum calls

bdk/src/testutils/blockchain_tests.rs

Lines 345 to 359 in 063d51f

    
           fn exponential_backoff_poll<T, F>(mut poll: F) -> T 
        
           where 
        
               F: FnMut() -> Option<T>, 
        
           { 
        
               let mut delay = Duration::from_millis(64); 
        
               loop { 
        
                   match poll() { 
        
                       Some(data) => break data, 
        
                       None if delay.as_millis() < 512 => delay = delay.mul_f32(2.0), 
        
                       None => {} 
        
                   } 
        
                   std::thread::sleep(delay); 
        
               } 
        
           }

To me it seems for some reason the electrum calls isn't returning and some resources are not being free??

The error seems to disappear in single test runs.. Only when I run all the blockchian tests together..

afilini · 2022-07-04T09:48:48Z

This looks like a bug in electrsd:

[2022-07-02T18:30:55Z ERROR electrsd] early exit with: ExitStatus(unix_wait_status(134))

Maybe it doesn't handle too much load well. I'll see if I can somehow attach a debugger and reproduce.

evanlinjin · 2022-07-04T13:22:38Z

This is a one-off failure, not sure if it is related.

failures:

---- blockchain::esplora::bdk_blockchain_tests::test_double_spend stdout ----
thread 'blockchain::esplora::bdk_blockchain_tests::test_double_spend' panicked at 'called `Result::unwrap()` on an `Err` value: JSON(Error("expected value", line: 1, column: 1))', src/testutils/blockchain_tests.rs:65:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    blockchain::esplora::bdk_blockchain_tests::test_double_spend

afilini · 2022-07-04T14:50:13Z

Maybe it's related to this? bitcoindevkit/rust-electrum-client#67

evanlinjin · 2022-07-04T16:38:35Z

failures:

---- blockchain::esplora::bdk_blockchain_tests::test_sync_address_index_should_not_decrement stdout ----
thread 'blockchain::esplora::bdk_blockchain_tests::test_sync_address_index_should_not_decrement' panicked at 'called `Result::unwrap()` on an `Err` value: EarlyExit(ExitStatus(unix_wait_status(134)))', src/testutils/blockchain_tests.rs:36:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    blockchain::esplora::bdk_blockchain_tests::test_sync_address_index_should_not_decrement

test result: FAILED. 256 passed; 1 failed; 5 ignored; 0 measured; 0 filtered out; finished in 6.41s

evanlinjin · 2022-07-05T02:26:43Z

 failures:
error: test failed, to rerun pass '--lib'
---- blockchain::esplora::bdk_blockchain_tests::test_tx_chain stdout ----
thread 'blockchain::esplora::bdk_blockchain_tests::test_tx_chain' panicked at 'called `Result::unwrap()` on an `Err` value: EarlyExit(ExitStatus(unix_wait_status(134)))', src/testutils/blockchain_tests.rs:36:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
    blockchain::esplora::bdk_blockchain_tests::test_tx_chain

evanlinjin · 2022-07-12T23:01:27Z

---- blockchain::electrum::test::bdk_blockchain_tests::test_sync_bump_fee_basic stdout ----
thread 'blockchain::electrum::test::bdk_blockchain_tests::test_sync_bump_fee_basic' panicked at 'called `Result::unwrap()` on an `Err` value: IOError(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })', src/testutils/blockchain_tests.rs:65:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

evanlinjin · 2022-07-13T09:29:47Z

This one is the craziest yet: https://gist.github.com/evanlinjin/3079251115e64edc355f46d0d63edb1e

rajarshimaitra · 2022-07-13T14:20:51Z

Yes.. I have been seeing this one in my local..

evanlinjin · 2022-07-13T23:49:26Z

Yes.. I have been seeing this one in my local..

I mean if the error states "Resource TEMPORARILY unavailable", I think we can safely wrap the calls with exponential back-off logic. I did see a function in the tests for exponential back-off, so maybe we can reuse that.

afilini · 2022-07-14T10:15:54Z

My guess: the tcp send buffer gets filled up because core can't process all those requests at once. Increasing the number of threads would probably help.

Source https://linux.die.net/man/2/send:

When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case

Whenever the transport errors with "WouldBlock" we wait a bit and retry the call. Related to bitcoindevkit#650, should at least fix the errors with RPC

notmandatory · 2022-09-10T18:45:11Z

I ran into this issue testing bitcoindevkit/bdk-cli#102 with the new regtest-bitcoin feature which starts up a local bitcoind and connects to it via the RPC client, but the error only seemed to happen on my MacOS system, the CI tests all pass on Ubuntu Linux. While debugging I noticed in the bitcoind logs a list of RPC requests to "Update existing descriptor: raw(...)" seems to be the default 100 address look ahead, so I think the requests are all getting successfully delivered but the RPC client connection disconnects always after 5 seconds and throws the SocketError "Resource temporarily unavailable". I'll try making the bitcoind server responded faster (as @afilini suggested with more threads), and separately see if I can make the RPC client timeout longer and report back.

notmandatory · 2022-09-10T20:10:51Z

I re-tried my test with more rpcthreads set on bitcoind with conf set to via "-rpcthread=", I tired 20 and 50, didn't seem to make a difference. I haven't find a way yet to change the RPC call socket timeout.

notmandatory · 2022-09-10T20:25:40Z

Probably best to disregard my above RPC problems since the title of this issue is really about esplora. I'll open a new issue for my RPC macos problem.

notmandatory · 2023-03-02T17:44:40Z

Could this issue be related to the default number for file descriptors running out? This is mentioned for core here: JSON-RPC-interface limitations.

rajarshimaitra · 2023-03-26T04:49:53Z

As per https://docs.oracle.com/cd/E19476-01/821-0505/file-descriptor-requirements.html, the default limit for file descriptors per process is 1024. I am not fully sure, but I think we are not hitting that many RPC requests.

Another pointer from LDK team was, it should just redo the call again if it hits the error. lightningdevkit/rust-lightning#2033 (comment)

I will try with this on the new RPC example and report back..

evanlinjin added the bug Something isn't working label Jul 2, 2022

notmandatory added this to BDK Maintenance Jul 4, 2022

notmandatory moved this to Todo in BDK Maintenance Jul 4, 2022

afilini mentioned this issue Aug 9, 2022

Retry failed RPC calls on error #707

Closed

9 tasks

afilini added a commit to afilini/bdk that referenced this issue Aug 9, 2022

Retry failed RPC calls on error

9aa08ed

Whenever the transport errors with "WouldBlock" we wait a bit and retry the call. Related to bitcoindevkit#650, should at least fix the errors with RPC

afilini added a commit to afilini/bdk that referenced this issue Aug 10, 2022

Retry failed RPC calls on error

ccb951a

Whenever the transport errors with "WouldBlock" we wait a bit and retry the call. Related to bitcoindevkit#650, should at least fix the errors with RPC

evanlinjin pushed a commit to evanlinjin/bdk that referenced this issue Sep 1, 2022

Retry failed RPC calls on error

09e5e61

Whenever the transport errors with "WouldBlock" we wait a bit and retry the call. Related to bitcoindevkit#650, should at least fix the errors with RPC

evanlinjin mentioned this issue Sep 1, 2022

Retry failed RPC calls on error #741

Closed

4 tasks

rajarshimaitra mentioned this issue Sep 10, 2022

Satoshi's Calculator. bitcoindevkit/bdk-cli#102

Merged

6 tasks

This was referenced Feb 10, 2023

RPC sync failure #859

Open

[RPC] Ignore importdescriptor error #862

Closed

tnull mentioned this issue Feb 15, 2023

Make test_esplora_syncs more robust lightningdevkit/rust-lightning#2033

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI: `blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20` occationally fails #650

CI: `blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20` occationally fails #650

evanlinjin commented Jul 2, 2022

rajarshimaitra commented Jul 3, 2022

evanlinjin commented Jul 3, 2022 •

edited

Loading

rajarshimaitra commented Jul 4, 2022

afilini commented Jul 4, 2022

evanlinjin commented Jul 4, 2022

afilini commented Jul 4, 2022

evanlinjin commented Jul 4, 2022

evanlinjin commented Jul 5, 2022

evanlinjin commented Jul 12, 2022

evanlinjin commented Jul 13, 2022

rajarshimaitra commented Jul 13, 2022

evanlinjin commented Jul 13, 2022

afilini commented Jul 14, 2022

notmandatory commented Sep 10, 2022

notmandatory commented Sep 10, 2022

notmandatory commented Sep 10, 2022 •

edited

Loading

notmandatory commented Mar 2, 2023

rajarshimaitra commented Mar 26, 2023 •

edited

Loading

CI: blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20 occationally fails #650

CI: blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20 occationally fails #650

Comments

evanlinjin commented Jul 2, 2022

rajarshimaitra commented Jul 3, 2022

evanlinjin commented Jul 3, 2022 • edited Loading

rajarshimaitra commented Jul 4, 2022

afilini commented Jul 4, 2022

evanlinjin commented Jul 4, 2022

afilini commented Jul 4, 2022

evanlinjin commented Jul 4, 2022

evanlinjin commented Jul 5, 2022

evanlinjin commented Jul 12, 2022

evanlinjin commented Jul 13, 2022

rajarshimaitra commented Jul 13, 2022

evanlinjin commented Jul 13, 2022

afilini commented Jul 14, 2022

notmandatory commented Sep 10, 2022

notmandatory commented Sep 10, 2022

notmandatory commented Sep 10, 2022 • edited Loading

notmandatory commented Mar 2, 2023

rajarshimaitra commented Mar 26, 2023 • edited Loading

CI: `blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20` occationally fails #650

CI: `blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20` occationally fails #650

evanlinjin commented Jul 3, 2022 •

edited

Loading

notmandatory commented Sep 10, 2022 •

edited

Loading

rajarshimaitra commented Mar 26, 2023 •

edited

Loading