Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry all payment session with actor message #430

Merged

Conversation

chenyukang
Copy link
Collaborator

@chenyukang chenyukang commented Jan 1, 2025

Fixes #407

@chenyukang chenyukang marked this pull request as draft January 1, 2025 10:49
@@ -1680,6 +1670,12 @@ where
}
}

fn register_payment_retry(&self, myself: ActorRef<NetworkActorMessage>, payment_hash: Hash256) {
myself.send_after(Duration::from_millis(500), move || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that even after 0.5 seconds the ChannelUpdate contained in the error message is still not processed. This is because this message would first be saved to the gossip message store and by default the interval to save messages to the store is 20s. https://github.com/contrun/fiber/blob/ea5c1fede9bffd8d3f3fa49d2570aadb971fcb6f/src/fiber/gossip.rs#L1179-L1181 https://github.com/contrun/fiber/blob/ea5c1fede9bffd8d3f3fa49d2570aadb971fcb6f/src/fiber/config.rs#L61-L62 .

In this case, I think we need to notify the gossip actor to start process the messages in the queue immediately (either create a new message like TickImmediately for the gossip actor or rename the ProcessBroadcastMessage to ProcessBroadcastMessageImmediately and change the handler to immediately process messages in the queue). Note that it is still possible that when there are too many messages in the queue this message can still not be processed after 0.5s.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 60s, not 20s ?

/// The interval to maintain the gossip network, in milli-seconds.
pub const DEFAULT_GOSSIP_NETWORK_MAINTENANCE_INTERVAL_MS: u64 = 1000 * 60;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, there is a chance the message is not processed in 0.5 second, any other better way ? I think we should not make a huge change because of this requirement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e909f92
I make this change, how do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer changing the action ProcessBroadcastMessage to changing the underlying data structure ChannelUpdate. To be specific, I think add an additional boolean parameter of https://github.com/contrun/fiber/blob/b78812b7584c290563f54d21842eafce0d33d707/src/fiber/gossip.rs#L318-L321 ProcessBroadcastMessage to indicate if we need to process the message immediately will be better. This boolean parameter will intoxicate a few functions/data structures, but its meaning is clearer than ChannelUpdateImmediately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 37070e2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use another solution: 513ad0a

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 49.82%. Comparing base (a6e8625) to head (ba2d2bc).

Files with missing lines Patch % Lines
src/fiber/network.rs 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #430      +/-   ##
===========================================
+ Coverage    49.80%   49.82%   +0.01%     
===========================================
  Files           47       47              
  Lines        29806    29805       -1     
===========================================
+ Hits         14845    14850       +5     
+ Misses       14961    14955       -6     
Flag Coverage Δ
unittests 49.82% <88.88%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chenyukang chenyukang force-pushed the fix-error-handling-issue-407 branch from e909f92 to 37070e2 Compare January 2, 2025 07:18
@chenyukang chenyukang marked this pull request as ready for review January 2, 2025 07:22
@chenyukang chenyukang force-pushed the fix-error-handling-issue-407 branch from 95192e3 to 513ad0a Compare January 2, 2025 11:33
@quake quake merged commit b8e3056 into nervosnetwork:develop Jan 8, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Graph is not updated immediately when a TlcErrData::ChannelFailed is sent from a peer
4 participants