Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry all payment session with actor message #430

Merged
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 13 additions & 17 deletions src/fiber/network.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1454,12 +1454,10 @@ where
.await
.record_payment_fail(&payment_session, error_detail.clone());
if need_to_retry {
let res = self
.try_payment_session(myself, state, payment_session.payment_hash())
.await;
if res.is_err() {
debug!("Failed to retry payment session: {:?}", res);
}
// If this is the first hop error, like the WaitingTlcAck error,
// we will just retry later, return Ok here for letting endpoint user
// know payment session is created successfully
self.register_payment_retry(myself, payment_hash);
} else {
self.set_payment_fail_with_error(
&mut payment_session,
Expand Down Expand Up @@ -1642,14 +1640,10 @@ where
Ok(payment_session) => return Ok(payment_session),
Err(Error::SendPaymentFirstHopError(err, need_retry)) => {
if need_retry {
// If this is the first hop error, like the WaitingTlcAck error,
// If this is the first hop error, such as the WaitingTlcAck error,
// we will just retry later, return Ok here for letting endpoint user
// know payment session is created successfully
myself.send_after(Duration::from_millis(500), move || {
NetworkActorMessage::new_event(NetworkActorEvent::RetrySendPayment(
payment_hash,
))
});
self.register_payment_retry(myself, payment_hash);
return Ok(payment_session);
} else {
return Err(Error::SendPaymentError(err));
Expand All @@ -1660,11 +1654,7 @@ where
// but we need to retry later to let the actor to process failure,
// so that we can make different choice for later try
let payment_hash = payment_data.payment_hash;
myself.send_after(Duration::from_millis(500), move || {
NetworkActorMessage::new_event(NetworkActorEvent::RetrySendPayment(
payment_hash,
))
});
self.register_payment_retry(myself, payment_hash);
debug!("send payment error: {:?}", e);
return Err(e);
}
Expand All @@ -1680,6 +1670,12 @@ where
}
}

fn register_payment_retry(&self, myself: ActorRef<NetworkActorMessage>, payment_hash: Hash256) {
myself.send_after(Duration::from_millis(500), move || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that even after 0.5 seconds the ChannelUpdate contained in the error message is still not processed. This is because this message would first be saved to the gossip message store and by default the interval to save messages to the store is 20s. https://github.com/contrun/fiber/blob/ea5c1fede9bffd8d3f3fa49d2570aadb971fcb6f/src/fiber/gossip.rs#L1179-L1181 https://github.com/contrun/fiber/blob/ea5c1fede9bffd8d3f3fa49d2570aadb971fcb6f/src/fiber/config.rs#L61-L62 .

In this case, I think we need to notify the gossip actor to start process the messages in the queue immediately (either create a new message like TickImmediately for the gossip actor or rename the ProcessBroadcastMessage to ProcessBroadcastMessageImmediately and change the handler to immediately process messages in the queue). Note that it is still possible that when there are too many messages in the queue this message can still not be processed after 0.5s.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 60s, not 20s ?

/// The interval to maintain the gossip network, in milli-seconds.
pub const DEFAULT_GOSSIP_NETWORK_MAINTENANCE_INTERVAL_MS: u64 = 1000 * 60;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, there is a chance the message is not processed in 0.5 second, any other better way ? I think we should not make a huge change because of this requirement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e909f92
I make this change, how do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer changing the action ProcessBroadcastMessage to changing the underlying data structure ChannelUpdate. To be specific, I think add an additional boolean parameter of https://github.com/contrun/fiber/blob/b78812b7584c290563f54d21842eafce0d33d707/src/fiber/gossip.rs#L318-L321 ProcessBroadcastMessage to indicate if we need to process the message immediately will be better. This boolean parameter will intoxicate a few functions/data structures, but its meaning is clearer than ChannelUpdateImmediately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 37070e2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use another solution: 513ad0a

NetworkActorMessage::new_event(NetworkActorEvent::RetrySendPayment(payment_hash))
});
}

async fn on_send_payment(
&self,
myself: ActorRef<NetworkActorMessage>,
Expand Down
Loading