-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added retry for requests used by EVCs #491
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the retries + set an error state is pretty great. But, there are other things that need to be adapted or changed, check out my comments to see if they make sense.
Also, changelog hasn't been updated, unit tests failing, happy to provide a partial code review though.
New update.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality-wise it's almost there, and nicely done eliminating other tech debt along the way like replacing requests with httpx. There are some minor points to refactor and or fix, check out my new comments.
Other than that, I think it's worth discussing whether status_error
should also include the high level action that needs to taken, which would be very valuable for users to fix themselves by calling endpoints and/or more automation with consistency check that we can perform the "recovery" action, think about it, let me know what you think and if you can cover that too, if it's worth it. (feel free not to cover unit test for that part until it's sorted out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alopalao, I'll recommend that you split the part of setting the error_status
in a new PR, that's great as it is, but I'd like that one to evolve and be ready to also integrate well with consistency check eventually, and we're in the last days of the release, I don't we'll be able to sort it as thoroughly as we'd like to. That said, let's only handle retries here, and leave the rest of the behavior as it is, so that's already and improved, and then you can tackle and further specify the the rest of the error handling in next issue in a next version. At the end of the day, with the right concurrency limit this shouldn't be much of a concern so it's OK to delay this part in an exchange to make it gets as polished and as aligned with other changes, otherwise we might end up having to refactor again some values which would result in more rework and db scripts, OK? So, map another issue and we'll continue with that in a next release, proceed with finish this PR and the other 2024.1
issues, thanks.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alopalao, great improvements, and I'm glad how this is evolving gradually. I'll leave it pre-approved. Before merging:
- Augment to parametrize force=True and use it during failover clean up
Finally:
- Map a new issue just so the rest of error handling that you started with
error_status
can be further specified and make official in a future release
Let's ship it @Alopalao, feel free to merge. |
Closes #483
Closes #184
Summary
Creation/deletion of flows error by Service Unavailable can't be stopped since number of requests can go to infinite.
This PR adds two crucial approaches to try to minimize Service Unavailable error results:
Add retry to requests made by EVCs (
flow_manager
andpathfinder
)The retry is applied with a random timing of 5 to seconds in three instances. Because Service Unavailable happens when multiple requests are send in a short period of time, the randomness is meant to help with concurrent requests.
Addingerror_status
to EVCsExtra measures
{"circuit_id": "08352c061d3447", "deployed": false}
.{'redeployed': false}
- Mechanic withevc.old_path
has been changed toevc.old_paths
which is now a list additionally containing all the old paths that failed to be deleted. This error will not change the state oferror_status
nor disable nor deactivate the EVC. Another eventkytos/mef_eline.cleanup_evcs_old_path
will be sent containing the failed EVCs. The handling of this event is nowdynamic_single
.EVCPathNotDeletedand EVCPathNotInstalled. These are raised the the methods that catch FlowModException.requests
in favor ofhttpx
Local Tests
Stress tested with 100 to 140 EVC concurrent deletions and creations. These add up to ~2000 EVCs per test. Tried with 150 requests but at that point
mef_eline
was failing which did not create any issues just lost requests.api_concurrency_limit
was not changed.End-to-End Tests