Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 22.03] Cleanup OVS resource on snap removal. #92

Closed
wants to merge 1 commit into from

Conversation

mkalcok
Copy link
Contributor

@mkalcok mkalcok commented Nov 3, 2023

Backport of #91 to 22.03 branch

This change modifies `microovn.switch` service to stop OVS vswitchd
service with `--cleanup` flag. This ensures that when snap is removed,
datapath resources like ports and bridges are cleaned up.

This new behavior is overriden in pre-refresh hook to avoid dataplane
outages during the snap refresh.

(cherry picked from commit 8be9f0b)
Signed-off-by: Martin Kalcok <[email protected]>
@mkalcok mkalcok requested a review from a team as a code owner November 3, 2023 10:09
@mkalcok
Copy link
Contributor Author

mkalcok commented Nov 3, 2023

I'm trying to figure out why this change works in main branch but fails for branch-22.03. I'll just document what I have so far. Although the systemd unit files generated by snap for the microovn.switch service are identical, it seems that vswitch daemon is killed prematurely in branch-22.03.

ovs-vswitchd logs from main branch:

root@ovn1:~# snap stop microovn.switch
Run service command "stop" for services ["switch"] of snap "microovn"                                                                        -
2023-11-03T15:28:52.884Z|00036|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
2023-11-03T15:28:52.969Z|00037|connmgr|INFO|br-int<->unix#0: 12 flow_mods 2 s ago (11 adds, 1 deletes)
Stopped.

ovs-vswitchd logs from branch-22.03:

root@ovn1:~# snap stop microovn.switch
Run service command "stop" for services ["switch"] of snap "microovn"                                                                        /
2023-11-03T15:32:20.958Z|00039|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
2023-11-03T15:32:20.959Z|00001|fatal_signal(urcu13)|WARN|terminating with signal 15 (signal 15)
Stopped.

@fnordahl
Copy link
Member

fnordahl commented Nov 5, 2023

I'm trying to figure out why this change works in main branch but fails for branch-22.03. I'll just document what I have so far. Although the systemd unit files generated by snap for the microovn.switch service are identical, it seems that vswitch daemon is killed prematurely in branch-22.03.

I took a quick look and I found a couple of issues, sharing thoughts below.

  1. I suspect the call to ovs-appctl exit is asynchronous so it may be OVS just does not get around to clean up before control is returned to systemd and the remaining process is forcefully reaped.

    I have a proposal up that runs ovs-ctl stop after the call to ovs-appctl exit --cleanup which would make it synchronous, its lifecycle test run is successful both on main and branch-22.03.

    I'm not entirely sure what exactly makes the difference in behavior between the branches though.

  2. Another thing we need to fix after we figure out this one is the upgrade test, because it is currently consistently failing on branch-22.03.

    I suspect that is happening after we changed the election timer, and on the basis that the upgrade test on main succeeds, there might be a change in behavior between OVSDB Server 2.17 and OVSDB Server 3.2 on restart.

    I believe our choices there are to confirm that there is a difference and consider if a backport is feasible in the OVS package, or augment the test to either be more considerate when performing the upgrade with regards to leadership transitions or just poll/wait for DB cluster recovery after a blanket upgrade (or both to unblock the gate).

@fnordahl
Copy link
Member

fnordahl commented Nov 5, 2023

2. Another thing we need to fix after we figure out this one is the upgrade test, because it is currently [consistently failing on branch-22.03](https://github.com/fnordahl/microovn/actions/runs/6757900150/job/18369004183).
   I suspect that is happening after we changed the election timer, and on the basis that the upgrade test on main succeeds, there might be a change in behavior between OVSDB Server 2.17 and OVSDB Server 3.2 on restart.
   I believe our choices there are to confirm that there is a difference and consider if a backport is feasible in the OVS package, or augment the test to either be more considerate when performing the upgrade with regards to leadership transitions or just poll/wait for DB cluster recovery after a blanket upgrade (or both to unblock the gate).

Having observed doing a blanket upgrade on both the stable and main versions my conclusion is that the upgrade test currently works by chance, I have proposed a fix for this in #96 with backport in #97.

@fnordahl
Copy link
Member

fnordahl commented Nov 6, 2023

Superseded by #97

@fnordahl fnordahl closed this Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants