Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add manual reload functionality to mrouted for hotplugged network interfaces #63

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

cguimaraes
Copy link

Summary

This pull request introduces a new feature to the mrouted daemon, allowing users to manually reload the daemon (via mroutectl) in cases where new network interfaces, either virtual or physical, are hotplugged. This reload command functions similarly to the existing restart command, with the key difference being that it preserves existing multicast routing rules, making it less disruptive in running network environments.

Feature Details

  • Manual Reload Capability: Users can now manually trigger a reload in mrouted when a new network interface is added, without losing any established multicast rules.

  • Non-Destructive Behavior: Unlike a full restart, this reload retains the daemon’s knowledge of active routes and existing rules, allowing for seamless integration of new interfaces.

  • Tested with mcjoin: The feature was validated using mcjoin on a dynamic mesh topology with at least 10 nodes running mrouted, ensuring robustness in dynamic, high-node environments.

Testing and Validation

  • Test setup with changing (virtual) mesh topology of 10 nodes.
  • Validated behavior with mcjoin to ensure multicast traffic was managed correctly during manual reloads.

Not In Scope: This change does not automate the reload when interfaces are hotplugged. In future versions, the reload process could be made automatic via:

  • Polling: Periodically checking interfaces every N seconds
  • rtnetlink Events: Listening for kernel events using rtnetlink

Copy link
Owner

@troglobit troglobit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good idea, the PR description is descriptive and the new code looks great ...

... until looking into what the code actually does -- it adds a new function reload_iface(), that's a copy-paste of restart() without free_all_prunes() and free_all_routes().

Meaning, this is essentially the same functionality, but with memory leaks. You should see loss of routing as soon as the routing socket is closed and the kernel wipes all entries from the MRIB and removes all VIFs set up. Sure, the function is quick to reopen the routing socket, and the kernel is probed again for new interfaces, but all state is just recreated, only now with uncreachable rtentrys since rtable is reset to NULL.

To top it all off, the PR describes itself as adding a 'reload' command to mroutectl, but there is no such command. Not even a single change to mroutectl in this PR, not even an update to the man page, so I suspect you are using some other method of accessing the IPC directly.

Now, you're onto something in the PR description, a better way to implement this is adding support for netlink events or polling getifaddrs(). This would however require a big refactor of config_vifs_from_kernel() to support removing VIFs when interfaces are lost and adding VIFs for new interfaces, which is a difficult task to get right since you'd have to go though all existing routes to prune lost interface (VIFs).

In any case, I cannot accept this PR the way it looks now.

@cguimaraes
Copy link
Author

I have rushed a bit with this PR, and it seems I overlooked too many aspects.

But your feedback is very helpful to pin point several things to improve. I will fix this PR accordingly.

As for the automated part, I would leave it for a future-self to tackle it.

@cguimaraes
Copy link
Author

This looks like a good idea, the PR description is descriptive and the new code looks great ...

... until looking into what the code actually does -- it adds a new function reload_iface(), that's a copy-paste of restart() without free_all_prunes() and free_all_routes().

Meaning, this is essentially the same functionality, but with memory leaks. You should see loss of routing as soon as the routing socket is closed and the kernel wipes all entries from the MRIB and removes all VIFs set up. Sure, the function is quick to reopen the routing socket, and the kernel is probed again for new interfaces, but all state is just recreated, only now with uncreachable rtentrys since rtable is reset to NULL.

I have been working a bit on this PR and, this time, taking more time to understand the codebase. I hope that it now resembles a more adequate implementation.

I did some quick tests with the mroutectl show interfaces and the interfaces seem to be added and removed as expected. But, I still need to test it with the changing (virtual) mesh topology of 10 nodes and with mcjoin, as state in the original PR's message.

Also, it seems that line below is segfaulting whenever I kill mrouted with CTRL+C. I need to investigate this a little more.

n = poll(pfd, nhandlers, timeout(n) * 1000);

To top it all off, the PR describes itself as adding a 'reload' command to mroutectl, but there is no such command. Not even a single change to mroutectl in this PR, not even an update to the man page, so I suspect you are using some other method of accessing the IPC directly.

Looks like something was not synced between my machines. Now, it also includes changes in the mroutectl

@cguimaraes cguimaraes changed the title Add manual reload functionality to mrouted for hotplugged network interfaces [WIP] Add manual reload functionality to mrouted for hotplugged network interfaces Oct 26, 2024
@troglobit
Copy link
Owner

Also, it seems that line below is segfaulting whenever I kill mrouted with CTRL+C. I need to investigate this a little more.

n = poll(pfd, nhandlers, timeout(n) * 1000);

The timer handling has been completely refactored on the master branch, fixing this issue among other things, see #56

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants