
Added Control of link up events by interruptions #136

Merged: 14 commits, Jun 22, 2023

Conversation

@Ktmi commented Jun 13, 2023

This PR adds an event listener for the new interruptions API, allowing kytos-ng/topology to produce the appropriate link up and down events, as well as the topology notify update event. Indirectly, this should resolve the following issues

Additional Notes

While I have added visibility of status and status_reason to the UI for Links and Switches, this is still missing for Interfaces. Additionally, changes to status/status_reason are not reflected unless the page is reloaded. This applies both to changes caused by enabling/disabling a device and to changes caused by an interruption at the device.
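
As a rough, hedged sketch of the listener described above: the handler names match the ones discussed later in this review, but the payload keys "type" and "links" and the notify_link_status_change helper are assumptions for illustration, not necessarily this PR's exact code.

    from kytos.core import KytosEvent
    from kytos.core.helpers import listen_to


    class Main:  # stand-in for the topology NApp's Main class

        @listen_to("topology.interruption.start")
        def on_interruption_start(self, event: KytosEvent):
            """Receive the start of a service interruption."""
            self.handle_interruption_start(event)

        def handle_interruption_start(self, event: KytosEvent):
            """Deals with the start of service interruption."""
            interruption_type = event.content.get("type")  # assumed key
            for link in event.content.get("links", []):  # assumed key
                # assumed helper that emits the link up/down notifications
                self.notify_link_status_change(link, reason=interruption_type)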

@Ktmi Ktmi requested a review from a team as a code owner June 13, 2023 15:08
@viniarck (Member) left a comment

@Ktmi, overall this looks good; it's nice that you also took the opportunity to augment some of the UI components. I've asked for a few minor changes, though. Also:

1 - Let me confirm my understanding regarding #62: it looks like that one will only be covered once the on_*_maintenance_* methods are removed, which we'll only be able to do once the rest of the topology.interruption.start|end producers (maintenance, in this case) are in place, right? Do you want to create a new task to make this explicit, or will you reuse the same one?

2 - Once the comments are addressed, I'd be OK with merging this one, since the topology.interruption.start|end producer hasn't landed yet, so it should be safe. Before merging, though, let me know which local tests you minimally explored/ran.

Resolved review threads on main.py and ui/k-info-panel/link_info.kytos.
@viniarck (Member) left a comment

Resolved review threads on main.py.
@viniarck viniarck self-requested a review June 20, 2023 16:53
        self.handle_interruption_start(event)

    def handle_interruption_start(self, event: KytosEvent):
        """Deals with the start of service interruption."""
@viniarck (Member) commented:

@Ktmi, considering the point that @italovalcy also brought to attention in yesterday's meeting, maybe both handle_interruption_start and handle_interruption_end would be good places to centralize log.info entries; after all, we're also using log.info for port status changes, so it would be reasonable here too. Another benefit of centralizing this in topology is that it could simplify certain logs for interruption sources. Wdyt?
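
A hedged sketch of what that centralized log.info could look like (log is the kytos.core logger; the payload keys are the same illustrative assumptions as in the earlier sketch, not a committed design):

    from kytos.core import KytosEvent, log

    # method of the topology NApp's Main class
    def handle_interruption_start(self, event: KytosEvent):
        """Deals with the start of service interruption."""
        interruption_type = event.content.get("type")  # assumed key
        links = event.content.get("links", [])  # assumed key
        log.info(
            f"Interruption started, type: {interruption_type}, "
            f"links: {[link.id for link in links]}"
        )
        # ...then produce the corresponding link down / topology update events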

@viniarck (Member) commented:

@Ktmi, let me know about this one; if you agree, also let me know whether it'll be part of this PR or a subsequent one.

@viniarck (Member) left a comment

@Ktmi, I'll leave this PR pre-approved, looks good.

I've also taken the opportunity to explore it locally; I ran these tests:

  • I confirmed that, with liveness, it wouldn't redundantly notify up again when it gets enabled.
  • I also used the same link-flap script that I used on my previous topology PR; no anomalies were observed, and the final states were as expected.

Once the e2e tests are passing, and if your local exploratory tests are passing as well, we can merge this. Btw, I raised a new point about introducing a log.info to get your thoughts on it, but that can also be delivered in a subsequent PR if we decide to add it.

@Ktmi (Author) commented Jun 21, 2023

I haven't run the e2e tests yet: I kept finding so many potential consistency issues with link activation/deactivation that I ended up rewriting a large portion of topology, and the changes are starting to creep into other NApps. A few notes on the matter:

  • of_lldp liveness will be rewritten to manage liveness status without using metadata, and will instead produce interruptions
  • of_core should probably be in full control of link activation, rather than topology, as it has proper visibility into that space
  • Setting status to != UP for switches and interfaces shouldn't deactivate links; instead, links should inherit the status and status reason of their parent objects (see the sketch after this list)
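
A hypothetical sketch of the inheritance idea in the last bullet (not existing kytos code; EntityStatus comes from kytos.core.common, and the attribute names mirror the current Link model):

    from kytos.core.common import EntityStatus


    class Link:  # sketch of the proposed model, not the current one
        @property
        def status(self):
            """Inherit status from the endpoint interfaces (hypothetical)."""
            endpoint_statuses = {self.endpoint_a.status, self.endpoint_b.status}
            if not self._enabled or EntityStatus.DISABLED in endpoint_statuses:
                return EntityStatus.DISABLED
            if not self._active or EntityStatus.DOWN in endpoint_statuses:
                return EntityStatus.DOWN
            return EntityStatus.UP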

@viniarck (Member) commented Jun 21, 2023

> I haven't run the e2e tests yet: I kept finding so many potential consistency issues with link activation/deactivation that I ended up rewriting a large portion of topology, and the changes are starting to creep into other NApps.

@Ktmi, I ran e2e on this PR, which is on top of your branch, and it's passing. It's great that you're analyzing and putting thought into it, but as usual, let's keep moving and address tasks based on their priorities: leave for 2023.2 what we can afford to (liveness, for instance), and keep in 2023.1.1 only the rest of the maintenance tasks that are indeed part of a minimal v1. If you encounter new or unplanned issues, or have ideas where things can be improved, please create a task and/or suggestion so it can be aligned and prioritized; by all means, feel free to use GitHub as you need to map these tasks/efforts.

> A few notes on the matter:

>   • of_lldp liveness will be rewritten to manage liveness status without using metadata, and will instead produce interruptions

Right. But let's leave this one for 2023.2 and also compare it with liveness BFD; maybe liveness BFD will still need to be prioritized first.

>   • of_core should probably be in full control of link activation, rather than topology, as it has proper visibility into that space

Topology's self.links could certainly be moved to the Kytos core, since it's a shared core resource (analogous to the controller.switches and switch.interfaces dicts), and then of_core could indeed also take the opportunity to activate links. But that still needs a discussion to see whether it's really worth it and when we could afford to do it. Other than that, the race conditions that were having an impact have been fixed. Historically, of_core was also designed to be as lightweight as possible, but I agree that activating links there, since it's already performing certain locks, could simplify things and make them easier to maintain.

>   • Setting status to != UP for switches and interfaces shouldn't deactivate links; instead, links should inherit the status and status reason of their parent objects

Indeed, active is independent, and status is a function of self._active and self._enabled, plus the other logical attributes we're registering. Regarding the status of composed objects: yes, ideally, but again, let's plan incrementally here; let's avoid significant rewrites in parts where we can still move incrementally, and then decide in the next planning which parts we want to keep improving, OK?
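
For reference, a minimal sketch of the current derivation described above, where status is a function of _enabled and _active (the exact precedence and the extra registered logical attributes in the Kytos core may differ):

    from kytos.core.common import EntityStatus

    # property of a topology entity (switch, interface, or link)
    @property
    def status(self):
        """Status as a function of _enabled and _active (sketch)."""
        if self._enabled and self._active:
            return EntityStatus.UP
        if not self._enabled:
            return EntityStatus.DISABLED
        return EntityStatus.DOWN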

@Ktmi (Author) commented Jun 21, 2023

Yeah, I guess we can put those additional changes into a later release and just push out what we already have. For now, let's get this merged.

@Ktmi (Author) commented Jun 21, 2023

Getting failing e2e tests with this and kytos-ng/maintenance#78. Here are the test results. It seems that the EVCs aren't being properly notified.

I'm an idiot; the issue was on the maintenance side. The maintenance events were using an incorrect endpoint.

@viniarck (Member) commented:

> Getting failing e2e tests with this and kytos-ng/maintenance#78. Here are the test results. It seems that the EVCs aren't being properly notified.
>
> I'm an idiot; the issue was on the maintenance side. The maintenance events were using an incorrect endpoint.

Good morning. Let's merge this one then, and once the MW PRs are also ready we can decide whether it'll land in 2023.1 or 2023.1.1 (with the rest of the MW-related tasks you mentioned, like preview and so on). Thanks, David.
