Paced Flow Manager #187

Merged
merged 13 commits into master from paced/flow_manager on Jul 31, 2024

Conversation


@Ktmi Ktmi commented Jul 1, 2024

Part of resolving kytos-ng/of_multi_table#27, kytos-ng/mef_eline#372, and kytos-ng/telemetry_int#47.

Summary

This adds a pacer to flow_manager, pacing flow mods sent to switches. This does not fully resolve replacing batching, as it removes the control that NApps have over the process. Possible ways to fully resolve this are to implement pacing on the producer end, or to use the owner tag on stored flows so that a flow_mod is paced with the pacer of the NApp that requested it.

Additionally, this allows setting the pace of flow mods on a per-owner basis. This feature is dependent on this PR for kytos.
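
For illustration, here's a minimal, self-contained sketch of the per-owner, per-dpid fixed-window pacing idea described above. The names (FLOW_MOD_PACES, FixedWindowPacer) and the rates are assumptions for this sketch only, not this PR's actual settings or kytos' pacing API:

```python
import time
from collections import defaultdict

# Hypothetical per-owner rates (flow mods per second); "no_owner" is the fallback.
FLOW_MOD_PACES = {
    "no_owner": 100,
    "mef_eline": 50,
    "telemetry_int": 50,
}

class FixedWindowPacer:
    """Toy fixed-window pacer keyed by (owner, dpid)."""

    def __init__(self, paces):
        self.paces = paces
        # (owner, dpid) -> [window_start, count]
        self.windows = defaultdict(lambda: [0.0, 0])

    def hit(self, owner, dpid):
        """Block until a flow mod for (owner, dpid) fits in the current 1-second window."""
        rate = self.paces.get(owner, self.paces["no_owner"])
        key = (owner, dpid)
        while True:
            now = time.monotonic()
            start, count = self.windows[key]
            if now - start >= 1.0:
                self.windows[key] = [now, 1]
                return
            if count < rate:
                self.windows[key][1] += 1
                return
            time.sleep(start + 1.0 - now)

# pacer = FixedWindowPacer(FLOW_MOD_PACES)
# pacer.hit("mef_eline", "00:00:00:00:00:00:00:01")  # call before sending each flow mod
```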

Local Tests

Tested creating and removing flows. Works as expected. Paces apply appropriately on a per-owner basis.

End-to-End Tests

Will need to rerun E2E tests

@Ktmi Ktmi requested a review from a team as a code owner July 1, 2024 21:04
@Ktmi Ktmi changed the title Paced/flow manager Paced Flow Manager Jul 1, 2024

@viniarck viniarck left a comment

Great. It turned out really well to have all the pacers on flow_manager pacing per owner per dpid, with the default no_owner fallback. Nicely done.

I requested a few changes though. Also, the changelog hasn't been updated; this will land once the PRs replacing the pacers are ready to land too.

For the record, can you post the Wireshark FlowMod pps chart from the local test? Just mef_eline is good enough for me, with the current fixed_window at 100/second and another at 50/second. It'd be great to have that documented here too, so that in the future it can also serve as a reference for whoever is upgrading the library or whatever they might be looking for.

@Ktmi Ktmi commented Jul 9, 2024

@viniarck I think I noticed an odd quirk of the code regarding _install_flows. If _install_flows is used to send flows to multiple switches, it will send duplicate flows. This is due to the flow_mods intended for specific switches being sent to every switch in the _send_flow_mods phase of the process. What would be preferable is if the switch were provided in a tuple or dataclass alongside the flow_mod.

@viniarck viniarck commented Jul 10, 2024

@Ktmi,

If _install_flows is used to send flows to multiple switches, it will send duplicate flows.

Yes, that's the intended behavior. It's for POST v2/flows, which sends to all enabled switches.

What would be preferable is if the switch were provided in a tuple or dataclass alongside the flow_mod.

All things considered, with how the code has been structured historically, I'd say let's only pass an extra list of owners, include it in the tuple, and call it a day. Let's leave a potential refactor to a dataclass for the future; we'd need to linearize the nested for loop as you said, and then there would be lots of unit tests to refactor too. It's probably not worth the effort considering the other major tasks/epics that we've got on our radar.
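
To make the tuple idea above concrete, here's a minimal, self-contained sketch; the helper names and shapes are illustrative assumptions, not flow_manager's actual _install_flows/_send_flow_mods code:

```python
# Sketch: pair each flow mod with the dpid and owner it belongs to, so the
# send phase never fans a flow mod out to every switch.

def build_pending(switch_dpids, flow_dicts, owner="no_owner"):
    """Return (dpid, flow_dict, owner) tuples, one per switch/flow pair."""
    return [
        (dpid, flow, owner)
        for dpid in switch_dpids
        for flow in flow_dicts
    ]

def send_pending(pending, send):
    """Send each flow mod only to its paired switch via the given callable."""
    for dpid, flow, owner in pending:
        send(dpid, flow, owner)

if __name__ == "__main__":
    pending = build_pending(["00:01", "00:02"], [{"priority": 10}], owner="mef_eline")
    send_pending(pending, lambda dpid, flow, owner: print(dpid, owner, flow))
```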

@viniarck viniarck self-requested a review July 16, 2024 15:11

@viniarck viniarck left a comment

@Ktmi, functionality-wise it's all set from what I reviewed. Very neat how this has been integrated per dpid, differentiating per flow owner. What's still remaining before I approve:

  1. Changelog needs to be updated
  2. Can you post Wireshark IO charts of flow mods per dpid on this PR, documenting that "fixed_window" is working as expected? I'd like to see that once, and then it can serve as a historical record; if upstream changes one day, we can look back at the chart and compare. I'd like to see a producer going over the pacer rate, including sending to different dpids, and the expectation is that no connections get dropped due to OF keepalives getting delayed on either side.

@Ktmi Ktmi commented Jul 22, 2024

Here is a demonstration of limiting flow mods for mef_eline. I paced the action to 10 per second while trying to deploy a few hundred EVCs. Be aware, this doesn't exclude flow mods from other NApps.

[chart: FlowMod_pace_test_10perSecond]

@viniarck

Here is a demonstration of limiting flow mods for mef_eline. I paced the action to 10 per second while trying to deploy a few hundred EVCs. Be aware, this doesn't exclude flow mods from other NApps.

[chart: FlowMod_pace_test_10perSecond]

This is good; now go over the default configured rates (100/second), stressing over multiple dpids. The time to stress test this is now. I wanna see if it's stable with that value too. Probably no surprises, but let's confirm in practice. If that's solid too, this is good to get merged.

@viniarck

^ and then sustain it for a few mins; if no connections are dropped, that's great.

@Ktmi Ktmi commented Jul 24, 2024

So, here are four cases of deploying flows in flow_manager with a pace of 100 per second per switch. The blue line represents flow mods sent to a single switch, the green the flow mods sent to all switches. Each test is separated by a 10-second interval.

[chart: flow_deployment3]

The first test is deploying 1000 flows in a single request to a single switch. This results in sending about 100 flow mods per second, as expected.

The second test is deploying 1 flow at a time 1000 times to a single switch. The results of this are fairly similar to the results of the first test, with about 100 flow mods per second.

The third test is deploying 1000 flows at a time to 9 different switches simultaneously. The process of sending the flows took about 20 seconds, which means that we were sending flow mods at about 50 per second in this test. Very strange; not sure why that happens. Additionally, this test took about 20 seconds to get started, even though the interval between tests was supposed to be only 10 seconds. This indicates that a lot of time is being spent preparing the flow mods to be sent.

The fourth test is deploying 1 flow at a time, 1000 times, to 9 different switches. This was done with a few worker threads assigned to each switch to ensure that requests were fairly distributed and that pending requests for any given switch wouldn't block sending requests to others. The flow mod rate just plummets with this setup, hovering around 100 per second across all switches combined. This might just be a limit of how fast messages can be processed by the API.
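
For reference, here's a rough sketch of how test 4 could be reproduced. The endpoint and payload follow flow_manager's v2 REST API, but the host, dpids, VLAN values, and thread counts are illustrative assumptions, not the exact harness used here:

```python
# One flow per request, many concurrent requests across 9 switches.
from concurrent.futures import ThreadPoolExecutor

import requests

BASE = "http://127.0.0.1:8181/api/kytos/flow_manager/v2/flows"
DPIDS = [f"00:00:00:00:00:00:00:{i:02x}" for i in range(1, 10)]  # 9 switches
FLOWS_PER_SWITCH = 1000

def push_one_flow(dpid, vlan):
    """POST a single flow to one switch."""
    payload = {"flows": [{"priority": 10, "match": {"dl_vlan": vlan}}]}
    resp = requests.post(f"{BASE}/{dpid}", json=payload, timeout=30)
    return resp.status_code

def main():
    with ThreadPoolExecutor(max_workers=3 * len(DPIDS)) as pool:
        futures = [
            pool.submit(push_one_flow, dpid, vlan)
            for dpid in DPIDS
            for vlan in range(1, FLOWS_PER_SWITCH + 1)
        ]
        codes = [future.result() for future in futures]
    print({code: codes.count(code) for code in set(codes)})

if __name__ == "__main__":
    main()
```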

@viniarck viniarck commented Jul 25, 2024

@Ktmi, good insights. Let's try to further investigate cases 3 and 4 to better understand.

A few things that come to mind to quickly brainstorm with you, in case it helps:

  • Try using the APM too and keep an eye on the highest-impact handlers that pop up
  • Did you notice if the concurrency_limit ended up being reached in the API, resulting in 503s? Especially in test case 4
  • It'd be cool to also compare the same tests with the line hitting the pacer commented out. Does it achieve the expected rates?
  • You can potentially explore sending flows to flow_manager via events from another NApp, with kytos.flow_manager.flows.install (runs on the app thread pool) or kytos.flow_manager.flows.single.install (dynamic single thread) -> this one would only perform well if many flows are sent in the event. But based on what you described, it looks like only the fourth test had too many requests. (A sketch of the event-based option is included below.)
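
A short sketch of that event-based option, sent from another NApp; KytosEvent and the app buffer are kytos core APIs, while the content keys ("dpid", "flow_dict") are my reading of flow_manager's event interface and should be double-checked against its docs:

```python
from kytos.core import KytosEvent

def request_flows_install(controller, dpid, flows):
    """Ask flow_manager to install many flows in a single event."""
    event = KytosEvent(
        name="kytos.flow_manager.flows.install",
        content={"dpid": dpid, "flow_dict": {"flows": flows}},
    )
    controller.buffers.app.put(event)

# e.g. from inside a NApp method:
# request_flows_install(self.controller, "00:00:00:00:00:00:00:01",
#                       [{"priority": 10, "match": {"dl_vlan": 100}}])
```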

@Ktmi Ktmi commented Jul 26, 2024

So I ran the flow deployment tests again with the pacing code commented out, and here's what I got for results:

[chart: flow_deployment_no_limit]

It's incredibly similar to what we got from doing the tests with pacing.

I also redid the tests with the APM. I instrumented a good portion of the surrounding code.

[APM screenshot]

What I found is that when deploying a single flow, most of the time is spent writing to the DB. This is fine when writing many flows in a single request, but when writing many flows across many requests, the time required for this process builds up. Additionally, due to the GIL, even having 1000 threads in the API threadpool doesn't help improve performance here; they are all competing for the GIL, so they just end up slowing each other down.
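
As a small aside, a self-contained illustration (plain pymongo, not flow_manager's actual persistence code) of why one request carrying many flows amortizes the DB write, while one flow per request pays a full round trip each time:

```python
from pymongo import InsertOne, MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative connection
coll = client["demo"]["flows_demo"]
coll.drop()

flows = [{"priority": 10, "match": {"dl_vlan": vlan}} for vlan in range(1, 1001)]

# Many requests, one flow each: 1000 separate round trips to the DB.
for flow in flows:
    coll.insert_one(dict(flow))

# One request with many flows: a single bulk round trip.
coll.bulk_write([InsertOne(dict(flow)) for flow in flows])
```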

@viniarck viniarck self-requested a review July 29, 2024 14:31

@viniarck viniarck left a comment

@Ktmi, good analysis in the last comment, and great to see the rest of the instrumentation too, including for pyof pack. Good to know that pacing behaved as expected. I'll leave the PR approved; before merging, though, please address the points below:

1 - Add a pacer for coloring as an owner too
2 - Can you map and take care of a new task to document the highest number of flow mods per second that a single request (or event) can send for a single dpid without resulting in any issues?
3 - Solve changelog git conflict

The pacer will ensure that it won't get unstable. But we'll need this information here to have an idea of the ceiling value that we can try out, to fine-tune telemetry_int kytos-ng/telemetry_int#40. Sure, there are other variables involved here, but mininet and OvS locally with minimal delay in the control plane is fair game too. Having that information will be a prerequisite for fine-tuning telemetry_int, and then we'll know to never go too far overboard there.

@Ktmi Ktmi merged commit 2876238 into master Jul 31, 2024
1 check passed
@Ktmi Ktmi deleted the paced/flow_manager branch July 31, 2024 19:23