[Feature] Added archived flows #45

Closed · wants to merge 24 commits into fix/overlapping_flow from feature/archived_flows

Conversation

@viniarck (Member) commented Nov 10, 2021

Fixes #33

Description of the change

  • This PR adds support for archiving flows whenever they are overlapped or deleted.
  • This PR is on top of [Fix] Overlapping stored flows #44, but the diffs won't accumulate here since this PR uses fix/overlapping_flow as its base branch.

Added

Changed

  • Updated the stored_flows box to stop using a randomly generated box id; it's now named flows. This is a breaking change (we can provide a migration script if someone needs it), but it makes it easier to look the box up going forward, since the code used to assume there was only one box in this namespace, which is no longer the case now that a box for archived flows is being introduced.

@@ -11,3 +11,8 @@
# To filter by a cookie or `table_id` range [(value1, value2)]
CONSISTENCY_COOKIE_IGNORED_RANGE = []
CONSISTENCY_TABLE_ID_IGNORED_RANGE = []

# Maximum number of flows to archive per switch
ARCHIVED_MAX_FLOWS_PER_SWITCH = 5000
@viniarck (Member Author) commented Nov 10, 2021:

Initially, I set a value higher than a couple of thousand, but we might need more information about the expected volume of archived flows we want to keep track of. Let me know if we need to support a higher value here, cc @italovalcy @jab1982 @ajoaoff @rmotitsuki.
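
For illustration, here's a minimal sketch of how a per-switch cap like this could be enforced when archiving; the helper and entry fields below are hypothetical, not this PR's actual code:

```python
from datetime import datetime, timezone

ARCHIVED_MAX_FLOWS_PER_SWITCH = 5000

# Hypothetical in-memory structure: dpid -> list of archived flow entries.
archived_flows_by_dpid = {}


def archive_flow(dpid, flow_dict, reason):
    """Append an archived flow for a switch, trimming the list to the cap.

    Sketch only: the NApp would persist these entries in the storehouse
    'flows_archived' box rather than keep them in a plain dict.
    """
    entries = archived_flows_by_dpid.setdefault(dpid, [])
    entries.append({
        "flow": flow_dict,
        "reason": reason,  # e.g. "deleted" or "overlapped"
        "archived_at": datetime.now(timezone.utc).isoformat(),
    })
    # Keep only the most recent ARCHIVED_MAX_FLOWS_PER_SWITCH entries.
    if len(entries) > ARCHIVED_MAX_FLOWS_PER_SWITCH:
        del entries[:-ARCHIVED_MAX_FLOWS_PER_SWITCH]
```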

@viniarck (Member Author) commented Nov 10, 2021:

Side note: we need to pick a reasonable size for this list that's good enough for production and doesn't let it grow too much, especially since we're still using the default FS backend with storehouse; large objects, on top of the high number of lock operations, are something to minimize to ensure smooth IO. In the future, once we can afford to move to a more powerful, production-grade DB, I think most of these concerns will be gone. With FS storage using pickle, I tested with 9000+ archived flows on OVS, and it behaved as expected:

❯ curl http://127.0.0.1:8181/api/kytos/storehouse/v1/kytos.flow.persistence/flows_archived | jq '.["00:00:00:00:00:00:00:01"]' | jq length
9020

The pickled binary didn't end up too large, but pickle with the default serialization isn't going to be as production-grade as DB storages that have been optimized and designed for that:

❯ lar
total 704K
drwxr-xr-x 8 viniarck viniarck 4.0K Nov 10 15:31 ..
drwxr-xr-x 2 viniarck viniarck 4.0K Nov 10 15:59 .
-rw-r--r-- 1 viniarck viniarck 1.4K Nov 10 16:13 flows
-rw-r--r-- 1 viniarck viniarck 691K Nov 10 16:13 flows_archived
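
As a rough way to reproduce this size estimate locally without running storehouse, one could pickle a comparable number of flow-like dicts and check the byte size; the flow fields below are made up for illustration:

```python
import pickle
import random


def fake_flow(i):
    """Build a dict roughly resembling an archived flow entry."""
    return {
        "cookie": i,
        "table_id": 0,
        "priority": random.randint(1, 65535),
        "match": {"in_port": i % 48 + 1, "dl_vlan": i % 4096},
        "actions": [{"action_type": "output", "port": (i + 1) % 48 + 1}],
    }


archived = {"00:00:00:00:00:00:00:01": [fake_flow(i) for i in range(9020)]}
blob = pickle.dumps(archived)
print(f"{len(blob) / 1024:.1f} KiB for 9020 archived flows")
```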

@viniarck (Member Author) commented:

@italovalcy @ajoaoff @jab1982 @rmotitsuki here's more data for us to make an informed decision. I've measured more execution times with the existing FS backend, both for how long it takes to load the flows and for writing them:

  • In this first case, with a linear topology of 3 OVS switches and just a single archived flow, there's essentially no significant difference between the load and write times for flows and flows_archived:
| run | load_flows 'flows' exec time (s) | load_flows 'flows_archived' exec time (s) | flow saved in kytos.flow.persistence.flows (s) | flow saved in kytos.flow.persistence.flows_archived (s) | archived flows per switch | switches |
|---|---|---|---|---|---|---|
| 1 | 0.321988059 | 0.323313328 | 0.056381483 | 0.063623033 | 1 | 3 |
| 2 | 0.331015045 | 0.336439969 | 0.060050609 | 0.064152285 | 1 | 3 |
| 3 | 0.222171105 | 0.225473629 | 0.055098731 | 0.05798877 | 1 | 3 |
| 4 | 0.317800926 | 0.317802884 | 0.041970486 | 0.048492011 | 1 | 3 |
| 5 | 0.210971846 | 0.204924712 | 0.038846376 | 0.052048731 | 1 | 3 |
| 6 | 0.236901137 | 0.230581803 | 0.156860926 | 0.147859926 | 1 | 3 |
| 7 | 0.32118056 | 0.325453682 | 0.060899418 | 0.074994289 | 1 | 3 |
| average (s) | 0.2802898111 | 0.280570001 | 0.06715828986 | 0.07273700643 | | |
| std deviation (s) | 0.05393865596 | 0.05716338808 | 0.04048419031 | 0.03425065777 | | |
| variance (s²) | 0.002909378607 | 0.003267652936 | 0.001638969665 | 0.001173107557 | | |
  • But when we scale to a topology with 30 switches and 9100 archived flows per switch, the flows_archived load times are roughly 4x the flows load times (1.242984775 / 0.3031526574), and flows_archived writes are roughly 22x slower than flows writes on average (3.514650555 / 0.1539360896), taking several seconds. All of this happens asynchronously in storehouse, but it always saves the file in its entirety. Overall it's functional, but these relatively slow writes are a bit concerning, even though the load times weren't too bad compared to the existing ones (see the table below; a sketch of how such timings can be collected follows it):
| run | load_flows 'flows' exec time (s) | load_flows 'flows_archived' exec time (s) | flow saved in kytos.flow.persistence.flows (s) | flow saved in kytos.flow.persistence.flows_archived (s) | archived flows per switch | switches |
|---|---|---|---|---|---|---|
| 1 | 0.343262868 | 1.36081191 | 0.380696204 | 2.396087289 | 9100 | 30 |
| 2 | 0.309977511 | 1.226346058 | 0.043746247 | 2.461283397 | 9100 | 30 |
| 3 | 0.306658912 | 1.243730335 | 0.060340325 | 2.230030912 | 9100 | 30 |
| 4 | 0.336292488 | 1.229078531 | 0.311950604 | 4.024685604 | 9100 | 30 |
| 5 | 0.306667977 | 1.217683696 | 0.097841618 | 4.552577523 | 9100 | 30 |
| 6 | 0.208896111 | 1.205937712 | 0.098094499 | 5.464421286 | 9100 | 30 |
| 7 | 0.310312735 | 1.217305181 | 0.08488313 | 3.473467872 | 9100 | 30 |
| average (s) | 0.3031526574 | 1.242984775 | 0.1539360896 | 3.514650555 | | |
| std deviation (s) | 0.04418497206 | 0.05326989636 | 0.1343651052 | 1.235026574 | | |
| variance (s²) | 0.001952311756 | 0.002837681858 | 0.0180539815 | 1.525290639 | | |
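
For reference, timings like the ones in these tables could be collected with a small wrapper such as the sketch below; `load_flows` and the summary helper are hypothetical stand-ins, not the NApp's actual code:

```python
import statistics
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    return result, time.monotonic() - start


def summarize(samples):
    """Print the same summary statistics shown in the tables above."""
    print(f"average in secs: {statistics.mean(samples)}")
    print(f"std deviation in secs: {statistics.stdev(samples)}")
    print(f"variance: {statistics.variance(samples)}")


# Example usage against a hypothetical load_flows('flows_archived') call:
# samples = [timed(load_flows, 'flows_archived')[1] for _ in range(7)]
# summarize(samples)
```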

@viniarck (Member Author) commented:

For completeness' sake, I've also run more stress tests targeting the flows box storage, since the previous run was only stressing flows_archived. As you can see, with 30+ switches and 9000+ flows per switch, we also get writes that take 8+ seconds on average. This shows that insertion on the current FS backend performs poorly; a DB with optimized insertion is expected to do better, and depending on how we partition the collections (say, per dpid) we could gain some additional performance:

| run | load_flows 'flows' exec time (s) | load_flows 'flows_archived' exec time (s) | flow saved in kytos.flow.persistence.flows (s) | flow saved in kytos.flow.persistence.flows_archived (s) | flows per switch | switches |
|---|---|---|---|---|---|---|
| 1 | 1.320028502 | 0.413320756 | 10.63344993 | 0.05605842 | 9100 | 30 |
| 2 | 1.420192211 | 0.318927119 | 9.613857726 | 0.04748472 | 9100 | 30 |
| 3 | 1.112718186 | 0.401718191 | 6.870411208 | 0.06791718 | 9100 | 30 |
| 4 | 1.335173917 | 0.412811008 | 7.292584671 | 0.07829201 | 9100 | 30 |
| 5 | 1.408119001 | 0.320302018 | 7.731722126 | 0.08179112 | 9100 | 30 |
| average (s) | 1.319246363 | 0.3734158184 | 8.428405133 | 0.06630869 | | |
| std deviation (s) | 0.1234874453 | 0.04933432745 | 1.617893756 | 0.01453704508 | | |
| variance (s²) | 0.01524914916 | 0.002433875865 | 2.617580205 | 0.0002113256797 | | |

I think this would be a great starting point for comparing whether, without changing the data models much, a NoSQL store could handle this expected production worst-case scenario. It might also be worth extrapolating the number of flows a bit further, since we want to make sure whatever NoSQL storage we end up with has some headroom to keep performing well and meet our perf requirements; otherwise we'll need to reassess other options.
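
A minimal sketch of how that comparison could be bootstrapped with pymongo is shown below; the database name, collection name, and document shape are assumptions for illustration, not the schema flow_manager later adopted:

```python
import time

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a local MongoDB
collection = client["flow_persistence"]["archived_flows"]  # hypothetical names


def bench_bulk_insert(dpid, flows):
    """Time a bulk insert of archived flows for a single switch."""
    docs = [{"dpid": dpid, "flow": flow} for flow in flows]
    start = time.monotonic()
    collection.insert_many(docs)
    return time.monotonic() - start


# Mirror the measured worst case: 30 switches x 9100 archived flows each.
# for n in range(1, 31):
#     dpid = f"00:00:00:00:00:00:00:{n:02x}"
#     flows = [{"cookie": i, "table_id": 0} for i in range(9100)]
#     print(dpid, bench_bulk_insert(dpid, flows))
```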


@@ -237,6 +262,13 @@ def check_storehouse_consistency(self, switch):
f"Consistency check: alien flow on switch {dpid}, dpid"
" not indexed"
)
archived_flows.append(
@viniarck (Member Author) commented:

There's still a risk of losing this append if the _install_flows side effect below crashes; fixing #26, which is on our radar, would mitigate this risk.
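
One way to shrink that window, sketched with hypothetical helpers (not this PR's code), is to persist the archived entries before triggering the install side effect:

```python
def archive_then_install(dpid, alien_flows, save_archived, install_flows):
    """Persist archived entries first, then run the install side effect.

    save_archived and install_flows are stand-ins for the NApp's storehouse
    write and flow installation calls. If install_flows() crashes, the
    archived entries have already been saved and the append isn't lost.
    """
    archived = [{"dpid": dpid, "flow": flow, "reason": "alien"}
                for flow in alien_flows]
    save_archived(dpid, archived)
    install_flows(alien_flows)
```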

@viniarck (Member Author) commented:

I'm converting this to a draft for now as the discussion on #45 (comment) evolves.

@viniarck (Member Author) commented:

I'm closing this PR; it has become obsolete now that flow_manager is migrating to MongoDB. When the time comes to prioritize this again, it'll be a different implementation.

viniarck closed this May 26, 2022
viniarck deleted the feature/archived_flows branch July 29, 2022 13:53
Linked issue: Consistency check should archive alien deleted and modified flows