[Feature] Added archived flows #45
Conversation
Set box to have a deterministic id
Updated _load_flows to load from any box of flows
Added _add_archived_flows_store
Bootstrapped box_archived on storehouse client
Parametrized get_data with attr
Updated deleted_flow
Hooked overlapped flows to save on archive
Fixed archived_flows overflow check
Refactored calling sites
```diff
@@ -11,3 +11,8 @@
 # To filter by a cookie or `table_id` range [(value1, value2)]
 CONSISTENCY_COOKIE_IGNORED_RANGE = []
 CONSISTENCY_TABLE_ID_IGNORED_RANGE = []
+
+# Maximum number of flows to archive per switch
+ARCHIVED_MAX_FLOWS_PER_SWITCH = 5000
```
Initially, I set a value higher than a couple of thousand, but we might need more information about the expected volume of archived flows we want to keep track of. Let me know if we need to support a higher value here, cc'ing @italovalcy @jab1982 @ajoaoff @rmotitsuki.
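For illustration, here's a minimal sketch of how a per-switch cap like this could be enforced when appending to the archive; the `archive_flow` helper and the dict-of-deques layout are assumptions for the example, not the PR's actual code:

```python
from collections import deque

# Value from the settings diff above.
ARCHIVED_MAX_FLOWS_PER_SWITCH = 5000


def archive_flow(archived_flows, dpid, flow_dict):
    """Append a flow to a switch's archive, discarding the oldest overflow."""
    per_switch = archived_flows.setdefault(
        dpid, deque(maxlen=ARCHIVED_MAX_FLOWS_PER_SWITCH)
    )
    # deque(maxlen=...) silently drops the oldest entry once the cap is hit.
    per_switch.append(flow_dict)
    return per_switch


# Usage:
archive = {}
archive_flow(archive, "00:00:00:00:00:00:00:01", {"cookie": 0, "match": {}})
```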
Side note: we need to pick a reasonable size for this list that's good enough for production and not let it grow too much, especially since we're still using the default FS backend with storehouse; large objects, on top of the high number of lock operations, are something to minimize to keep IO smooth. In the future, once we can afford to move to a more powerful, production-grade DB, I think most of these concerns will be gone. With FS storage using `pickle`, I tested with 9000+ archived flows on OVS, and it behaved as expected:
```
❯ curl http://127.0.0.1:8181/api/kytos/storehouse/v1/kytos.flow.persistence/flows_archived | jq '.["00:00:00:00:00:00:00:01"]' | jq length
9020
```
The pickled binary didn't end up too large, but pickle with the default serialization isn't going to be as production grade as DB storages that have been optimized and designed for that:
```
❯ lar
total 704K
drwxr-xr-x 8 viniarck viniarck 4.0K Nov 10 15:31 ..
drwxr-xr-x 2 viniarck viniarck 4.0K Nov 10 15:59 .
-rw-r--r-- 1 viniarck viniarck 1.4K Nov 10 16:13 flows
-rw-r--r-- 1 viniarck viniarck 691K Nov 10 16:13 flows_archived
```
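As a rough way to reproduce that size check in isolation, one can pickle a synthetically generated archive of a similar shape; the dpid-to-list layout and flow fields below are assumptions for the example, not the exact storehouse data:

```python
import pickle

# Synthetic archive roughly shaped like the box data: one dpid mapping
# to ~9000 small flow dicts.
archive = {
    "00:00:00:00:00:00:00:01": [
        {"cookie": 0, "priority": 10, "match": {"in_port": i % 4}}
        for i in range(9000)
    ]
}

blob = pickle.dumps(archive)
print(f"pickled size: {len(blob) / 1024:.1f} KiB")
```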
@italovalcy @ajoaoff @jab1982 @rmotitsuki here's more data for us to make an informed decision. I've measured more execution times using the existing FS backend, both for how long it takes to load the flows and for writing (a rough sketch of how such timings can be collected follows the second table below):
- In this first case, with a linear topology with 3 OVS switches and just a single archived flow, there's pretty much no significant difference between the load and write times for `flows` and `flows_archived`:
| load_flows 'flows' exec in secs | load_flows 'flows_archived' exec in secs | flow saved in kytos.flow.persistence.flows in secs | flow saved in kytos.flow.persistence.flows_archived in secs | number of archived flows per switch | number of switches |
|---|---|---|---|---|---|
| 0.321988059 | 0.323313328 | 0.056381483 | 0.063623033 | 1 | 3 |
| 0.331015045 | 0.336439969 | 0.060050609 | 0.064152285 | 1 | 3 |
| 0.222171105 | 0.225473629 | 0.055098731 | 0.05798877 | 1 | 3 |
| 0.317800926 | 0.317802884 | 0.041970486 | 0.048492011 | 1 | 3 |
| 0.210971846 | 0.204924712 | 0.038846376 | 0.052048731 | 1 | 3 |
| 0.236901137 | 0.230581803 | 0.156860926 | 0.147859926 | 1 | 3 |
| 0.32118056 | 0.325453682 | 0.060899418 | 0.074994289 | 1 | 3 |
| average in secs: | average in secs: | average in secs: | average in secs: | | |
| 0.2802898111 | 0.280570001 | 0.06715828986 | 0.07273700643 | | |
| std deviation in secs: | std deviation in secs: | std deviation in secs: | std deviation in secs: | | |
| 0.05393865596 | 0.05716338808 | 0.04048419031 | 0.03425065777 | | |
| variance in secs: | variance in secs: | variance in secs: | variance in secs: | | |
| 0.002909378607 | 0.003267652936 | 0.001638969665 | 0.001173107557 | | |
- But when we scale to a topology with 30 switches and 9100 archived flows per switch, notice that the overall load times increase roughly 4x (`1.242984775 / 0.3031526574`) and the IO writes are roughly 22x slower on average (`3.514650555 / 0.1539360896`), taking more than three seconds. All of this happens asynchronously on storehouse, but it always saves the file entirely. Overall it's functional, but these relatively slow writes are a bit concerning, although the load times weren't too bad compared to the existing ones:
| load_flows 'flows' exec in secs | load_flows 'flows_archived' exec in secs | flow saved in kytos.flow.persistence.flows in secs | flow saved in kytos.flow.persistence.flows_archived in secs | number of archived flows per switch | number of switches |
|---|---|---|---|---|---|
| 0.343262868 | 1.36081191 | 0.380696204 | 2.396087289 | 9100 | 30 |
| 0.309977511 | 1.226346058 | 0.043746247 | 2.461283397 | 9100 | 30 |
| 0.306658912 | 1.243730335 | 0.060340325 | 2.230030912 | 9100 | 30 |
| 0.336292488 | 1.229078531 | 0.311950604 | 4.024685604 | 9100 | 30 |
| 0.306667977 | 1.217683696 | 0.097841618 | 4.552577523 | 9100 | 30 |
| 0.208896111 | 1.205937712 | 0.098094499 | 5.464421286 | 9100 | 30 |
| 0.310312735 | 1.217305181 | 0.08488313 | 3.473467872 | 9100 | 30 |
| average in secs: | average in secs: | average in secs: | average in secs: | | |
| 0.3031526574 | 1.242984775 | 0.1539360896 | 3.514650555 | | |
| std deviation in secs: | std deviation in secs: | std deviation in secs: | std deviation in secs: | | |
| 0.04418497206 | 0.05326989636 | 0.1343651052 | 1.235026574 | | |
| variance in secs: | variance in secs: | variance in secs: | variance in secs: | | |
| 0.001952311756 | 0.002837681858 | 0.0180539815 | 1.525290639 | | |
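For reference, timings like the ones in the tables above can be collected with a simple `perf_counter` wrapper plus the `statistics` module. This is a standalone sketch; the callable being timed below is just a placeholder for the real storehouse load/save calls:

```python
import statistics
import time


def timed(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(samples):
    """Average, standard deviation and variance, as reported in the tables."""
    return (
        statistics.mean(samples),
        statistics.stdev(samples),
        statistics.variance(samples),
    )


# Usage with a placeholder workload standing in for a storehouse call:
samples = [timed(lambda: sum(range(100_000)))[1] for _ in range(7)]
print(summarize(samples))
```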
For completeness' sake, I've also run more stress tests targeting the `flows` box storage as well, since the previous run was only stressing `flows_archived`. As you can see, with 30+ switches and 9000+ flows per switch, we also have writes that can take 8+ seconds on average. This shows that the current insertion on the FS backend performs poorly; a DB with optimized insertion is expected to perform better, and depending on how we partition the collections (say, per dpid), we could also gain some perf:
| load_flows 'flows' exec in secs | load_flows 'flows_archived' exec in secs | flow saved in kytos.flow.persistence.flows in secs | flow saved in kytos.flow.persistence.flows_archived in secs | number of flows per switch | number of switches |
|---|---|---|---|---|---|
| 1.320028502 | 0.413320756 | 10.63344993 | 0.05605842 | 9100 | 30 |
| 1.420192211 | 0.318927119 | 9.613857726 | 0.04748472 | 9100 | 30 |
| 1.112718186 | 0.401718191 | 6.870411208 | 0.06791718 | 9100 | 30 |
| 1.335173917 | 0.412811008 | 7.292584671 | 0.07829201 | 9100 | 30 |
| 1.408119001 | 0.320302018 | 7.731722126 | 0.08179112 | 9100 | 30 |
| average in secs: | average in secs: | average in secs: | average in secs: | | |
| 1.319246363 | 0.3734158184 | 8.428405133 | 0.06630869 | | |
| std deviation in secs: | std deviation in secs: | std deviation in secs: | std deviation in secs: | | |
| 0.1234874453 | 0.04933432745 | 1.617893756 | 0.01453704508 | | |
| variance in secs: | variance in secs: | variance in secs: | variance in secs: | | |
| 0.01524914916 | 0.002433875865 | 2.617580205 | 0.0002113256797 | | |
I think this would be a great starting point to check whether, without changing the data models much, a NoSQL backend would be able to handle this expected production worst-case scenario. It might also be worth extrapolating the number of flows a bit further, since we want to make sure the potential NoSQL storage we end up with has some headroom to keep performing well and meet our perf requirements; otherwise we'll need to reassess other options.
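To make the per-dpid partitioning idea concrete, here's a hypothetical MongoDB sketch with one document per archived flow and an index on `dpid`; the collection name, field names, and connection URI are illustrative assumptions, not a decided schema:

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["flow_manager"]
archived = db["archived_flows"]

# Index on dpid (plus archive timestamp) so per-switch reads and capped
# cleanups stay cheap.
archived.create_index([("dpid", ASCENDING), ("archived_at", ASCENDING)])

# One document per archived flow instead of one big per-namespace blob.
archived.insert_one(
    {
        "dpid": "00:00:00:00:00:00:00:01",
        "archived_at": 1636570000.0,
        "flow": {"cookie": 0, "priority": 10, "match": {"in_port": 1}},
    }
)

# Per-switch lookup, roughly equivalent to the jq filter used earlier.
count = archived.count_documents({"dpid": "00:00:00:00:00:00:00:01"})
```

With a shape like this, each write touches a single small document instead of rewriting the whole file, which is where the FS backend numbers above degrade.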
```diff
@@ -237,6 +262,13 @@ def check_storehouse_consistency(self, switch):
                     f"Consistency check: alien flow on switch {dpid}, dpid"
                     " not indexed"
                 )
+            archived_flows.append(
```
There's still a risk of losing this `append` here if the `_install_flows` side effect below crashes; fixing #26, which is on our radar, would mitigate this risk.
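Until #26 is addressed, one possible mitigation would be to persist the archive entry before triggering the install side effect, so a crash in `_install_flows` can't drop the in-memory append. A rough, standalone sketch with placeholder callables (not the napp's actual code):

```python
def archive_then_install(save_archived, install_flows, dpid, entry, flows):
    """Persist the archive entry before the install side effect.

    Both callables are placeholders: `save_archived` stands in for the
    storehouse save and `install_flows` for the napp's _install_flows.
    """
    # Persisting first means a crash in install_flows can't lose the append.
    save_archived(dpid, entry)
    install_flows(dpid, flows)


# Usage with trivial stand-ins:
archive = {}
archive_then_install(
    lambda dpid, entry: archive.setdefault(dpid, []).append(entry),
    lambda dpid, flows: None,
    "00:00:00:00:00:00:00:01",
    {"reason": "overlap", "flow": {"cookie": 0}},
    [{"cookie": 0}],
)
```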
I'm converting this to a draft for now as the discussion on #45 (comment) evolves.
I'm closing this PR; it has become obsolete now that flow_manager is migrating to MongoDB. When the time comes to prioritize this again, it will be a different implementation.
Fixes #33
Description of the change
The `base` of this PR is `fix/overlapping_flow`.
Added
- `flows_archived` box
Changed
- The `stored_flows` box no longer uses a randomly generated box id; it's now named `flows`. This is a breaking change (we can provide a migration script if someone needs it), but it makes the box easier to find going forward, since the code used to assume there was only one box in this namespace, which isn't the case anymore now that we're introducing a box for archived flows.
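For illustration, a tiny sketch of what the deterministic name buys when loading; the dict below stands in for the boxes in the `kytos.flow.persistence` namespace and is not the real storehouse client API:

```python
# With deterministic box names, lookups no longer depend on the namespace
# holding exactly one box.
boxes = {
    "flows": {"00:00:00:00:00:00:00:01": {"flow_list": []}},
    "flows_archived": {"00:00:00:00:00:00:00:01": {"flow_list": []}},
}


def load_box(boxes, name):
    """Load a box by its deterministic name instead of a random id."""
    return boxes.get(name, {})


stored = load_box(boxes, "flows")
archived = load_box(boxes, "flows_archived")
```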