
feat: subscribe and handle kytos/mef_eline.(failover_link_down|failover_old_path|failover_deployed) #104

Merged
viniarck merged 16 commits into master from feat/failover_link_down on Jun 14, 2024

Conversation

viniarck
Member

@viniarck viniarck commented Jun 10, 2024

Closes #90
Closes #33
Closes #38
Closes #105

Functionality-wise it's been implemented. I'll keep it in draft while I finish the unit tests, and I also need to re-stress test when the INT lab is available again this week.

Summary

See updated changelog file
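
For context, the core of this change is subscribing to the three failover events named in the PR title and reacting to them in telemetry_int. Below is a minimal sketch of that wiring, assuming the usual Kytos NApp helpers (alisten_to from kytos.core.helpers); the handler names and the int_manager attribute are illustrative, not the actual implementation:

# Illustrative skeleton only, not the NApp's actual main.py
from kytos.core import KytosNApp
from kytos.core.helpers import alisten_to


class Main(KytosNApp):
    """telemetry_int NApp skeleton for the failover event subscriptions."""

    def setup(self):
        """NApp setup (int_manager would be created here in the real NApp)."""

    def execute(self):
        """Nothing to run periodically; the event handlers do the work."""

    def shutdown(self):
        """Cleanup on unload."""

    @alisten_to("kytos/mef_eline.failover_link_down")
    async def on_failover_link_down(self, event):
        # event.content is expected to carry the affected EVCs;
        # handle_failover_flows is a hypothetical helper name.
        await self.int_manager.handle_failover_flows(event.content)

    @alisten_to("kytos/mef_eline.(failover_old_path|failover_deployed)")
    async def on_failover_old_path_or_deployed(self, event):
        # Old-path removal and deployed confirmation are handled by
        # removing or reconciling the corresponding INT flows.
        await self.int_manager.handle_failover_flows(event.content)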

Local Tests

  • I explored all failover-related events with dynamic EVCs and, using iperf3, measured convergence when a link of the current_path went down on one EVC. It has been on par with prior metrics of how mef_eline performs (a few thousand packet retransmissions at ~9.5 Gbits/sec):
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  66.0 GBytes  9.44 Gbits/sec  1085             sender
[  4]   0.00-60.00  sec  65.9 GBytes  9.44 Gbits/sec                  receiver
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  66.0 GBytes  9.45 Gbits/sec  2914             sender
[  4]   0.00-60.00  sec  66.0 GBytes  9.44 Gbits/sec                  receiver

Prior metrics that I shared on Slack

Cool to see telemetry_int in action handling ingress fast failover_path convergence on the INT lab for the first time. I explored two cases with iperf3 -c 10.22.22.3 -i 1 -t 20 -b 10G:

With one EVC:

  • mef_eline TCP packet retransmissions: 1824, 2044, 4017; avg 2628.33
  • telemetry_int TCP packet retransmissions: 1423, 1816, 4192; avg 2477.0

Handling ingress failover_path convergence adds roughly up to 25 ms of latency for sending the events internally and pushing the extra INT flows, which are sent on the socket with the asyncio TCP transport alongside the other concurrent lower-priority mef_eline flows.
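
To illustrate where most of that latency goes, here is a rough sketch of pushing the extra INT flows concurrently per switch over asyncio; the flow_manager v2 endpoint, payload shape, and force flag are assumptions for illustration rather than a copy of what kytos_api_helper.py does:

# Hedged sketch: concurrent INT flow pushes per dpid with httpx + asyncio.
import asyncio
import httpx

FLOW_MANAGER_URL = "http://localhost:8181/api/kytos/flow_manager/v2/flows"


async def push_int_flows(flows_by_switch: dict[str, list[dict]]) -> None:
    """Send the INT flows of each switch concurrently over a shared connection pool."""
    async with httpx.AsyncClient() as client:
        tasks = [
            client.post(
                f"{FLOW_MANAGER_URL}/{dpid}",
                json={"flows": flows, "force": True},
            )
            for dpid, flows in flows_by_switch.items()
        ]
        for resp in await asyncio.gather(*tasks):
            resp.raise_for_status()


# Usage example with a made-up dpid and flow:
# asyncio.run(push_int_flows({"00:00:00:00:00:00:00:01": [{"priority": 20000}]}))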

With 101 INT EVCs:

  • telemetry_int TCP packet retransmissions: 3243

Bottom line so far: the extra new INT flows didn't add much latency to the total convergence; on average it's relatively on par with mef_eline from the data plane traffic point of view, and the switch has been processing them all relatively quickly too.

Data plane traffic hiccup during the failover for both mef_eline and telemetry_int:

[  4]   4.00-5.00   sec  1.11 GBytes  9.56 Gbits/sec    0   2.82 MBytes       
[  4]   5.00-6.00   sec   411 MBytes  3.45 Gbits/sec  2044   1.41 MBytes       
[  4]   6.00-7.00   sec  1.11 GBytes  9.56 Gbits/sec    0   1.41 MBytes       

[  4]   5.00-6.00   sec  1.11 GBytes  9.56 Gbits/sec    0   2.51 MBytes       
[  4]   6.00-7.00   sec  1.01 GBytes  8.71 Gbits/sec    0   2.51 MBytes       
[  4]   7.00-8.00   sec   509 MBytes  4.27 Gbits/sec  1816   1.26 MBytes       
[  4]   8.00-9.00   sec  1.11 GBytes  9.56 Gbits/sec    0   1.26 MBytes       

Tox is passing locally but failing on Scrutinizer CI (I believe it's a temporary upstream issue, let's see):

---------- coverage: platform linux, python 3.11.9-final-0 -----------
Name                                        Stmts   Miss  Cover
---------------------------------------------------------------
__init__.py                                     0      0   100%
exceptions.py                                  31      2    94%
kytos_api_helper.py                            76     15    80%
main.py                                       272     88    68%
managers/__init__.py                            0      0   100%
managers/flow_builder.py                      159      2    99%
managers/int.py                               353     61    83%
proxy_port.py                                  24      3    88%
settings.py                                    12      0   100%
tests/conftest.py                              18      0   100%
tests/unit/test_flow_builder_failover.py      152      0   100%
tests/unit/test_flow_builder_inter_evc.py      60      0   100%
tests/unit/test_flow_builder_intra_evc.py     152      0   100%
tests/unit/test_int_manager.py                366      0   100%
tests/unit/test_kytos_api_helper.py            63      0   100%
tests/unit/test_main.py                       253      0   100%
tests/unit/test_utils.py                       79      0   100%
utils.py                                       67      0   100%
---------------------------------------------------------------
TOTAL                                        2137    171    92%

============================================================================ 84 passed, 114 warnings in 7.34s ============================================================================
lint: recreate env because env type changed from {'name': 'coverage', 'type': 'VirtualEnvRunner'} to {'name': 'lint', 'type': 'VirtualEnvRunner'}
lint: remove tox env folder /home/viniarck/repos/telemetry_int/.tox/py311
coverage: OK ✔ in 46.29 seconds
lint: install_deps> python -I -m pip install -r requirements/dev.in
lint: commands[0]> python3 setup.py lint
running lint
Yala is running. It may take several seconds...
INFO: Finished isort
INFO: Finished black
INFO: Finished pycodestyle
INFO: Finished pylint
:) No issues found.
[isort] Skipped 3 files
  coverage: OK (46.29=setup[38.50]+cmd[7.79] seconds)
  lint: OK (43.95=setup[38.12]+cmd[5.83] seconds)
  congratulations :) (90.27 seconds)

End-to-End Tests

N/A yet

@viniarck viniarck requested a review from a team as a code owner June 10, 2024 13:52
@viniarck viniarck marked this pull request as draft June 10, 2024 13:52
@viniarck viniarck marked this pull request as ready for review June 12, 2024 18:39
@viniarck viniarck merged commit 81fe6d6 into master Jun 14, 2024
1 check passed
@viniarck viniarck deleted the feat/failover_link_down branch June 14, 2024 16:49