[DPE-4532] Increase timeout and terminate processes that are still up #514

dragomirp · 2024-06-30T18:46:09Z

Try to stabilise full cluster restart tests

codecov · 2024-06-30T18:47:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.68%. Comparing base (ac88aca) to head (bfc33a6).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #514      +/-   ##
==========================================
+ Coverage   68.66%   68.68%   +0.02%     
==========================================
  Files          11       11              
  Lines        3003     3015      +12     
  Branches      532      535       +3     
==========================================
+ Hits         2062     2071       +9     
- Misses        822      823       +1     
- Partials      119      121       +2

dragomirp · 2024-07-09T11:46:32Z

tests/integration/ha_tests/test_self_healing.py

Skip the remaining tests if one fails. The continuous writes fixture has a lot of timeouts, so if one of the tests breaks the database, we end up waiting a long time.
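
The fail-fast idea can be illustrated with a small framework-free sketch. This is not the code from this PR: `run_fail_fast` and the sample checks are hypothetical names, and a real pytest suite would more likely rely on pytest's `-x`/`--exitfirst` option or a marker. It only shows that once one test fails, the later ones are recorded as skipped instead of being run.

```python
from typing import Callable, Dict, List, Tuple


def run_fail_fast(tests: List[Tuple[str, Callable[[], None]]]) -> Dict[str, str]:
    """Run tests in order; after the first failure, mark the rest as skipped."""
    results: Dict[str, str] = {}
    failed = False
    for name, test in tests:
        if failed:
            # A previous test failed, so do not run this one at all.
            results[name] = "skipped"
            continue
        try:
            test()
            results[name] = "passed"
        except AssertionError:
            results[name] = "failed"
            failed = True
    return results


# Hypothetical checks standing in for the real integration tests.
def check_ok() -> None:
    assert True


def check_breaks_db() -> None:
    assert False, "database left in a broken state"


def check_never_runs() -> None:
    raise RuntimeError("should have been skipped")
```

Calling `run_fail_fast([("ok", check_ok), ("breaks", check_breaks_db), ("later", check_never_runs)])` never invokes the third check, so its fixture timeouts are never paid.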

dragomirp · 2024-07-09T11:49:39Z

tests/integration/ha_tests/helpers.py

@@ -68,7 +68,7 @@ async def are_all_db_processes_down(ops_test: OpsTest, process: str) -> bool:
        pgrep_cmd = ("pgrep", "-x", process)

    try:
-        for attempt in Retrying(stop=stop_after_delay(60), wait=wait_fixed(3)):
+        for attempt in Retrying(stop=stop_after_delay(400), wait=wait_fixed(3)):


We set Patroni's loop_wait to 300 seconds, so the old 60-second limit could expire before a single loop completes.
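
To see why the limit matters, here is a minimal sketch of a polling loop with a deadline and a fixed wait. It is a stdlib stand-in that roughly mirrors `Retrying(stop=stop_after_delay(...), wait=wait_fixed(3))`, not tenacity itself. `wait_until`, `FakeClock` and `stops_within` are hypothetical names, and the assumption that a process only goes down after 300 seconds is purely illustrative.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` every `interval` seconds until it is true or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def stops_within(timeout: float, down_after: float = 300.0) -> bool:
    """Assumption: the process is only seen as down `down_after` seconds in."""
    fake = FakeClock()
    return wait_until(lambda: fake.now >= down_after, timeout, 3.0, fake.time, fake.sleep)
```

Under that assumption, `stops_within(60)` gives up before the process is seen as down, while `stops_within(400)` waits long enough.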

dragomirp · 2024-07-09T11:52:35Z

tests/integration/ha_tests/helpers.py

+                        logger.info("Unit %s not yet down" % unit.name)
+                        # Try to rekill the unit
+                        await send_signal_to_process(ops_test, unit.name, process, signal)


Judging by the logs, there is usually at least one unit stuck. My guess is that it escapes systemd's restart condition or Patroni's loop_wait and gets revived, so we kill it again to make sure.
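
The re-kill approach can be sketched as a loop that signals every unit still running and then checks again. This is a simplified illustration, not the helper from this PR: `kill_until_down` and `FakeCluster` are hypothetical, the unit names are made up, and the fake `send_signal` only stands in for `send_signal_to_process`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List


async def kill_until_down(
    units: Iterable[str],
    is_running: Callable[[str], Awaitable[bool]],
    send_signal: Callable[[str], Awaitable[None]],
    attempts: int,
    interval: float = 0.0,
) -> List[str]:
    """Re-send the signal to units still running; return those up at the last check."""
    remaining = list(units)
    for _ in range(attempts):
        still_up = []
        for unit in remaining:
            if await is_running(unit):
                # The process was revived or never died, so signal it again.
                await send_signal(unit)
                still_up.append(unit)
        remaining = still_up
        if not remaining:
            break
        await asyncio.sleep(interval)
    return remaining


class FakeCluster:
    """In-memory stand-in: each unit needs N signals before it stays down."""

    def __init__(self, signals_needed: Dict[str, int]) -> None:
        self.signals_needed = dict(signals_needed)
        self.signals_sent = {unit: 0 for unit in signals_needed}

    async def is_running(self, unit: str) -> bool:
        return self.signals_needed[unit] > 0

    async def send_signal(self, unit: str) -> None:
        self.signals_sent[unit] += 1
        self.signals_needed[unit] -= 1
```

With `FakeCluster({"pg/0": 0, "pg/1": 2})`, the unit that keeps coming back is signalled twice and the loop ends with no units left running.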

marceloneppel

LGTM! Thanks a lot!

lucasgameiroborges

LGTM!

Increase timeout and log unit that is still up

bd4b7de

github-actions bot added the Libraries: OK label Jun 30, 2024

dragomirp added 3 commits July 1, 2024 00:27

Early fail

32f07a4

Merge branch 'main' into dpe-4532-flaky-tests

9eb5836

Bump coverage

87b2415

github-actions bot added Libraries: Out of sync and removed Libraries: OK labels Jul 8, 2024

dragomirp added 3 commits July 8, 2024 19:12

Restore pyproj

8341902

Bump coverage

c1d41f8

Bump libs

cc8b152

github-actions bot added Libraries: OK and removed Libraries: Out of sync labels Jul 8, 2024

dragomirp added 4 commits July 8, 2024 20:19

Bump coverage

132146b

Merge branch 'main' into dpe-4532-flaky-tests

38d9a93

Revert cluster test

c7c085d

Try to rekill process

e759af6

dragomirp commented Jul 9, 2024

dragomirp marked this pull request as ready for review July 9, 2024 12:19

dragomirp requested review from marceloneppel and lucasgameiroborges July 9, 2024 12:19

dragomirp changed the title ~~[DPE-4532] Increase timeout and log unit that is still up~~ [DPE-4532] Increase timeout and terminate processes that are still up Jul 9, 2024

marceloneppel reviewed Jul 9, 2024

tests/integration/ha_tests/test_self_healing.py (outdated, resolved)

Revert removed assert

bfc33a6

dragomirp requested a review from marceloneppel July 10, 2024 09:50

marceloneppel approved these changes Jul 10, 2024

lucasgameiroborges approved these changes Jul 10, 2024

dragomirp merged commit 7299748 into main Jul 10, 2024
78 of 79 checks passed

dragomirp deleted the dpe-4532-flaky-tests branch July 10, 2024 12:41
