[DPE-4114] Test: Scale to zero units #347

Open
wants to merge 27 commits into
base: main
9d7bfed
deployment test "zero-units"
BalabaDmitri Feb 6, 2024
938035f
deployment test "zero-units"
BalabaDmitri Mar 4, 2024
7dc328b
Zero-units: continuous writes ON, deploy 3 units, check, scale to 0 u…
BalabaDmitri Mar 12, 2024
b762ec8
Zero-units: continuous writes ON, deploy 3 units, check, scale to 0 u…
BalabaDmitri Mar 12, 2024
0ca9740
Zero-units: continuous writes ON, deploy 3 units, check, scale to 0 u…
BalabaDmitri Mar 12, 2024
04bc51c
run format & lint
BalabaDmitri Mar 12, 2024
171b53f
reduce time out
BalabaDmitri Mar 12, 2024
27c97f4
merge from remote main
BalabaDmitri Mar 12, 2024
8382d0d
remove replication storage list
BalabaDmitri Mar 12, 2024
d467d8c
checking after scale to 2 and checking after scale up to 3
BalabaDmitri Mar 12, 2024
526357b
checking after scale to 2 and checking after scale up to 3
BalabaDmitri Mar 12, 2024
4b64ce9
checking after scale to 2 and checking after scale up to 3
BalabaDmitri Mar 12, 2024
927ad24
run format & lint
BalabaDmitri Mar 12, 2024
a18b1d3
handle error: storage belongs to different cluster
BalabaDmitri Apr 3, 2024
18211ed
handle error: storage belongs to different cluster
BalabaDmitri Apr 4, 2024
d917d88
handling different versions of Postgres of unit
BalabaDmitri Apr 12, 2024
0a0486f
fix unit fixed setting postgresql version into app_peer_data
BalabaDmitri Apr 17, 2024
ab160f3
merge canonical/main
BalabaDmitri Apr 18, 2024
263a1ef
format
Apr 18, 2024
a1b24dd
fix record of postgres version in databags
BalabaDmitri Apr 27, 2024
19574bd
Merge remote-tracking branch 'canorigin/main' into deployment-zero-units
BalabaDmitri Apr 29, 2024
6873326
format & lint
BalabaDmitri Apr 29, 2024
6716eaf
merge canonical/postgresql-operator
BalabaDmitri May 7, 2024
41bfc2f
Merge remote-tracking branch 'canorigin/main' into deployment-zero-units
BalabaDmitri Jun 10, 2024
2b7db14
checking blocked status based using blocking message
BalabaDmitri Jun 11, 2024
ef84bf6
Merge branch 'main' of https://github.com/canonical/postgresql-operat…
BalabaDmitri Jun 11, 2024
e670781
Merge branch 'canonical_main' into deployment-zero-units
BalabaDmitri Jun 13, 2024
78 changes: 77 additions & 1 deletion tests/integration/ha_tests/test_self_healing.py
@@ -5,6 +5,7 @@
import logging

import pytest
from pip._vendor import requests
from pytest_operator.plugin import OpsTest
from tenacity import Retrying, stop_after_delay, wait_fixed

@@ -14,7 +15,7 @@
    get_machine_from_unit,
    get_password,
    get_unit_address,
-    run_command_on_unit,
+    run_command_on_unit, scale_application,
)
from .conftest import APPLICATION_NAME
from .helpers import (
Expand Down Expand Up @@ -540,3 +541,78 @@ async def test_network_cut_without_ip_change(
    ), "Connection is not possible after network restore"

    await is_cluster_updated(ops_test, primary_name)

@pytest.mark.group(1)
async def test_deploy_zero_units(ops_test: OpsTest):
    """Scale the database to zero units and scale up again."""
    wait_for_apps = False
    if not await app_name(ops_test):
        wait_for_apps = True
        async with ops_test.fast_forward():
            await ops_test.model.deploy(
                APP_NAME,
                application_name=APP_NAME,
                num_units=3,
                storage={"pgdata": {"pool": "lxd-btrfs", "size": 2048}},
                series=CHARM_SERIES,
                channel="edge",
            )

    # Deploy the continuous writes application charm if it wasn't already deployed.
Member

This part can be removed, as the continuous writes application is already deployed by test_build_and_deploy.

    if not await app_name(ops_test, APPLICATION_NAME):
        wait_for_apps = True
        async with ops_test.fast_forward():
            await ops_test.model.deploy(
                APPLICATION_NAME,
                application_name=APPLICATION_NAME,
                series=CHARM_SERIES,
                channel="edge",
            )

    if wait_for_apps:
        await ops_test.model.wait_for_idle(status="active", timeout=3000)
Member

After the above comment is handled, this line can be moved close to the deployment of the PostgreSQL application.


    # Start an application that continuously writes data to the database.
    await start_continuous_writes(ops_test, APP_NAME)

    logger.info("checking whether writes are increasing")
    await are_writes_increasing(ops_test)

    unit_ip_addresses = []
    storage_id_list = []
    primary_name = await get_primary(ops_test, APP_NAME)
    primary_storage = ""
    for unit in ops_test.model.applications[APP_NAME].units:
        # Save IP addresses of units
        unit_ip_addresses.append(await get_unit_ip(ops_test, unit.name))

        # Save detached storage ID
        if primary_name != unit.name:
            storage_id_list.append(storage_id(ops_test, unit.name))
        else:
            primary_storage = storage_id(ops_test, unit.name)

    # Scale the database to zero units.
    logger.info("scaling database to zero units")
    await scale_application(ops_test, APP_NAME, 0)

    # Checking shutdown units
    for unit_ip in unit_ip_addresses:
        try:
            resp = requests.get(f"http://{unit_ip}:8008")
            assert resp.status_code != 200, f"status code = {resp.status_code}, message = {resp.text}"
        except requests.exceptions.ConnectionError as e:
            assert True, f"unit host = http://{unit_ip}:8008, all units shutdown"
        except Exception as e:
            assert False, f"{e} unit host = http://{unit_ip}:8008, something went wrong"
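
The shutdown check above treats a refused connection on port 8008 as proof that a unit is gone. A minimal sketch of the same idea using only the standard library, which also bounds each attempt with a timeout. It assumes that a closed TCP port is sufficient evidence of shutdown; `is_port_open` and `assert_units_shut_down` are hypothetical helpers, not part of this test suite:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def assert_units_shut_down(unit_ips, port: int = 8008) -> None:
    """Fail if any of the given addresses still accepts connections on port."""
    still_up = [ip for ip in unit_ips if is_port_open(ip, port)]
    assert not still_up, f"units still reachable on port {port}: {still_up}"
```

In the test this could be called as `assert_units_shut_down(unit_ip_addresses)` right after scaling to zero, possibly wrapped in a retry, since units may take a moment to stop.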

    # Scale the database to one unit.
    logger.info("scaling database to one unit")
    await add_unit_with_storage(ops_test, storage=primary_storage, app=APP_NAME)
Member

After the unit starts, we should check whether the data on the storage has actually been restored.
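
One way to act on this comment would be to record the number of rows written before scaling to zero and require at least that many once the unit is back on the reattached storage. A minimal sketch, assuming a hypothetical `fetch_count` coroutine that returns the current row count of the continuous-writes table (the actual query and helper names in this repository may differ):

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_restored_rows(
    fetch_count: Callable[[], Awaitable[int]],
    expected_minimum: int,
    attempts: int = 10,
    delay: float = 1.0,
) -> int:
    """Poll fetch_count until it reports at least expected_minimum rows."""
    last_seen = -1
    for _ in range(attempts):
        last_seen = await fetch_count()
        if last_seen >= expected_minimum:
            return last_seen
        # The unit may still be starting; wait before asking again.
        await asyncio.sleep(delay)
    raise AssertionError(
        f"expected at least {expected_minimum} rows after restore, saw {last_seen}"
    )
```

The polling loop is there because the database may need some time to accept connections after the unit is added; the row count captured before the scale-down serves as `expected_minimum`.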

    logger.info("checking whether writes are increasing")
    await are_writes_increasing(ops_test)

    # Scale the database to three units.
    for store_id in storage_id_list:
        await add_unit_with_storage(ops_test, storage=store_id, app=APP_NAME)
Contributor

@marceloneppel

JFMI, should we use the ops lib directly? The helper's docstring refers to this issue, which has since been resolved:

    Note: this function exists as a temporary solution until this issue is resolved:
    https://github.com/juju/python-libjuju/issues/695

Member

Yes. We should remove the workaround and use the methods provided by the lib.

Contributor

IIRC this is only available in libjuju 3.

await check_writes(ops_test)
Member

After the 2nd and 3rd units start, we need to check that the data on them is restored from WAL (not via backup/restore).
Maybe @dragomirp or @marceloneppel know how to check this.
