From ace019988e5a3eae557841e4a112cbefd0663045 Mon Sep 17 00:00:00 2001 From: Diana Popa Date: Mon, 14 Nov 2022 17:17:47 +0200 Subject: [PATCH 01/13] doc: update performance details for snapshots Signed-off-by: Diana Popa --- docs/snapshotting/snapshot-support.md | 34 +++++++++++++++++---------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/docs/snapshotting/snapshot-support.md b/docs/snapshotting/snapshot-support.md index c7ed6249e902..c48139768057 100644 --- a/docs/snapshotting/snapshot-support.md +++ b/docs/snapshotting/snapshot-support.md @@ -93,26 +93,34 @@ snapshot files by implementing authentication and encryption schemes while managing their lifecycle or moving them across the trust boundary, like for example when provisioning them from a respository to a host over the network. -Firecracker is optimized for fast load/resume and it's designed to do some very basic -sanity checks only on the vm state file. It only verifies integrity using a 64 -bit CRC value embedded in the vm state file, but this is only as a partial -measure to protect against accidental corruption, as the disk files and memory -file need to be secured as well. It is important to note that CRC computation -is validated before trying to load the snapshot. Should it encounter failure, -an error will be shown to the user and the Firecracker process will be terminated. +Firecracker is optimized for fast load/resume, and it's designed to do some +very basic sanity checks only on the vm state file. It only verifies integrity +using a 64-bit CRC value embedded in the vm state file, but this is only +as a partial measure to protect against accidental corruption, as the disk +files and memory file need to be secured as well. It is important to note that +CRC computation is validated before trying to load the snapshot. Should it +encounter failure, an error will be shown to the user and the Firecracker +process will be terminated. ### Performance The Firecracker snapshot create/resume performance depends on the memory size, -vCPU count and emulated devices count. The Firecracker CI runs snapshots tests -on AWS **m5d.metal** instances for Intel and on AWS **m6g.metal** for ARM. -The baseline for snapshot resume latency target on Intel is under **8ms** with -5ms p90, and on ARM is under **3ms** for a microVM with the following specs: -2vCPU/512MB/1 block/1 net device. +vCPU count and emulated devices count. +The Firecracker CI runs snapshots tests on: + +- AWS **m5d.metal** and **m6i.metal** instances for Intel +- AWS **m6g.metal** for ARM +- AWS **m6a.metal** for AMD + +We are running nightly performance tests for all the enumerated platforms on +all supported kernel versions. +The baselines can be found in their [respective config file](../../tests/integration_tests/performance/configs/). ### Known issues -- High snapshot latency on 5.4+ host kernels - [#2129](https://github.com/firecracker-microvm/firecracker/issues/2129) +- High snapshot latency on 5.4+ host kernels due to cgroups V1. We +- strongly recommend to deploy snapshots on cgroups V2 enabled hosts for the +- implied kernel versions - [related issue](https://github.com/firecracker-microvm/firecracker/issues/2129). - Guest network connectivity is not guaranteed to be preserved after resume. For recommendations related to guest network connectivity for clones please see [Network connectivity for clones](network-for-clones.md). 
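Note on the integrity check described in the doc change above: it boils down to "embed a checksum when the vm state file is written, recompute and compare it before loading". A minimal Python sketch of that pattern, for illustration only — this is not Firecracker's snapshot format, and `zlib`'s CRC-32 stands in for the 64-bit CRC the doc mentions:

```python
# Illustrative sketch only: embed a CRC on save, validate before load.
# Not Firecracker's on-disk format; zlib's CRC-32 stands in for the
# 64-bit CRC embedded in the real vm state file.
import struct
import zlib


def save_state(path, state_bytes):
    """Write the serialized state followed by its CRC."""
    crc = zlib.crc32(state_bytes) & 0xFFFFFFFF
    with open(path, "wb") as f:
        f.write(state_bytes)
        f.write(struct.pack("<I", crc))


def load_state(path):
    """Recompute the CRC and refuse to load the state on a mismatch."""
    with open(path, "rb") as f:
        blob = f.read()
    state_bytes, crc_bytes = blob[:-4], blob[-4:]
    (expected,) = struct.unpack("<I", crc_bytes)
    if zlib.crc32(state_bytes) & 0xFFFFFFFF != expected:
        raise ValueError("vm state file failed CRC validation (corruption?)")
    return state_bytes
```

As the doc text says, such a check only guards against accidental corruption of the state file; the memory and disk files still need to be secured separately.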
From c84f72b62e222d39a954e197822c64cc4229ea66 Mon Sep 17 00:00:00 2001 From: Diana Popa Date: Mon, 14 Nov 2022 17:49:21 +0200 Subject: [PATCH 02/13] snaps: reword doc around #snapsafe story Signed-off-by: Diana Popa --- docs/snapshotting/snapshot-support.md | 30 +++++++++++++++++---------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/docs/snapshotting/snapshot-support.md b/docs/snapshotting/snapshot-support.md index c48139768057..f84dbbba6b78 100644 --- a/docs/snapshotting/snapshot-support.md +++ b/docs/snapshotting/snapshot-support.md @@ -8,7 +8,8 @@ - [Overview](#overview) - [Snapshot files management](#snapshot-files-management) - [Performance](#performance) - - [Known issues](#known-issues) + - [Developer preview status](#developer-preview-status) + - [Limitations](#limitations) - [Firecracker Snapshotting characteristics](#firecracker-snapshotting-characteristics) - [Snapshot versioning](#snapshot-versioning) - [Snapshot API](#snapshot-api) @@ -38,6 +39,7 @@ guest workload at that particular point in time. The Firecracker snapshot feature is in [developer preview](../RELEASE_POLICY.md) on all CPU micro-architectures listed in [README](../../README.md#supported-platforms). +See [this section](#developer-preview-status) for more info. ### Overview @@ -82,8 +84,6 @@ resumed microVM. The Firecracker snapshot design offers a very simple interface to interact with snapshots but provides no functionality to package or manage them on the host. -Using snapshots in production is currently not recommended as there are open -[Known issues](#known-issues). The [threat containment model](../design.md#threat-containment) states that the host, host/API communication and snapshot files are trusted by Firecracker. @@ -96,7 +96,7 @@ example when provisioning them from a respository to a host over the network. Firecracker is optimized for fast load/resume, and it's designed to do some very basic sanity checks only on the vm state file. It only verifies integrity using a 64-bit CRC value embedded in the vm state file, but this is only -as a partial measure to protect against accidental corruption, as the disk +a partial measure to protect against accidental corruption, as the disk files and memory file need to be secured as well. It is important to note that CRC computation is validated before trying to load the snapshot. Should it encounter failure, an error will be shown to the user and the Firecracker @@ -106,7 +106,7 @@ process will be terminated. The Firecracker snapshot create/resume performance depends on the memory size, vCPU count and emulated devices count. -The Firecracker CI runs snapshots tests on: +The Firecracker CI runs snapshot tests on: - AWS **m5d.metal** and **m6i.metal** instances for Intel - AWS **m6g.metal** for ARM @@ -116,18 +116,26 @@ We are running nightly performance tests for all the enumerated platforms on all supported kernel versions. The baselines can be found in their [respective config file](../../tests/integration_tests/performance/configs/). -### Known issues +### Developer preview status + +The snapshot functionality is still in developer preview due to the following: + +- Poor entropy and replayable randomness when resuming multiple microvms from + the same snapshot. We do not recommend to use snapshotting in production if + there is no mechanism to guarantee proper secrecy and uniqueness between + guests. + Please see [Snapshot security and uniqueness](#snapshot-security-and-uniqueness). 
+ +### Limitations - High snapshot latency on 5.4+ host kernels due to cgroups V1. We -- strongly recommend to deploy snapshots on cgroups V2 enabled hosts for the -- implied kernel versions - [related issue](https://github.com/firecracker-microvm/firecracker/issues/2129). + strongly recommend to deploy snapshots on cgroups V2 enabled hosts for the + implied kernel versions - [related issue](https://github.com/firecracker-microvm/firecracker/issues/2129). - Guest network connectivity is not guaranteed to be preserved after resume. For recommendations related to guest network connectivity for clones please see [Network connectivity for clones](network-for-clones.md). - Vsock device does not have full snapshotting support. Please see [Vsock device limitation](#vsock-device-limitation). -- Poor entropy and replayable randomness when resuming multiple microvms which - deal with cryptographic secrets. Please see [Snapshot security and uniqueness](#snapshot-security-and-uniqueness). - Snapshotting on arm64 works for both GICv2 and GICv3 enabled guests. However, restoring between different GIC version is not possible. @@ -550,7 +558,7 @@ Boot microVM A -> ... -> Create snapshot S -> Resume -> ... -> Load S in microVM B -> Resume -> ... ``` -Here, both microVM A and B do work staring from the state stored in snapshot S. +Here, both microVM A and B do work starting from the state stored in snapshot S. Unique identifiers, random numbers, and cryptographic tokens that are meant to be used once may be used twice. It doesn't matter if microVM A is terminated before microVM B resumes execution from snapshot S or not. In this example, we From c4d8086797f0162fbd472b26c7aabbfc0305f9b8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Tue, 15 Nov 2022 13:05:37 +0100 Subject: [PATCH 03/13] fix(ci): only run style in x86_64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Pablo Barbáchano --- .buildkite/pipeline_pr.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.buildkite/pipeline_pr.py b/.buildkite/pipeline_pr.py index 63f1b17fd1c1..b5e8dfc246f4 100755 --- a/.buildkite/pipeline_pr.py +++ b/.buildkite/pipeline_pr.py @@ -55,7 +55,8 @@ def group(group_name, command, agent_tags=None, priority=0, timeout=30): step_style = { "command": "./tools/devtool -y test -- ../tests/integration_tests/style/", "label": "🪶 Style", - # no agent tags, it doesn't matter where this runs + # we only install the required dependencies in x86_64 + "agents": ["platform=x86_64.metal"] } build_grp = group( From 9cb8ea8ed8325d7e547a346bd1dbc639ba04063b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Wed, 16 Nov 2022 10:06:57 +0100 Subject: [PATCH 04/13] fix(tests): disable Pylint line-too-long MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pylint uses a line-too-long max-line width of 100[1]. Black uses a soft-limit of 88[2], so they conflict in some situations. Let's disable Pylint line-too-long check and depend only on black for Python formatting. 
[1]: https://docs.pylint.org/features.html#id18 [2]: https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Signed-off-by: Pablo Barbáchano --- tests/integration_tests/build/test_pylint.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/integration_tests/build/test_pylint.py b/tests/integration_tests/build/test_pylint.py index 5e269d7bfec2..4efd84f69b2f 100644 --- a/tests/integration_tests/build/test_pylint.py +++ b/tests/integration_tests/build/test_pylint.py @@ -20,7 +20,7 @@ def test_python_pylint(): '--variable-rgx="[a-z_][a-z0-9_]{1,30}$" --disable=' "fixme,too-many-instance-attributes,import-error," "too-many-locals,too-many-arguments,consider-using-f-string," - "consider-using-with,implicit-str-concat" + "consider-using-with,implicit-str-concat,line-too-long" ) # Get all *.py files from the project From 046dc8eb0f5a98347bb38b117179eb7e37f3b066 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Wed, 16 Nov 2022 10:11:21 +0100 Subject: [PATCH 05/13] fix(tests): fail if copyright message is not found MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Missed the case when the copyright notice does not appear anywhere in the file. Signed-off-by: Pablo Barbáchano --- tests/integration_tests/style/test_licenses.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/integration_tests/style/test_licenses.py b/tests/integration_tests/style/test_licenses.py index fdcd373fac16..56e1772ccf57 100644 --- a/tests/integration_tests/style/test_licenses.py +++ b/tests/integration_tests/style/test_licenses.py @@ -57,6 +57,8 @@ def _validate_license(filename): if line.startswith(("// Copyright", "# Copyright")): copyright_info = line break + if line == "": + return False has_amazon_copyright = _has_amazon_copyright( copyright_info From fb21ead6a35d76610bd2cab48999ece50c8d2037 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Wed, 16 Nov 2022 12:08:00 +0100 Subject: [PATCH 06/13] feat(tests): add msecs to logs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Pablo Barbáchano --- tests/pytest.ini | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/pytest.ini b/tests/pytest.ini index af57aab351bf..21ec0761ce47 100644 --- a/tests/pytest.ini +++ b/tests/pytest.ini @@ -14,7 +14,7 @@ cache_dir = ../build/pytest_cache ; Set logger format and level log_level = INFO -log_format = %(asctime)s %(name)s: %(levelname)s %(message)s +log_format = %(asctime)s.%(msecs)03d %(name)s: %(levelname)s %(message)s log_cli_level = ERROR log_cli = true From 11aa8a1950eeb5518b37f6f4ff523045f35d4ea0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Wed, 16 Nov 2022 14:02:19 +0100 Subject: [PATCH 07/13] feat(tests): show locals on tracebacks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Pablo Barbáchano --- tests/pytest.ini | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tests/pytest.ini b/tests/pytest.ini index 21ec0761ce47..1fa94aeb28e9 100644 --- a/tests/pytest.ini +++ b/tests/pytest.ini @@ -1,6 +1,8 @@ [pytest] ; Omit verbose tracebacks, since they tend to pollute the output. -addopts = --tb=short +addopts = + --tb=short + --showlocals ; Overwrite the default norecurseirs, which includes 'build'. 
norecursedirs = .* From 2fe205bc30b62a264d937a22c1afcb1ad4086adb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Wed, 16 Nov 2022 21:39:06 +0100 Subject: [PATCH 08/13] chore(tests): separate tests by parametrizing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Pablo Barbáchano --- .../functional/test_snapshot_advanced.py | 181 ++++++++++-------- 1 file changed, 99 insertions(+), 82 deletions(-) diff --git a/tests/integration_tests/functional/test_snapshot_advanced.py b/tests/integration_tests/functional/test_snapshot_advanced.py index 16bb73f7dbf5..09450e05c5fa 100644 --- a/tests/integration_tests/functional/test_snapshot_advanced.py +++ b/tests/integration_tests/functional/test_snapshot_advanced.py @@ -8,7 +8,11 @@ import pytest from test_balloon import _test_rss_memory_lower from conftest import _test_images_s3_bucket -from framework.artifacts import ArtifactCollection, create_net_devices_configuration +from framework.artifacts import ( + ArtifactCollection, + FirecrackerArtifact, + create_net_devices_configuration, +) from framework.builder import MicrovmBuilder, SnapshotBuilder, SnapshotType from framework.utils import get_firecracker_version_from_toml import host_tools.network as net_tools # pylint: disable=import-error @@ -21,7 +25,26 @@ scratch_drives = ["vdb", "vdc", "vdd", "vde", "vdf"] -def test_restore_old_snapshot(bin_cloner_path): +def firecracker_artifacts(*args, **kwargs): + """Return all available firecracker binaries.""" + artifacts = ArtifactCollection(_test_images_s3_bucket()) + # Fetch all firecracker binaries. + return artifacts.firecrackers(*args, **kwargs) + + +def firecracker_id(fc): + """Render a nice ID for pytest parametrize.""" + if isinstance(fc, FirecrackerArtifact): + return f"firecracker-{fc.version}" + return None + + +@pytest.mark.parametrize( + "firecracker", + firecracker_artifacts(max_version=get_firecracker_version_from_toml()), + ids=firecracker_id, +) +def test_restore_old_snapshot(bin_cloner_path, firecracker): """ Restore from snapshots obtained with previous versions of Firecracker. @@ -31,106 +54,100 @@ def test_restore_old_snapshot(bin_cloner_path): logger = logging.getLogger("old_snapshot_many_devices") builder = MicrovmBuilder(bin_cloner_path) - artifacts = ArtifactCollection(_test_images_s3_bucket()) - # Fetch all firecracker binaries. # With each binary create a snapshot and try to restore in current # version. - firecracker_artifacts = artifacts.firecrackers( - max_version=get_firecracker_version_from_toml() - ) - for firecracker in firecracker_artifacts: - firecracker.download() - jailer = firecracker.jailer() - jailer.download() - - logger.info("Creating snapshot with Firecracker: %s", firecracker.local_path()) - logger.info("Using Jailer: %s", jailer.local_path()) - - target_version = firecracker.base_name()[1:] - - # v0.23 does not support creating diff snapshots. - # v0.23 does not support balloon. - diff_snapshots = "0.23" not in target_version - - # Create a snapshot. - snapshot = create_snapshot_helper( - builder, - logger, - drives=scratch_drives, - ifaces=net_ifaces, - fc_binary=firecracker.local_path(), - jailer_binary=jailer.local_path(), - diff_snapshots=diff_snapshots, - balloon=diff_snapshots, - ) + firecracker.download() + jailer = firecracker.jailer() + jailer.download() - # Resume microvm using current build of FC/Jailer. 
- microvm, _ = builder.build_from_snapshot( - snapshot, resume=True, diff_snapshots=False - ) - validate_all_devices( - logger, microvm, net_ifaces, scratch_drives, diff_snapshots - ) - logger.debug("========== Firecracker restore snapshot log ==========") - logger.debug(microvm.log_data) + logger.info("Creating snapshot with Firecracker: %s", firecracker.local_path()) + logger.info("Using Jailer: %s", jailer.local_path()) + + target_version = firecracker.base_name()[1:] + + # v0.23 does not support creating diff snapshots. + # v0.23 does not support balloon. + diff_snapshots = "0.23" not in target_version + + # Create a snapshot. + snapshot = create_snapshot_helper( + builder, + logger, + drives=scratch_drives, + ifaces=net_ifaces, + fc_binary=firecracker.local_path(), + jailer_binary=jailer.local_path(), + diff_snapshots=diff_snapshots, + balloon=diff_snapshots, + ) + + # Resume microvm using current build of FC/Jailer. + microvm, _ = builder.build_from_snapshot( + snapshot, resume=True, diff_snapshots=False + ) + validate_all_devices(logger, microvm, net_ifaces, scratch_drives, diff_snapshots) + logger.debug("========== Firecracker restore snapshot log ==========") + logger.debug(microvm.log_data) -def test_restore_old_version(bin_cloner_path): +@pytest.mark.parametrize( + "firecracker", + firecracker_artifacts( + min_version="1.2.0", + max_version=get_firecracker_version_from_toml(), + ), + ids=firecracker_id, +) +def test_restore_old_version(bin_cloner_path, firecracker): """ Restore current snapshot with previous versions of Firecracker. + Current snapshot (i.e a machine snapshotted with current build) is + incompatible with any past release due to notification suppression. + @type: functional """ # Microvm: 2vCPU 256MB RAM, balloon, 4 disks and 4 net devices. logger = logging.getLogger("old_snapshot_version_many_devices") builder = MicrovmBuilder(bin_cloner_path) - artifacts = ArtifactCollection(_test_images_s3_bucket()) - # Fetch all firecracker binaries. # Create a snapshot with current build and restore with each FC binary # artifact. - firecracker_artifacts = artifacts.firecrackers( - # current snapshot (i.e a machine snapshotted with current build) - # is incompatible with any past release due to notification suppression. - min_version="1.2.0", - max_version=get_firecracker_version_from_toml(), + firecracker.download() + jailer = firecracker.jailer() + jailer.download() + + logger.info("Creating snapshot with local build") + + # Old version from artifact. + target_version = firecracker.base_name()[1:] + + # Create a snapshot with current FC version targeting the old version. + snapshot = create_snapshot_helper( + builder, + logger, + target_version=target_version, + drives=scratch_drives, + ifaces=net_ifaces, + balloon=True, + diff_snapshots=True, ) - for firecracker in firecracker_artifacts: - firecracker.download() - jailer = firecracker.jailer() - jailer.download() - - logger.info("Creating snapshot with local build") - - # Old version from artifact. - target_version = firecracker.base_name()[1:] - - # Create a snapshot with current FC version targeting the old version. 
- snapshot = create_snapshot_helper( - builder, - logger, - target_version=target_version, - drives=scratch_drives, - ifaces=net_ifaces, - balloon=True, - diff_snapshots=True, - ) - logger.info("Restoring snapshot with Firecracker: %s", firecracker.local_path()) - logger.info("Using Jailer: %s", jailer.local_path()) + logger.info("Restoring snapshot with Firecracker: %s", firecracker.local_path()) + logger.info("Using Jailer: %s", jailer.local_path()) - # Resume microvm using FC/Jailer binary artifacts. - vm, _ = builder.build_from_snapshot( - snapshot, - resume=True, - diff_snapshots=False, - fc_binary=firecracker.local_path(), - jailer_binary=jailer.local_path(), - ) - validate_all_devices(logger, vm, net_ifaces, scratch_drives, True) - logger.debug("========== Firecracker restore snapshot log ==========") - logger.debug(vm.log_data) + # Resume microvm using FC/Jailer binary artifacts. + vm, _ = builder.build_from_snapshot( + snapshot, + resume=True, + diff_snapshots=False, + fc_binary=firecracker.local_path(), + jailer_binary=jailer.local_path(), + ) + validate_all_devices(logger, vm, net_ifaces, scratch_drives, True) + logger.debug("========== Firecracker restore snapshot log ==========") + logger.debug(vm.log_data) @pytest.mark.skipif(platform.machine() != "x86_64", reason="TSC is x86_64 specific.") From 4b945db4b7ab477cd5783f9af453bcc4e205de13 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Thu, 17 Nov 2022 11:55:51 +0100 Subject: [PATCH 09/13] chore(tests): add more logging in case of error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Pablo Barbáchano --- .../functional/test_balloon.py | 21 ++++++++++++------- .../functional/test_snapshot_basic.py | 1 - 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/tests/integration_tests/functional/test_balloon.py b/tests/integration_tests/functional/test_balloon.py index dcc042df7a65..9d4e6f1c01c9 100644 --- a/tests/integration_tests/functional/test_balloon.py +++ b/tests/integration_tests/functional/test_balloon.py @@ -45,22 +45,29 @@ def get_rss_from_pmap(): def make_guest_dirty_memory(ssh_connection, should_oom=False, amount=8192): """Tell the guest, over ssh, to dirty `amount` pages of memory.""" + logger = logging.getLogger("make_guest_dirty_memory") + amount_in_mbytes = amount / MB_TO_PAGES - exit_code, _, _ = ssh_connection.execute_command( - "/sbin/fillmem {}".format(amount_in_mbytes) - ) + cmd = f"/sbin/fillmem {amount_in_mbytes}" + exit_code, stdout, stderr = ssh_connection.execute_command(cmd) + # add something to the logs for troubleshooting + if exit_code != 0: + logger.error("while running: %s", cmd) + logger.error("stdout: %s", stdout.read()) + logger.error("stderr: %s", stderr.read()) cmd = "cat /tmp/fillmem_output.txt" _, stdout, _ = ssh_connection.execute_command(cmd) if should_oom: assert ( - exit_code == 0 - and ("OOM Killer stopped the program with " "signal 9, exit code 0") - in stdout.read() + "OOM Killer stopped the program with " + "signal 9, exit code 0" in stdout.read() ) else: - assert exit_code == 0 and ("Memory filling was " "successful") in stdout.read() + assert exit_code == 0, stderr.read() + stdout_txt = stdout.read() + assert "Memory filling was successful" in stdout_txt, stdout_txt def build_test_matrix(network_config, bin_cloner_path, logger): diff --git a/tests/integration_tests/functional/test_snapshot_basic.py b/tests/integration_tests/functional/test_snapshot_basic.py index faf952f38263..97d3fe5c9670 100644 
--- a/tests/integration_tests/functional/test_snapshot_basic.py +++ b/tests/integration_tests/functional/test_snapshot_basic.py @@ -103,7 +103,6 @@ def _test_seq_snapshots(context): microvm, _ = vm_builder.build_from_snapshot( snapshot, resume=True, diff_snapshots=diff_snapshots ) - # Test vsock guest-initiated connections. path = os.path.join( microvm.path, make_host_port_path(VSOCK_UDS_PATH, ECHO_SERVER_PORT) From 50f290e4f5e2bb26eab0340f2debf445ed23fe18 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pablo=20Barb=C3=A1chano?= Date: Thu, 17 Nov 2022 11:58:34 +0100 Subject: [PATCH 10/13] chore(tests): unroll tests by using parametrize MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Unroll the tests by using parametrize and avoid use of TestMatrix. Signed-off-by: Pablo Barbáchano --- .../functional/test_snapshot_basic.py | 157 ++++++------------ 1 file changed, 48 insertions(+), 109 deletions(-) diff --git a/tests/integration_tests/functional/test_snapshot_basic.py b/tests/integration_tests/functional/test_snapshot_basic.py index 97d3fe5c9670..e98fd493d7d8 100644 --- a/tests/integration_tests/functional/test_snapshot_basic.py +++ b/tests/integration_tests/functional/test_snapshot_basic.py @@ -8,6 +8,8 @@ import tempfile from pathlib import Path +import pytest + from conftest import _test_images_s3_bucket from framework.artifacts import ArtifactCollection, ArtifactSet from framework.builder import MicrovmBuilder, SnapshotBuilder, SnapshotType @@ -40,38 +42,59 @@ def _get_guest_drive_size(ssh_connection, guest_dev_name="/dev/vdb"): return stdout.readline().strip() -def _test_seq_snapshots(context): - logger = context.custom["logger"] - seq_len = context.custom["seq_len"] - vm_builder = context.custom["builder"] - snapshot_type = context.custom["snapshot_type"] +ARTIFACTS = ArtifactCollection(_test_images_s3_bucket()) + +# Testing matrix: +# - Guest kernel: All supported ones +# - Rootfs: Ubuntu 18.04 +# - Microvm: 2vCPU with 512 MB RAM +# TODO: Multiple microvm sizes must be tested in the async pipeline. +@pytest.mark.parametrize( + "microvm", ARTIFACTS.microvms(keyword="2vcpu_512mb"), ids=lambda x: x.name() +) +@pytest.mark.parametrize("kernel", ARTIFACTS.kernels(), ids=lambda x: x.name()) +@pytest.mark.parametrize( + "disk", ARTIFACTS.disks(keyword="ubuntu"), ids=lambda x: x.name() +) +@pytest.mark.parametrize("snapshot_type", [SnapshotType.DIFF, SnapshotType.FULL]) +def test_5_snapshots( + bin_cloner_path, + bin_vsock_path, + test_fc_session_root_path, + microvm, + kernel, + disk, + snapshot_type, +): + """ + Create and load 5 snapshots. + + @type: functional + """ + logger = logging.getLogger("snapshot_sequence") + + vm_builder = MicrovmBuilder(bin_cloner_path) + seq_len = 5 diff_snapshots = snapshot_type == SnapshotType.DIFF - logger.info( - 'Testing {} with microvm: "{}", kernel {}, disk {} '.format( - snapshot_type, - context.microvm.name(), - context.kernel.name(), - context.disk.name(), - ) - ) + disk.download() + kernel.download() + microvm.download() # Create a rw copy artifact. - root_disk = context.disk.copy() + root_disk = disk.copy() # Get ssh key from read-only artifact. - ssh_key = context.disk.ssh_key() + ssh_key = disk.ssh_key() # Create a fresh microvm from artifacts. 
vm_instance = vm_builder.build( - kernel=context.kernel, + kernel=kernel, disks=[root_disk], ssh_key=ssh_key, - config=context.microvm, + config=microvm, diff_snapshots=diff_snapshots, ) basevm = vm_instance.vm - basevm.vsock.put( - vsock_id="vsock0", guest_cid=3, uds_path="/{}".format(VSOCK_UDS_PATH) - ) + basevm.vsock.put(vsock_id="vsock0", guest_cid=3, uds_path=f"/{VSOCK_UDS_PATH}") basevm.start() ssh_connection = net_tools.SSHConnection(basevm.ssh_config) @@ -80,15 +103,13 @@ def _test_seq_snapshots(context): exit_code, _, _ = ssh_connection.execute_command("sync") assert exit_code == 0 - test_fc_session_root_path = context.custom["test_fc_session_root_path"] - vsock_helper = context.custom["bin_vsock_path"] vm_blob_path = "/tmp/vsock/test.blob" # Generate a random data file for vsock. blob_path, blob_hash = make_blob(test_fc_session_root_path) # Copy the data file and a vsock helper to the guest. - _copy_vsock_data_to_guest(ssh_connection, blob_path, vm_blob_path, vsock_helper) + _copy_vsock_data_to_guest(ssh_connection, blob_path, vm_blob_path, bin_vsock_path) - logger.info("Create {} #0.".format(snapshot_type)) + logger.info("Create %s #0.", snapshot_type) # Create a snapshot builder from a microvm. snapshot_builder = SnapshotBuilder(basevm) @@ -99,7 +120,7 @@ def _test_seq_snapshots(context): basevm.kill() for i in range(seq_len): - logger.info("Load snapshot #{}, mem {}".format(i, snapshot.mem)) + logger.info("Load snapshot #%s, mem %s", i, snapshot.mem) microvm, _ = vm_builder.build_from_snapshot( snapshot, resume=True, diff_snapshots=diff_snapshots ) @@ -117,7 +138,7 @@ def _test_seq_snapshots(context): # Check that the root device is not corrupted. check_filesystem(ssh_connection, "ext4", "/dev/vda") - logger.info("Create snapshot #{}.".format(i + 1)) + logger.info("Create snapshot #%d.", i + 1) # Create a snapshot builder from the currently running microvm. snapshot_builder = SnapshotBuilder(microvm) @@ -129,7 +150,7 @@ def _test_seq_snapshots(context): # If we are testing incremental snapshots we must merge the base with # current layer. if snapshot_type == SnapshotType.DIFF: - logger.info("Base: {}, Layer: {}".format(base_snapshot.mem, snapshot.mem)) + logger.info("Base: %s, Layer: %s", base_snapshot.mem, snapshot.mem) snapshot.rebase_snapshot(base_snapshot) # Update the base for next iteration. base_snapshot = snapshot @@ -244,88 +265,6 @@ def test_patch_drive_snapshot(bin_cloner_path): microvm.kill() -def test_5_full_snapshots( - network_config, bin_cloner_path, bin_vsock_path, test_fc_session_root_path -): - """ - Create and load 5 full sequential snapshots. - - @type: functional - """ - logger = logging.getLogger("snapshot_sequence") - - artifacts = ArtifactCollection(_test_images_s3_bucket()) - # Testing matrix: - # - Guest kernel: All supported ones - # - Rootfs: Ubuntu 18.04 - # - Microvm: 2vCPU with 512 MB RAM - # TODO: Multiple microvm sizes must be tested in the async pipeline. - microvm_artifacts = ArtifactSet(artifacts.microvms(keyword="2vcpu_512mb")) - kernel_artifacts = ArtifactSet(artifacts.kernels()) - disk_artifacts = ArtifactSet(artifacts.disks(keyword="ubuntu")) - - # Create a test context and add builder, logger, network. 
- test_context = TestContext() - test_context.custom = { - "builder": MicrovmBuilder(bin_cloner_path), - "network_config": network_config, - "logger": logger, - "snapshot_type": SnapshotType.FULL, - "seq_len": 5, - "bin_vsock_path": bin_vsock_path, - "test_fc_session_root_path": test_fc_session_root_path, - } - - # Create the test matrix. - test_matrix = TestMatrix( - context=test_context, - artifact_sets=[microvm_artifacts, kernel_artifacts, disk_artifacts], - ) - - test_matrix.run_test(_test_seq_snapshots) - - -def test_5_inc_snapshots( - network_config, bin_cloner_path, bin_vsock_path, test_fc_session_root_path -): - """ - Create and load 5 incremental snapshots. - - @type: functional - """ - logger = logging.getLogger("snapshot_sequence") - - artifacts = ArtifactCollection(_test_images_s3_bucket()) - # Testing matrix: - # - Guest kernel: All supported ones - # - Rootfs: Ubuntu 18.04 - # - Microvm: 2vCPU with 512MB RAM - # TODO: Multiple microvm sizes must be tested in the async pipeline. - microvm_artifacts = ArtifactSet(artifacts.microvms(keyword="2vcpu_512mb")) - kernel_artifacts = ArtifactSet(artifacts.kernels()) - disk_artifacts = ArtifactSet(artifacts.disks(keyword="ubuntu")) - - # Create a test context and add builder, logger, network. - test_context = TestContext() - test_context.custom = { - "builder": MicrovmBuilder(bin_cloner_path), - "network_config": network_config, - "logger": logger, - "snapshot_type": SnapshotType.DIFF, - "seq_len": 5, - "bin_vsock_path": bin_vsock_path, - "test_fc_session_root_path": test_fc_session_root_path, - } - - # Create the test matrix. - test_matrix = TestMatrix( - context=test_context, - artifact_sets=[microvm_artifacts, kernel_artifacts, disk_artifacts], - ) - - test_matrix.run_test(_test_seq_snapshots) - - def test_load_snapshot_failure_handling(test_microvm_with_api): """ Test error case of loading empty snapshot files. From 20e18a8edf7b5fe39a98297b2b3b1d5499ccc3c0 Mon Sep 17 00:00:00 2001 From: Diana Popa Date: Tue, 15 Nov 2022 11:50:13 +0200 Subject: [PATCH 11/13] build_kernel: check for errors Signed-off-by: Diana Popa --- tools/devtool | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/devtool b/tools/devtool index 5ea7e79f73f2..bc0cc07b20e2 100755 --- a/tools/devtool +++ b/tools/devtool @@ -2000,7 +2000,7 @@ cmd_build_kernel() { create_dir "$kernel_dir_host" # Extract the kernel version from the config file provided as parameter. - KERNEL_VERSION=$(cat "$KERNEL_CFG" | grep -Po "^# Linux\/$cfg_pattern (([0-9]+.)[0-9]+)" | cut -d ' ' -f 3) + KERNEL_VERSION=$(grep -Po "^# Linux\/$cfg_pattern (([0-9]+.)[0-9]+)" "$KERNEL_CFG" | cut -d ' ' -f 3) validate_kernel_version "$KERNEL_VERSION" recipe_commit="b551cccc405a73a6d316c0c09dfe0b3e7a73ba3f" @@ -2008,7 +2008,8 @@ cmd_build_kernel() { run_devctr \ --user "$(id -u):$(id -g)" \ --workdir "$kernel_dir_ctr" \ - -- /bin/bash -c "curl -LO "$recipe_url" && source make_kernel.sh && extract_kernel_srcs "$KERNEL_VERSION"" + -- /bin/bash -c "curl -LO $recipe_url && source make_kernel.sh && extract_kernel_srcs $KERNEL_VERSION" + ok_or_die "Could not extract kernel sources with recipe $recipe_url!" 
cp "$KERNEL_CFG" "$kernel_dir_host/linux-$KERNEL_VERSION/.config" @@ -2016,7 +2017,8 @@ cmd_build_kernel() { run_devctr \ --user "$(id -u):$(id -g)" \ --workdir "$kernel_dir_ctr" \ - -- /bin/bash -c "source make_kernel.sh && make_kernel "$kernel_dir_ctr/linux-$KERNEL_VERSION" $format $target "$nprocs" "$KERNEL_BINARY_NAME"" + -- /bin/bash -c "source make_kernel.sh && make_kernel "$kernel_dir_ctr/linux-"$KERNEL_VERSION"" $format $target $nprocs $KERNEL_BINARY_NAME" + ok_or_die "Could not build kernel!" say "Successfully built kernel!" say "Kernel binary placed in: $kernel_dir_host/linux-$KERNEL_VERSION/$KERNEL_BINARY_NAME" From 85580d75d24e5fad777e9d3f5ba6ce22d5dd8e34 Mon Sep 17 00:00:00 2001 From: Jonathan Woollett-Light Date: Thu, 17 Nov 2022 14:48:25 +0000 Subject: [PATCH 12/13] Linking `has_cpuid` todo Signed-off-by: Jonathan Woollett-Light --- src/cpuid/src/common.rs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/cpuid/src/common.rs b/src/cpuid/src/common.rs index 30c00616c0fb..78da507689c8 100644 --- a/src/cpuid/src/common.rs +++ b/src/cpuid/src/common.rs @@ -31,8 +31,8 @@ pub enum Error { /// Extract entry from the cpuid. #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] pub fn get_cpuid(function: u32, count: u32) -> Result { - // TODO: replace with validation based on `has_cpuid()` when it becomes stable: - // https://doc.rust-lang.org/core/arch/x86/fn.has_cpuid.html + // TODO: Use `core::arch::x86_64::has_cpuid` + // (https://github.com/firecracker-microvm/firecracker/issues/3271) #[cfg(target_env = "sgx")] { return Err(Error::NotSupported); From 1644b3cb4947b927f29da795492284b4b10f2405 Mon Sep 17 00:00:00 2001 From: Jonathan Woollett-Light Date: Thu, 17 Nov 2022 13:19:38 +0000 Subject: [PATCH 13/13] Exclude msr-tools package on Arm Signed-off-by: Jonathan Woollett-Light --- resources/tests/setup_rootfs.sh | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/resources/tests/setup_rootfs.sh b/resources/tests/setup_rootfs.sh index 0882d6bb941f..00aa3cbe7d1d 100755 --- a/resources/tests/setup_rootfs.sh +++ b/resources/tests/setup_rootfs.sh @@ -11,7 +11,14 @@ prepare_fc_rootfs() { SSH_DIR="$BUILD_DIR/ssh" RESOURCE_DIR="$2" - packages="udev systemd-sysv openssh-server iproute2 msr-tools" + packages="udev systemd-sysv openssh-server iproute2" + + # msr-tools is only supported on x86-64. + arch=$(uname -m) + if [ "${arch}" == "x86_64" ]; then + packages="$packages msr-tools" + fi + apt-get update apt-get install -y --no-install-recommends $packages