Skip to content

Commit

Permalink
compute: Add spec support for disabling LFC resizing (#10132)
Browse files Browse the repository at this point in the history
ref neondatabase/cloud#21731

## Problem

When we manually override the LFC size for particular computes,
autoscaling will typically undo that because vm-monitor will resize LFC
itself.

So, we'd like a way to make vm-monitor not set LFC size — this actually
already exists, if we just don't give vm-monitor a postgres connection
string.

## Summary of changes

Add a new field to the compute spec, `disable_lfc_resizing`. When set to
`true`, we pass in `None` for its postgres connection string. That
matches the configuration tested in `neondatabase/autoscaling` CI.
  • Loading branch information
sharnoff authored Jan 2, 2025
1 parent 363ea97 commit cd10c71
Show file tree
Hide file tree
Showing 3 changed files with 25 additions and 4 deletions.
19 changes: 15 additions & 4 deletions compute_tools/src/bin/compute_ctl.rs
Original file line number Diff line number Diff line change
Expand Up @@ -424,9 +424,13 @@ fn start_postgres(
"running compute with features: {:?}",
state.pspec.as_ref().unwrap().spec.features
);
// before we release the mutex, fetch the swap size (if any) for later.
let swap_size_bytes = state.pspec.as_ref().unwrap().spec.swap_size_bytes;
let disk_quota_bytes = state.pspec.as_ref().unwrap().spec.disk_quota_bytes;
// before we release the mutex, fetch some parameters for later.
let &ComputeSpec {
swap_size_bytes,
disk_quota_bytes,
disable_lfc_resizing,
..
} = &state.pspec.as_ref().unwrap().spec;
drop(state);

// Launch remaining service threads
Expand Down Expand Up @@ -531,11 +535,18 @@ fn start_postgres(
// This token is used internally by the monitor to clean up all threads
let token = CancellationToken::new();

// don't pass postgres connection string to vm-monitor if we don't want it to resize LFC
let pgconnstr = if disable_lfc_resizing.unwrap_or(false) {
None
} else {
file_cache_connstr.cloned()
};

let vm_monitor = rt.as_ref().map(|rt| {
rt.spawn(vm_monitor::start(
Box::leak(Box::new(vm_monitor::Args {
cgroup: cgroup.cloned(),
pgconnstr: file_cache_connstr.cloned(),
pgconnstr,
addr: vm_monitor_addr.clone(),
})),
token.clone(),
Expand Down
1 change: 1 addition & 0 deletions control_plane/src/endpoint.rs
Original file line number Diff line number Diff line change
Expand Up @@ -585,6 +585,7 @@ impl Endpoint {
features: self.features.clone(),
swap_size_bytes: None,
disk_quota_bytes: None,
disable_lfc_resizing: None,
cluster: Cluster {
cluster_id: None, // project ID: not used
name: None, // project name: not used
Expand Down
9 changes: 9 additions & 0 deletions libs/compute_api/src/spec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,15 @@ pub struct ComputeSpec {
#[serde(default)]
pub disk_quota_bytes: Option<u64>,

/// Disables the vm-monitor behavior that resizes LFC on upscale/downscale, instead relying on
/// the initial size of LFC.
///
/// This is intended for use when the LFC size is being overridden from the default but
/// autoscaling is still enabled, and we don't want the vm-monitor to interfere with the custom
/// LFC sizing.
#[serde(default)]
pub disable_lfc_resizing: Option<bool>,

/// Expected cluster state at the end of transition process.
pub cluster: Cluster,
pub delta_operations: Option<Vec<DeltaOp>>,
Expand Down

1 comment on commit cd10c71

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7366 tests run: 7003 passed, 1 failed, 362 skipped (full report)


Failures on Postgres 16

  • test_storage_controller_many_tenants[github-actions-selfhosted]: release-x86-64
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_storage_controller_many_tenants[release-pg16-github-actions-selfhosted]"
Flaky tests (2)

Postgres 17

Postgres 14

  • test_physical_replication_config_mismatch_max_locks_per_transaction: release-arm64

Code coverage* (full report)

  • functions: 31.2% (8400 of 26937 functions)
  • lines: 47.9% (66687 of 139127 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
cd10c71 at 2025-01-02T21:55:59.346Z :recycle:

Please sign in to comment.