Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-4859] Apply sysctl from the charm #369

Closed
wants to merge 11 commits into from
Closed

Conversation

phvalguima
Copy link
Contributor

Production deployments will be using VMs instead of containers. In this case, we do not need to delegate the sysctl setup to the user but could rather do it on the charm.

Containers, on the other hand, cannot set all the types of sysctl:

root@juju-f3f50a-0:~# sysctl -w vm.max_map_count=262144
sysctl: permission denied on key "vm.max_map_count", ignoring
root@juju-f3f50a-0:~# sysctl -w vm.swappiness=0
sysctl: permission denied on key "vm.swappiness", ignoring
root@juju-f3f50a-0:~# sysctl -w net.ipv4.tcp_retries2=5
net.ipv4.tcp_retries2 = 5
root@juju-f3f50a-0:~# sysctl -w fs.file-max=1048576
sysctl: permission denied on key "fs.file-max", ignoring

This PR adds the sysctl lib and uses it to set all the parameters if we are running directly on the host, or only net.ipv4.tcp_retries2 otherwise.

Closes #359

README.md Show resolved Hide resolved
tox.ini Show resolved Hide resolved
"vm.swappiness": "0",
"net.ipv4.tcp_retries2": "5",
}
if OpenSearchDistribution.running_as_lxc():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to check sysctl anyway, it doesn't matter on which platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I cannot apply all the options above for LXC; but I can if we have a VM or host.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to check the sysctl on all platforms after setting. It doesn't matter what platform or exit codes are received.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@delgod, so to clarify, it is better to keep the original behavior, where we check the settings and then, block if the user did not apply it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phvalguima I think the following flow can work in our case:

  1. check the current values of each sysctl setting (regardless of the platform)
  2. make the distinction between:
    • the settings that need to be equal to a value (tcp_retries2, or swappiness)
    • the settings that need to be greater than a value (max_map_count >= 262144 , file-max)
      • we should only set these if that's the case, otherwise we are decreasing the quality of the settings already existing in the environment
  3. This should result in a set of invalid settings.
  4. attempt to set these settings (all the invalid set or one at a time), and re-check the new values, so you can accurately add each in the missing_requirements if any:
    • I prefer the missing_requirements we had, as the message is more tailored towards the values that need to be set, not just that the setting could not be set

Copy link
Contributor

@Mehdi-Bendriss Mehdi-Bendriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Pedro - I have a few comments

lib/charms/opensearch/v0/opensearch_distro.py Outdated Show resolved Hide resolved
Comment on lines -29 to -41
Configure the system settings required by [OpenSearch](https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/),
we'll do that by creating and setting a [`cloudinit-userdata.yaml` file](https://juju.is/docs/olm/juju-model-config) on the model.
As well as setting some kernel settings on the host machine.
```
cat <<EOF > cloudinit-userdata.yaml
cloudinit-userdata: |
postruncmd:
- [ 'echo', 'vm.max_map_count=262144', '>>', '/etc/sysctl.conf' ]
- [ 'echo', 'vm.swappiness=0', '>>', '/etc/sysctl.conf' ]
- [ 'echo', 'net.ipv4.tcp_retries2=5', '>>', '/etc/sysctl.conf' ]
- [ 'echo', 'fs.file-max=1048576', '>>', '/etc/sysctl.conf' ]
- [ 'sysctl', '-p' ]
EOF
Copy link
Contributor

@Mehdi-Bendriss Mehdi-Bendriss Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove this? This will still set userdata on EC2 instances - where swappiness etc can be set directly on the VM. (on LXD it will only set tcp_retries)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need it then?

Running within LXC

We cannot set vm.max_map_count, vm.swappiness and fs.file-max as they must be set on the underlying host. So, the cloud-init is useless there.

Running within VMs

We will set all these values via charm.

So, in which situation I need the cloud-init back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am keeping the commands to set the configs in sysctl for the LXC case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on LXD:

  • tcp_retries can be set using userdata through the model.
    The rest will not be set, but will not fail.

on EC2 / VM / Baremetal:

  • all sysctl can be applied using userdata through the model
  • users may choose to set better values than the ones we suggest, e.g higher value of max_map_count than our minimum set by the charm.

On the code:

  • we ensure that the values are correct, (they could have been changed manually in machines), if they're not - we set them
  • sysctl calls can fail for whatever reason, having them on the userdata already is 1 additional layer of safety.

fs.file-max=1048576
EOT

sudo sysctl -p

juju model-config --file=./cloudinit-userdata.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

lib/charms/opensearch/v0/opensearch_distro.py Outdated Show resolved Hide resolved
sysctl.Config(name).configure(config)
except sysctl.CommandError as e:
logger.error(f"Failed to apply sysctl config: {e}")
missing_requirements.append(str(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with a bulk set, This will result in the BlockedStatus message, as follows:

Unable to set params: vm.swappiness, vm.max_map_count. 

In which case we don't need missing_requirements

"vm.swappiness": "0",
"net.ipv4.tcp_retries2": "5",
}
if OpenSearchDistribution.running_as_lxc():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phvalguima I think the following flow can work in our case:

  1. check the current values of each sysctl setting (regardless of the platform)
  2. make the distinction between:
    • the settings that need to be equal to a value (tcp_retries2, or swappiness)
    • the settings that need to be greater than a value (max_map_count >= 262144 , file-max)
      • we should only set these if that's the case, otherwise we are decreasing the quality of the settings already existing in the environment
  3. This should result in a set of invalid settings.
  4. attempt to set these settings (all the invalid set or one at a time), and re-check the new values, so you can accurately add each in the missing_requirements if any:
    • I prefer the missing_requirements we had, as the message is more tailored towards the values that need to be set, not just that the setting could not be set

lib/charms/opensearch/v0/opensearch_distro.py Outdated Show resolved Hide resolved
@Mehdi-Bendriss Mehdi-Bendriss deleted the DPE-4859-apply-sysctl branch August 13, 2024 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expected sysctl parameters should be enforced by the charm
3 participants