RFE: Increase resilience for long-running clusters / optimize resource usage #259

Open
4 tasks done
ableischwitz opened this issue Feb 17, 2023 · 8 comments

@ableischwitz
Contributor

ableischwitz commented Feb 17, 2023

The default configuration of the libvirt machines does not address resource usage or disk optimization.

When running an OpenShift cluster on overprovisioned disks, sparse disks need to release (trim) freed blocks; otherwise the VMs get paused because the host has no space left on the device.
There is also no need for a graphical device on the VMs: serial output has additional benefits, such as the ability to scroll back to missed output.

Steps to be done:

  • remove graphical-device and vnc from the vm-definition
  • test the disk driver option discard='unmap' for OS-disks used by OCP
  • check if scheduled fstrim runs need to be enabled on the nodes
  • apply kubelet-config with tight imageGarbageCollection

This issue should be considered a draft for optimizations aimed at limiting resource usage on a single host.

@ableischwitz
Contributor Author

vm.xml.j2 needs adjustments:

Change disk-driver options:

   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/{{ vm_instance_name }}.qcow2'/>
      <target dev='vda' bus='virtio'/>
   </disk>

should become:

   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' discard='unmap' />
      <source file='/var/lib/libvirt/images/{{ vm_instance_name }}.qcow2'/>
      <target dev='vda' bus='virtio'/>
   </disk>
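
For an already rendered domain definition, the same change can be sketched in a few lines of Python. This is only an illustration, not how the template is processed in this project: the sample XML, the image path `example.qcow2`, and the helper name `enable_discard` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain definition used only for this example.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/example.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def enable_discard(domain_xml: str) -> str:
    """Return the domain XML with discard='unmap' set on every qcow2 disk driver."""
    root = ET.fromstring(domain_xml)
    for driver in root.findall("./devices/disk[@device='disk']/driver"):
        # Only touch qcow2-backed disks; other formats are left unchanged.
        if driver.get("type") == "qcow2":
            driver.set("discard", "unmap")
    return ET.tostring(root, encoding="unicode")


patched = enable_discard(DOMAIN_XML)
print(patched)
```

The attribute only lets discard requests from the guest reach the image file; the guest still has to issue them, for example through fstrim.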

Change machine-type:

  <os>
    <type arch="x86_64">hvm</type>
    <boot dev="hd"/>
  </os>

Change from the old pc-i440fx-* machine type to q35-* and switch from BIOS to UEFI (which would allow secure-boot later on):

  <os firmware="efi">
    <type arch="x86_64" machine="q35">hvm</type>
    <boot dev="hd"/>
  </os>

  <devices>
    <controller type="pci" index="0" model="pcie-root"/>
    ...
  </devices>

Remove:

<graphics type="vnc" port="-1"/>

Add:

<video>
    <model type='none'/>
</video>
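
The remove/add pair above could be applied to an existing definition roughly as follows. This is a minimal sketch under assumptions: the sample XML, the `qxl` video model in it, and the helper name `make_headless` are made up for illustration, and a serial console is expected to be defined elsewhere in the domain.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain fragment with a VNC display and a default video device.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <graphics type='vnc' port='-1'/>
    <video>
      <model type='qxl'/>
    </video>
  </devices>
</domain>
"""


def make_headless(domain_xml: str) -> str:
    """Drop all graphics/video devices and add a single <video> with model type 'none'."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        return domain_xml  # nothing to change
    for tag in ("graphics", "video"):
        for element in devices.findall(tag):
            devices.remove(element)
    # An explicit 'none' model keeps libvirt from adding a default video device.
    video = ET.SubElement(devices, "video")
    ET.SubElement(video, "model", {"type": "none"})
    return ET.tostring(root, encoding="unicode")


print(make_headless(SAMPLE_XML))
```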

After switching the system to Q35 and setting the driver option that enables discarding, I was able to run fstrim on a node:

% oc debug node/compute-0.compute.local
...
sh-4.4# chroot /host
sh-4.4# fstrim / -v
/: 108.9 GiB (116894867456 bytes) trimmed

The qcow2 image also reported a smaller disk size afterwards:

# qemu-img info -U /var/lib/libvirt/images/ocp4-compute-0.qcow2
image: /var/lib/libvirt/images/ocp4-compute-0.qcow2
file format: qcow2
virtual size: 120 GiB (128849018880 bytes)
disk size: 10.3 GiB
cluster_size: 65536
backing file: /var/lib/libvirt/images/rhcos-4.10.3-x86_64-qemu.x86_64.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
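
To track this over time, the same numbers can be read from `qemu-img info -U --output=json <image>`, which reports `virtual-size` and `actual-size` in bytes. The sketch below only parses such JSON; the sample document is hand-written, with the virtual size taken from the output above and the actual size being an approximation of 10.3 GiB, and the helper name `allocation_ratio` is an assumption.

```python
import json

GIB = 1024 ** 3


def allocation_ratio(qemu_img_json: str) -> float:
    """Return the fraction of the virtual disk that is allocated on the host."""
    info = json.loads(qemu_img_json)
    return info["actual-size"] / info["virtual-size"]


# Illustrative input, not captured from a real run.
SAMPLE = json.dumps({
    "format": "qcow2",
    "virtual-size": 128849018880,  # 120 GiB, as in the output above
    "actual-size": 11059540787,    # roughly 10.3 GiB, approximated
})

ratio = allocation_ratio(SAMPLE)
print(f"{ratio:.1%} of the virtual size is allocated on the host")
```

A ratio that keeps growing even though the nodes report free space would suggest that trimmed blocks are not reaching the image.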

@rbo
Contributor

rbo commented Feb 22, 2023

For testing purposes, I added your suggestions to the branch libvirt-xml-improvements.

@rbo
Contributor

rbo commented Feb 23, 2023

Tested on RHEL 8 and RHEL 9; the vnc/video change is really great.

@rbo
Contributor

rbo commented Feb 23, 2023

Branch libvirt-xml-improvements merged into devel.

@ableischwitz
Contributor Author

The following KubeletConfig would need to be applied to reduce the caching of images:

# from https://cloud.redhat.com/blog/image-garbage-collection-in-openshift
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: imgc-kubeconfig
spec:
  kubeletConfig:
    imageGCHighThresholdPercent: 66
    imageGCLowThresholdPercent: 50
    imageMinimumGCAge: "5m30s"
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
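
To get a feeling for what these percentages mean in absolute numbers, here is a small calculation. The kubelet starts image garbage collection once disk usage exceeds imageGCHighThresholdPercent and removes images until usage is back below imageGCLowThresholdPercent. The sketch assumes the image filesystem spans the full 120 GiB disk, which overstates the real value since the usable filesystem is somewhat smaller; the helper name `gc_thresholds` is made up.

```python
GIB = 1024 ** 3


def gc_thresholds(disk_bytes: int, high_percent: int, low_percent: int) -> tuple[float, float]:
    """Return (high, low) disk-usage marks in bytes for kubelet image GC."""
    if not 0 <= low_percent < high_percent <= 100:
        raise ValueError("expected 0 <= low_percent < high_percent <= 100")
    return disk_bytes * high_percent / 100, disk_bytes * low_percent / 100


# Values from the KubeletConfig above, applied to an assumed 120 GiB filesystem.
high, low = gc_thresholds(120 * GIB, 66, 50)
print(f"image GC starts above {high / GIB:.1f} GiB and frees down to {low / GIB:.1f} GiB")
```

With the kubelet defaults of 85 and 80 percent, the same call gives marks of 102 GiB and 96 GiB.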

@rbo
Contributor

rbo commented Feb 25, 2023

@ableischwitz why not just use a smaller disk? What's the reason to reduce caching of images?

@ableischwitz
Contributor Author

The reason is quite simple: a) 120G is documented as the minimum size for disks and b) disk space is limited on such setups. The image-cache is sized for setups which don't suffer from disk-limitations.

Reducing the cache size (60% of 120G is still quite large) keeps the vm-disks slim, while still allowing more space to be used in case it's needed.
If we reduced the size of the vm-disks instead, there wouldn't be any easy way to start larger (or rather huge) workload images.

@rbo
Contributor

rbo commented Mar 6, 2023

I don't get it: if you don't have enough space for 3x 120 GB disks, then use smaller disks. You cannot grow either...

I would suggest adding this as a post-install step; if you would like to automate it, you can write your own post-install add-on.

@rbo rbo mentioned this issue Apr 14, 2023
1 task