RFE: Increase resilience for long-running clusters / optimize resource usage #259

ableischwitz · 2023-02-17T10:13:09Z

The default configuration of libvirt-machines is not very specific about resource usage and disk optimization.

When running an OpenShift cluster on overprovisioned disks, one wants to make sure that sparse disks release/trim freed blocks; otherwise the VMs will get paused because there is no space left on the device.
There is also no need for a graphical console on the VMs; serial output has additional benefits, such as the ability to scroll back to missed output.

Steps to be done:

remove graphical-device and vnc from the vm-definition
test "discard-mode: unlink" for OS-disks used by OCP
check if scheduled fstrim runs need to be enabled on the nodes
apply kubelet-config with tight imageGarbageCollection

This issue should be considered a draft for optimizations with regard to limited resource usage on a single host.

ableischwitz · 2023-02-17T11:59:02Z

vm.xml.j2 needs adjustments:

Change disk-driver options:

   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/{{ vm_instance_name }}.qcow2'/>
      <target dev='vda' bus='virtio'/>
   </disk>

should become:

   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' discard='unmap' />
      <source file='/var/lib/libvirt/images/{{ vm_instance_name }}.qcow2'/>
      <target dev='vda' bus='virtio'/>
   </disk>
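
As a rough illustration of the same change applied programmatically, the sketch below sets discard='unmap' on qcow2 disk drivers in an already rendered domain XML. It assumes the Jinja2 template has been rendered to plain XML first; the function name `enable_discard` and the sample path `example.qcow2` are hypothetical and not part of the repository.

```python
import xml.etree.ElementTree as ET


def enable_discard(domain_xml: str) -> str:
    """Return the domain XML with discard='unmap' on every qcow2 disk driver."""
    root = ET.fromstring(domain_xml)
    # Only touch <driver> elements of real disks (not cdrom/floppy devices).
    for driver in root.iterfind("./devices/disk[@device='disk']/driver"):
        if driver.get("type") == "qcow2":
            driver.set("discard", "unmap")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, minimal rendered domain definition used only for illustration.
SAMPLE = """<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/example.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(enable_discard(SAMPLE))
```

Editing vm.xml.j2 directly, as proposed above, remains the simpler route for new VMs; a helper like this would only matter for patching definitions that already exist.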

Change machine-type:

  <os>
    <type arch="x86_64">hvm</type>
    <boot dev="hd"/>
  </os>

Change from old pc-i440fx-* to q35-* and also switch to UEFI instead of BIOS (would allow secure-boot later on).

  <os firmware="efi">
    <type arch="x86_64" machine="q35" >hvm</type>
    <boot dev="hd"/>
  </os>

  <devices>
    <controller type="pci" index="0" model="pcie-root"/>
    ...
  </devices>

Remove:

<graphics type="vnc" port="-1"/>

Add:

<video>
    <model type='none'/>
</video>
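
The remove/add steps above can be sketched against a rendered domain XML as follows. This is only an illustration under the assumption that the definition is plain XML at this point; `make_headless` and the sample document are hypothetical names and data.

```python
import xml.etree.ElementTree as ET


def make_headless(domain_xml: str) -> str:
    """Drop <graphics> devices and replace any <video> with model type 'none'."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        # Nothing to change if the definition has no <devices> section.
        return domain_xml
    for tag in ("graphics", "video"):
        for elem in devices.findall(tag):
            devices.remove(elem)
    # Explicitly request no video device.
    video = ET.SubElement(devices, "video")
    ET.SubElement(video, "model", {"type": "none"})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, minimal rendered domain definition used only for illustration.
SAMPLE = """<domain type='kvm'>
  <devices>
    <graphics type="vnc" port="-1"/>
    <video>
      <model type='cirrus'/>
    </video>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(make_headless(SAMPLE))
```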

After switching the system to Q35 and setting the driver option to enable discard, I was able to run fstrim on a node:

% oc debug node/compute-0.compute.local
...
sh-4.4# chroot /host
sh-4.4# fstrim / -v
/: 108.9 GiB (116894867456 bytes) trimmed

The qcow2 image also reported a smaller disk size afterwards:

# qemu-img info -U /var/lib/libvirt/images/ocp4-compute-3.qcow2 
image: /var/lib/libvirt/images/ocp4-compute-0.qcow2
file format: qcow2
virtual size: 120 GiB (128849018880 bytes)
disk size: 10.3 GiB
cluster_size: 65536
backing file: /var/lib/libvirt/images/rhcos-4.10.3-x86_64-qemu.x86_64.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
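
To track this over time, the allocated share of an image can be computed from the JSON form of the same command (`qemu-img info -U --output=json <image>`), which reports `virtual-size` and `actual-size` in bytes. The sketch below is a minimal example; the helper name `allocation_summary` is hypothetical, and the sample `actual-size` is only an approximation of the 10.3 GiB shown above, not a measured value.

```python
import json

GIB = 1024 ** 3


def allocation_summary(info_json: str) -> dict:
    """Summarize virtual vs. actually allocated size from qemu-img JSON output."""
    info = json.loads(info_json)
    virtual = info["virtual-size"]  # provisioned size in bytes
    actual = info["actual-size"]    # bytes actually allocated on the host
    return {
        "virtual_gib": round(virtual / GIB, 1),
        "actual_gib": round(actual / GIB, 1),
        "allocated_percent": round(100 * actual / virtual, 1),
    }


# Illustrative input: virtual size taken from the output above,
# actual size approximated from the reported "10.3 GiB".
SAMPLE_JSON = json.dumps({
    "virtual-size": 128849018880,
    "actual-size": 11059540787,
})

if __name__ == "__main__":
    print(allocation_summary(SAMPLE_JSON))
```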

rbo · 2023-02-22T20:25:46Z

For testing purposes, I added your suggestion to the branch libvirt-xml-improvements.

rbo · 2023-02-23T09:50:34Z

Tested on RHEL8 and RHEL9; the vnc/video change is really great.

rbo · 2023-02-23T09:54:06Z

Branch libvirt-xml-improvements merged into devel.

ableischwitz · 2023-02-24T15:21:38Z

The following KubeletConfig would need to be applied to reduce caching of images:

# from https://cloud.redhat.com/blog/image-garbage-collection-in-openshift
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: imgc-kubeconfig
spec:
  kubeletConfig:
    imageGCHighThresholdPercent: 66
    imageGCLowThresholdPercent: 50
    imageMinimumGCAge: "5m30s"
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
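
To make the effect of these thresholds concrete: the kubelet starts image garbage collection once disk usage exceeds imageGCHighThresholdPercent and removes unused images until usage falls below imageGCLowThresholdPercent. The sketch below converts the percentages to absolute sizes, under the simplifying assumption that the whole 120 GiB disk is the image filesystem (the real filesystem is somewhat smaller). The function name `gc_thresholds` is hypothetical.

```python
GIB = 1024 ** 3


def gc_thresholds(capacity_bytes, high_percent, low_percent):
    """Return (high, low) image GC thresholds in GiB for a given capacity."""
    high = capacity_bytes * high_percent / 100
    low = capacity_bytes * low_percent / 100
    return high / GIB, low / GIB


if __name__ == "__main__":
    # Values from the KubeletConfig above, applied to an assumed 120 GiB disk.
    high_gib, low_gib = gc_thresholds(120 * GIB, 66, 50)
    print(f"GC starts above {high_gib:.1f} GiB and frees down to {low_gib:.1f} GiB")
```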

rbo · 2023-02-25T10:14:44Z

@ableischwitz why not just use a smaller disk? What's the reason to reduce caching of images?

ableischwitz · 2023-02-25T15:55:33Z

The reason is quite simple: a) 120G is documented as the minimum disk size, and b) disk space is limited on such setups. The image cache is sized for setups that don't suffer from disk limitations.

Reducing the cache size (60% of 120G is still quite large) makes it possible to keep the VM disks slim while still being able to use more space when it's needed.
If we reduce the size of the VM disks instead, there won't be any easy way to start larger (or rather huge) workload images.

rbo · 2023-03-06T16:47:31Z

I don't get it: if you don't have enough space for 3x120GB disks, then use smaller disks. You cannot grow either...

I would suggest adding this as a post-install step; if you want to automate it, you can write your own post-install add-on.

rbo added a commit that referenced this issue Feb 22, 2023

Added some vm.xml improvements based on #259

8cdc158

rbo added a commit that referenced this issue Feb 23, 2023

Added some vm.xml improvements based on #259

07851f3

rbo mentioned this issue Apr 14, 2023

issue 264 #266

Merged

1 task

rbo added a commit that referenced this issue Apr 14, 2023

Added some vm.xml improvements based on #259

03cb8c2
