
Could not resize Luks volume? #812

Closed

albundy83 opened this issue Jul 29, 2024 · 30 comments
@albundy83
Contributor

albundy83 commented Jul 29, 2024

/kind bug

What happened?
Hello, I have tried to increase the PVC volume size but it just fails :(
Here is the error message:

MountVolume.Setup failed while expanding volume for volume "pvc-xxxxxxxxxxxxxxx" : Expander.NodeExpand failed to expand the volume : rpc error: code = Internal desc = Could not resize Luks volume "vol-xxxxx": exit status 1

If I can provide more details, just tell me :-)

What you expected to happen?
Volume resized.

How to reproduce it (as minimally and precisely as possible)?
Here is the StorageClass I have used (I have tried with both ext4 and xfs):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-delete-immediate-encrypted
allowVolumeExpansion: true
parameters:
  encrypted: 'true'
  luks-cipher: aes-xts-plain64
  csi.storage.k8s.io/node-stage-secret-name: luks-key
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  csi.storage.k8s.io/provisioner-secret-name: osc-csi-bsu
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: xfs
  type: gp2
provisioner: bsu.csi.outscale.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-delete-immediate-encrypted-ext4
allowVolumeExpansion: true
parameters:
  encrypted: 'true'
  luks-cipher: aes-xts-plain64
  csi.storage.k8s.io/node-stage-secret-name: luks-key
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  csi.storage.k8s.io/provisioner-secret-name: osc-csi-bsu
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4
  type: gp2
provisioner: bsu.csi.outscale.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
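
For completeness, here is a sample PVC to reproduce with (a hedged sketch; the name and size are placeholders, not what I actually used):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-encrypted
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2-delete-immediate-encrypted
  resources:
    requests:
      storage: 10Gi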

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.15", GitCommit:"fb63712e1d017142977e88a23644b8e48b775665", GitTreeState:"clean", BuildDate:"2024-06-11T20:04:38Z", GoVersion:"go1.21.11", Compiler:"gc", Platform:"linux/amd64"}
    Kustomize Version: v5.0.1
    Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.15+rke2r1", GitCommit:"fb63712e1d017142977e88a23644b8e48b775665", GitTreeState:"clean", BuildDate:"2024-06-19T05:07:04Z", GoVersion:"go1.21.11 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version: v1.3.0
@albundy83
Contributor Author

It seems that this is a feature that was implemented in a later release of the external-resizer.
Any chance of bumping the external-resizer version?

@Bizyroth

Bizyroth commented Sep 5, 2024

Hello, I had the same problem but I solved it. Here is how I did it:

Prerequisite

  • Be sure that your PV has its reclaim policy set to "Retain", otherwise you will lose your data
  • Ensure that your PV has a claimRef to your PVC

How to do

  • Scale down your pod
  • Edit your PVC to increase the storage size and wait a little
  • Copy your PVC conf
  • Delete your PVC
  • Go to your PVs; one must be in the Released state
  • Edit the PV and, in the claimRef section, delete the resourceVersion and uid fields
  • Deploy your PVC with the same name as previously (it should reclaim the PV)
  • Scale up your pod and it should work (see the sketch after this list)
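
A minimal command-line version of those steps (a hedged sketch; resource names and the new size are placeholders):

# Detach the volume and request the new size
kubectl scale deployment my-app --replicas=0
kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# Recreate the PVC so it re-binds to the retained PV
kubectl get pvc my-pvc -o yaml > my-pvc.yaml
kubectl delete pvc my-pvc
kubectl patch pv <released-pv-name> --type json \
  -p '[{"op":"remove","path":"/spec/claimRef/resourceVersion"},{"op":"remove","path":"/spec/claimRef/uid"}]'
kubectl apply -f my-pvc.yaml
kubectl scale deployment my-app --replicas=1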

Little analysis

I think the volume size increase itself goes well. After increasing the PVC size, I manually mounted the Outscale disk on a VM and everything was OK. But when I launched the pod, I had the same error as you with LUKS encryption.

Hope this helps you

@albundy83
Contributor Author

Well, it should work without all these steps.
It's just that the CSI driver must be updated.

@outscale-hmi
Contributor

outscale-hmi commented Sep 12, 2024

PR #814 already done

@albundy83
Contributor Author

Hello,

Will you release soon?

@albundy83
Contributor Author

Hello,

any update on the new release?

@albundy83
Contributor Author

Hello,

I just tried with the new release, and I could not even mount the LUKS volume :(
After rolling back to the previous release, mounting LUKS was working.

@albundy83
Contributor Author

I have used the same StorageClass as described at the beginning of this post.
I have tried with both ext4 and xfs.
I get this error message:

MountVolume.MountDevice failed for volume "pvc-32b71011-4904-4c64-898d-408157f40d81" : rpc error: code = Internal desc = error while formating luks partition to vol-d9b51046, err: <nil>

@albundy83
Contributor Author

And in debug mode:

I1018 19:47:44.261569       1 node.go:573] NodeGetVolumeStatsResponse: {Usage:[available:212492316672 total:214643507200 used:2151190528 unit:BYTES  available:104857520 total:104857600 used:80 unit:INODES ] VolumeCondition:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:47:49.613078       1 identity.go:62] Probe: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:47:59.613025       1 identity.go:62] Probe: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:47:59.668147       1 node.go:609] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:47:59.672857       1 node.go:609] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:47:59.673473       1 node.go:609] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:47:59.674141       1 node.go:94] NodeStageVolume: called with args  {VolumeId:vol-d9b51046, PublishContext:map[devicePath:/dev/xvdh encrypted:true luks-cipher:aes-xts-plain64 luks-hash: luks-key-size: storage.kubernetes.io/csiProvisionerIdentity:1729280526855-7513-bsu.csi.outscale.com], StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/bsu.csi.outscale.com/c33e944394a41dd53794d2f08ad773ac49782991d5cd5e8a39c689d9a7f18abb/globalmount, VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > , VolumeContext:map[encrypted:true luks-cipher:aes-xts-plain64 luks-hash: luks-key-size: storage.kubernetes.io/csiProvisionerIdentity:1729280526855-7513-bsu.csi.outscale.com]}
I1018 19:47:59.674232       1 node.go:158] NodeStageVolume: find device path /dev/xvdh -> /dev/xvdh
I1018 19:47:59.674910       1 node.go:193] NodeStageVolume: The device must be encrypted
I1018 19:47:59.677363       1 node.go:210] NodeStageVolume: The device  does not have a luks format
I1018 19:48:00.464797       1 node.go:609] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:48:00.465753       1 node.go:515] NodeGetVolumeStats: called with args {VolumeId:vol-4945b969 VolumePath:/var/lib/kubelet/pods/2ce90b83-b98d-4ec6-919d-e9548901ccd1/volumes/kubernetes.io~csi/pvc-4570814f-36ca-48d1-8612-96a3e15ad1f6/mount StagingTargetPath: XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:48:00.465794       1 node.go:537] isBlockDevice false, <nil>
I1018 19:48:00.465901       1 node.go:573] NodeGetVolumeStatsResponse: {Usage:[available:10612416512 total:10726932480 used:114515968 unit:BYTES  available:5242807 total:5242880 used:73 unit:INODES ] VolumeCondition:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1018 19:48:00.998848       1 node.go:143] NodeStageVolume: volume="vol-d9b51046" operation finished
I1018 19:48:00.998963       1 node.go:145] donedone
E1018 19:48:00.998978       1 driver.go:113] GRPC error: rpc error: code = Internal desc = error while formating luks partition to vol-d9b51046, err: <nil> / (<nil>)
I1018 19:48:09.613381       1 identity.go:62] Probe: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}

@outscale-hmi
Contributor

Hello,
which version did you try (is it 0.4.0 or 1.4.0)?
To resize LUKS volumes, you should use 1.4.0:

Also, as part of this release, we have successfully upgraded the external-resizer component from v1.3.0 to v1.11.2, but this is not enough to support resizing volumes while they are in use (attached to running instances or pods). Currently, volumes must be in an "available" state, meaning they need to be detached or not actively used by a running process to be resized successfully.
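
In practice that means detaching first (a hedged sketch; workload and PVC names are placeholders):

# Scale down the workload so the volume becomes "available"
kubectl scale deployment my-app --replicas=0

# Request the new size on the PVC
kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# Reattach once the resize has been reported
kubectl scale deployment my-app --replicas=1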

@albundy83
Contributor Author

Hello,

I use version 1.4.0.
I remember that a volume (with or without LUKS) must be in the "available" state to be resized, as it's an Outscale API limitation.
But, as I explained, I can't even mount a volume now.
I have also tried with the previous 1.3.0 release, and mounting the LUKS volume works perfectly.
That was to make sure it's a new issue with the current release.
The specific error, as shown in the previous message, is:

I1018 19:47:59.674232       1 node.go:158] NodeStageVolume: find device path /dev/xvdh -> /dev/xvdh
I1018 19:47:59.674910       1 node.go:193] NodeStageVolume: The device must be encrypted
I1018 19:47:59.677363       1 node.go:210] NodeStageVolume: The device  does not have a luks format
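
For reference, the LUKS header can be checked directly on the node (a hedged sketch; /dev/xvdh is the device path from the log above):

# Exit code 0 means a LUKS header is present
cryptsetup isLuks /dev/xvdh && echo "LUKS header present" || echo "no LUKS header"

# Dump header details (version, cipher, key slots) when one is present
cryptsetup luksDump /dev/xvdh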

@outscale-hmi
Contributor

outscale-hmi commented Oct 21, 2024

Ok, thanks for your fast reply. I tested with your sc.yaml and I got this:

{"Volumes":[{"VolumeId":"vol-3ad55aeb","Tags":[{"Value":"pvc-40392a8a-761a-4e68-89af-ee7cd9b3ed00","Key":"CSIVolumeName"}],"VolumeType":"gp2","SubregionName":"eu-west-2a","State":"in-use","CreationDate":"2024-10-19T14:04:50.502Z","Iops":100,"LinkedVolumes":[{"VolumeId":"vol-3ad55aeb","DeleteOnVmDeletion":false,"DeviceName":"/dev/xvdb","State":"attached","VmId":"i-777c3253"}],"Size":1}],"ResponseContext":{"RequestId":"983e9a18-289d-435f-9c05-95d94765679f"}}
I1019 14:05:49.331240 1 cloud.go:954] Debug response ReadVolumes: response({NextPageToken: ResponseContext:0xc000132d48 Volumes:0xc000586900}), err()
I1019 14:05:49.331263 1 cloud.go:1160] Check Volume state before resizing volume: &{CreationDate:0xc00035c420 Iops:0xc000bbedd0 LinkedVolumes:0xc000586948 Size:0xc000bbedf8 SnapshotId: State:0xc00035c3b0 SubregionName:0xc00035c360 Tags:0xc000586930 VolumeId:0xc00035c2c0 VolumeType:0xc00035c310} err:
E1019 14:05:49.331290 1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize volume "vol-3ad55aeb": could not modify Outscale volume in non 'available' state: &{0xc00035c420 0xc000bbedd0 0xc000586948 0xc000bbedf8 0xc00035c3b0 0xc00035c360 0xc000586930 0xc00035c2c0 0xc00035c310} / ()

But I will look again and will be back to you ASAP

@albundy83
Contributor Author

But this error message indicates you are trying to resize a volume which is currently attached to a Pod, no?
Have you tried to create a Volume with the following StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-delete-immediate-encrypted-ext4
allowVolumeExpansion: true
parameters:
  encrypted: 'true'
  luks-cipher: aes-xts-plain64
  csi.storage.k8s.io/node-stage-secret-name: luks-key
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  csi.storage.k8s.io/provisioner-secret-name: osc-csi-bsu
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4
  type: gp2
provisioner: bsu.csi.outscale.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

and by creating the following Secrets:

apiVersion: v1
kind: Secret
metadata:
  name: luks-key
type: Opaque
stringData:
  key: xxxxxxxxxxxxx # Your luks key
---
apiVersion: v1
kind: Secret
metadata:
  name: osc-csi-bsu
  namespace: kube-system
  labels:
    app.kubernetes.io/instance: osc-bsu-csi-driver
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: osc-bsu-csi-driver
    app.kubernetes.io/version: v1.2.4
    helm.sh/chart: osc-bsu-csi-driver-1.6.0
  annotations:
    meta.helm.sh/release-name: osc-bsu-csi-driver
    meta.helm.sh/release-namespace: kube-system
stringData:
  access_key: xxxxxxxxxxxxxxxxxxxxx
  secret_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
type: Opaque

@outscale-hmi
Contributor

outscale-hmi commented Oct 22, 2024

Hello again,
The problem you are facing is related to PR #828.
We have made several key improvements to the securityContext settings in containers. These updates include not only managing user privileges but also adding a seccomp profile to better restrict system calls that could expose the system to potential threats.

Key Improvements:

Running containers as a non-root user: This reduces the risk of privilege escalation and limits what the container can do within the host system.

Seccomp Profile (RuntimeDefault): The seccomp profile restricts the system calls that containers can make, preventing potentially dangerous or unnecessary interactions with the host operating system.

AllowPrivilegeEscalation: false:
* This setting could be a potential issue. Operations like mounting encrypted volumes often require privileged access to interact with system components like the kernel, device drivers, and encryption tools (e.g., LUKS).
* If the CSI driver or container needs to elevate privileges temporarily to perform operations like mounting or managing encrypted devices, this setting will block those operations. For example, cryptsetup (which is used for managing LUKS-encrypted volumes) might require privilege escalation.

ReadOnlyRootFilesystem:
* Setting the root filesystem to read-only is generally a good security practice, but it’s unlikely to interfere with volume mounting. The mount operation occurs outside the root filesystem (e.g., on /mnt or another directory).

Consider Privileged Mode: If the CSI driver or container needs access to device drivers, encryption tools, or the kernel, you may need to run the container in privileged mode. You can enable this by setting privileged: true in the container securityContext.

Example:

containerSecurityContext:
  privileged: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: true

Start by allowing privilege escalation (allowPrivilegeEscalation: true) to see if that resolves the issue. If not, try temporarily enabling privileged: true to diagnose whether the issue is related to insufficient permissions.

If privileged access proves necessary, you can then decide how to minimize the use of privileged mode in the long run by refining the security settings or applying it selectively.

(If the problem persists, can you please open a new ticket?)

@albundy83
Contributor Author

Hello,

I clearly understand the benefit of a non-root user.
But there are 9 containers split between the DaemonSet and the Deployment, plus modifications at the ServiceAccount level.
Can you help me a bit to find which containers must be privileged or not?

@outscale-hmi
Contributor

Hello,

DaemonSet (node-specific operations):

  • CSI Node Driver: Manages mounting/unmounting volumes on nodes. Requires privileged access.
  • Node Driver Registrar: Registers the CSI driver with the kubelet; no privileged access required.
  • CSI Liveness Probe: Monitors the health of the CSI node plugin; no privileged access.

Deployment (control-plane operations):

  • CSI Provisioner: Handles volume provisioning.
  • CSI Attacher: Manages volume attachment/detachment.
  • CSI Resizer: Manages volume resizing.
  • CSI Snapshotter: Manages snapshots.

Only the Node Driver in the DaemonSet typically requires privileged: true (it is where the actual mount occurs); you need to allow privileged access by setting allowPrivilegeEscalation: true or by letting the node container run as privileged (privileged: true). The controller containers only manage higher-level operations (provisioning, snapshots), so they typically don't require elevated permissions.

# DaemonSet
securityContext:
  privileged: true
  runAsUser: 0
  fsGroup: 0
  allowPrivilegeEscalation: true

# Deployment (Provisioner, Attacher, etc.)
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true

This ensures that only the Node Driver has privileged access, while other components operate with minimal privileges.

@albundy83
Contributor Author

Ah thanks a lot for your explanation :)
I will try again !!

@albundy83
Contributor Author

Hello,

Not sure I understand this: #835?
And by the way, maybe those lines

# securityContext on the controller container (see sidecars for securityContext on sidecar containers)
containerSecurityContext:
  seccompProfile:
    type: RuntimeDefault
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false

are not at the right place?

@outscale-hmi
Contributor

outscale-hmi commented Oct 23, 2024

Hello,
Yes, it's in the right place, because I added a securityContext at the node, controller, and sidecar levels:
for nodes, because they often require more privileged access to system resources like block devices and encryption tools (e.g., cryptsetup for LUKS);
for controllers, because they typically interact with APIs and orchestrate storage operations, but don't need the same low-level system access as the node;
and each sidecar might need specific permissions based on its function, but sidecars usually handle API-level operations rather than direct interactions with the host system.
It's mainly to avoid granting too many or too few permissions, and to apply the principle of least privilege.

I pushed PR #835 to help you set the right permissions to handle encrypted volumes.
The securityContext changes, such as allowing privilege escalation, setting privileged: true, and disabling seccomp restrictions, should be applied to the controller pod to ensure it has the necessary permissions and access to handle LUKS encryption and decryption.
The best practice is to review and reintroduce security settings in a more controlled manner, such as switching back to seccompProfile: RuntimeDefault or creating a custom seccomp profile.

@albundy83
Contributor Author

Maybe you should check one more time, as it's under serviceAccount and now, with the latest commit, serviceAccount appears twice.
Here: https://github.com/outscale/osc-bsu-csi-driver/blob/master/osc-bsu-csi-driver/values.yaml#L227-L237

serviceAccount:
serviceAccount:
  controller:
    # -- Annotations to add to the Controller ServiceAccount
    annotations: {}
    # securityContext on the controller container (see sidecars for securityContext on sidecar containers)
    containerSecurityContext:
      seccompProfile:
        type: RuntimeDefault
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false

@albundy83
Contributor Author

albundy83 commented Oct 23, 2024

With your latest commits, I have successfully mounted the LUKS volume (the securityContext issue is fixed).
But now I'm back to the issue where LUKS can't be resized:

E1023 14:33:59.102443       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
E1023 14:33:59.664693       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
E1023 14:34:00.775943       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
E1023 14:34:02.900746       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
E1023 14:34:06.932201       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
E1023 14:34:15.020141       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
E1023 14:34:31.064020       1 driver.go:113] GRPC error: rpc error: code = Internal desc = Could not resize Luks volume "vol-6aef2e57": exit status 1 / (<nil>)
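
For reference, the manual equivalent of the node-side expand is roughly this (a hedged sketch; the mapper name and mount point are placeholders):

# Grow the LUKS mapping to fill the enlarged block device
cryptsetup resize luks-vol-6aef2e57

# Then grow the filesystem on top of the mapping
resize2fs /dev/mapper/luks-vol-6aef2e57    # ext4
xfs_growfs /path/to/globalmount            # xfs (operates on the mount point)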

@albundy83
Contributor Author

I think I have finally found the issue.
If you have time, please check #837.

Thanks a lot for your patience :)

@outscale-hmi
Contributor

Hello,
I will check this PR.
And you are right: I removed the duplicate and fixed the misplaced containerSecurityContext.

So to summarize the seccomp profile and privilege escalation settings:

For the node, the containerSecurityContext includes privileged: true and seccompProfile: Unconfined, which is essential if you're interacting with encrypted LUKS volumes and need full access to system resources.
For sidecars, the seccompProfile remains RuntimeDefault to provide a secure environment while disallowing privilege escalation where it is not needed.
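
In YAML terms, that corresponds to something like this (a hedged sketch of the two profiles described above, not the chart's literal defaults):

# Node container: full access for LUKS operations
containerSecurityContext:
  privileged: true
  allowPrivilegeEscalation: true
  seccompProfile:
    type: Unconfined

# Sidecar containers: locked down
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  seccompProfile:
    type: RuntimeDefault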

@albundy83
Contributor Author

It's safe and clear, thanks again for the clarification 😊

@albundy83
Contributor Author

Hello,
@outscale-hmi any chance to improve my pull request?
We are close to fixing this properly :)

@outscale-hmi
Contributor

Hello @albundy83, sorry I was off, I will work on this this week.

@outscale-hmi
Contributor

PR #839 done to fix the resize.

@albundy83
Contributor Author

Ah perfect, will you release a version or should we try it like this?

@outscale-hmi
Contributor

Yes, I will release soon for sure, but you can test it if you want.

@albundy83
Contributor Author

It works!!!
I have tried with ext4 and xfs, creation and resize.
Thanks a lot for this great update :)
