Support for NVMe volumes #384
Hello, we are considering adding support for dynamic provisioning of local storage volumes in DOKS; however, it likely will not be implemented in this CSI driver. The significant caveat to using node-local NVMe/SSD storage is that it is indeed node-local: we can't detach it from one node and attach it to another. This means it's really only useful for ephemeral purposes, since we expect nodes to be replaced in the course of normal cluster operations (e.g., due to health or for upgrades). If you're able to share, I'd be interested to hear more about your use case for local storage. We can connect over email if you'd rather discuss privately. Thanks! cc @bikram20
That's great news!
Interesting. How would it be exposed and mounted, then?
We do understand this caveat, but there are cases where it's fine: we want to run a distributed database and a distributed object store on NVMe storage. Due to performance requirements, we want to use the NVMe drives that DigitalOcean offers. In our case the applications are distributed, so a node shutting down (say, for upgrades) is fine, since other nodes act as replicas; this is achieved via
Let's continue publicly in this issue. There is very little public discussion on this topic, so I'd like to use this thread as an opportunity to put more information about using local NVMe drives with Kubernetes on the internet :)
We would add an additional StorageClass with a separate provisioner, potentially leveraging an existing project like the
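A minimal sketch of what such a StorageClass might look like, assuming a dynamic local-volume provisioner were installed. The provisioner name `local-nvme.example.com`, the class name `local-nvme` and the claim `db-data` are hypothetical placeholders, not anything DigitalOcean has announced. `volumeBindingMode: WaitForFirstConsumer` is the standard Kubernetes setting for node-local storage: it delays binding until a Pod using the claim is scheduled, so the volume is created on the node where that Pod actually runs.

```yaml
# Hypothetical StorageClass for node-local NVMe volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme                       # hypothetical class name
provisioner: local-nvme.example.com      # hypothetical provisioner name
volumeBindingMode: WaitForFirstConsumer  # bind only once the consuming Pod is scheduled
reclaimPolicy: Delete
---
# A claim against that class; it stays Pending until a Pod uses it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                          # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce                        # node-local storage cannot be shared across nodes
  storageClassName: local-nvme
  resources:
    requests:
      storage: 100Gi
```

A Pod (or a StatefulSet's `volumeClaimTemplates`) would then mount the claim like any other PVC; the difference from block storage is that the resulting volume stays pinned to one node.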
Sounds good!
I submitted a related issue on partitioning NVMe drives for DOKS nodes, digitalocean/DOKS#27. Basically, we can't repartition the NVMe drive right now.
This sort of provisioning is also useful for running your own database workloads on nodes when you need local NVMe performance. Yes, the storage is "ephemeral", but that is something database management tools like Zalando or Stolon can take into account, especially when combined with things like pod disruption budgets. You can meet that need today by running self-managed k8s clusters alongside a managed one, but the administration workload multiplies accordingly. Managed DOKS, as of 1.20 at least, is almost there with the ability to run `so_1.5_*` plan node pools. If you offered a way to let a node pool upgrade in place, an operator needing to run a local datastore could run it entirely in managed DOKS. In my particular use case, I have clients who need to run PostgreSQL services with custom extensions and replication patterns, which disqualifies most managed SQL offerings as well; hence my interest in closing the feature gaps in managing ephemeral storage on cloud instances/Droplets.
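As a sketch of the pod disruption budget idea, assuming a replicated PostgreSQL cluster whose Pods carry the hypothetical label `app: postgres`: the budget below asks Kubernetes to evict at most one of those Pods at a time during voluntary disruptions such as `kubectl drain`, so replicas on other nodes stay up while a node is recycled. Database operators may already create an equivalent budget themselves, so check before adding one by hand.

```yaml
apiVersion: policy/v1            # Kubernetes 1.21+; older clusters use policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb             # hypothetical name
spec:
  maxUnavailable: 1              # evict at most one database Pod at a time
  selector:
    matchLabels:
      app: postgres              # hypothetical label on the database Pods
```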
Hm. Vultr has offered NVMe by default for its managed Kubernetes solution for a while, at no additional cost. That's a big difference.
@kallisti5 What kind of workloads are you looking to run on NVMe local storage? Would you be okay with ephemeral nodes? Nodes are recycled during release upgrades.
@bikram20 Overall, I'm trying to find a cost-effective way to leverage the standard DO instance sizes. Running a reliable ReadWriteMany storage model is pretty difficult on DigitalOcean. My solution was Longhorn storage (https://longhorn.io), since it maintains and grooms RWX replicas across all of the Kubernetes nodes directly. This uses the large amount of otherwise wasted space on each k8s node pool Droplet, which saves costs: the 4 vCPU / 8 GiB nodes have over 100 GiB of disk that goes unused for most people using DO's CSI driver. It also automatically backs up data to S3. NVMe, though, would probably be the minimum requirement to maintain replicas within a reasonable timeframe. DO really needs a managed storage solution that can do RWX, like Gluster or NFS. The workload itself is 300+ GiB of software packages for Haiku (https://haiku-os.org) plus some other infrastructure.
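For reference, a ReadWriteMany claim against Longhorn might look roughly like the following. This assumes Longhorn is installed with its default StorageClass named `longhorn` and is a version with RWX support; the claim name is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: haiku-packages          # hypothetical claim name
spec:
  accessModes:
  - ReadWriteMany               # shared read-write access from Pods on different nodes
  storageClassName: longhorn    # Longhorn's default StorageClass (assumed)
  resources:
    requests:
      storage: 300Gi            # sized for the ~300 GiB package set mentioned above
```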
For others who are interested, a potential workaround is to mount file containers. Here's an example (original source):

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: xfs-disk-setup
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: xfs-disk-setup
  namespace: xfs-disk-setup
  labels:
    app: xfs-disk-setup
spec:
  selector:
    matchLabels:
      app: xfs-disk-setup
  template:
    metadata:
      labels:
        app: xfs-disk-setup
    spec:
      tolerations:
      - operator: Exists
      containers:
      - name: xfs-disk-setup
        image: docker.io/scylladb/local-csi-driver:latest
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/bash"
        - "-euExo"
        - "pipefail"
        - "-O"
        - "inherit_errexit"
        - "-c"
        - |
          img_path="/host/var/persistent-volumes/persistent-volume.img"
          img_dir=$( dirname "${img_path}" )
          mount_path="/host/mnt/persistent-volumes"

          # On first run, create a 10 GiB sparse file (1024-byte blocks x 10485760, nothing written).
          mkdir -p "${img_dir}"
          if [[ ! -f "${img_path}" ]]; then
            dd if=/dev/zero of="${img_path}" bs=1024 count=0 seek=10485760
          fi

          # Format the image as XFS unless it already holds an XFS filesystem.
          FS=$(blkid -o value -s TYPE "${img_path}" || true)
          if [[ "${FS}" != "xfs" ]]; then
            mkfs --type=xfs "${img_path}"
          fi

          mkdir -p "${mount_path}"

          # Mount with XFS project quotas; remount if the path is already a mount point.
          remount_opt=""
          if mountpoint "${mount_path}"; then
            remount_opt="remount,"
          fi
          mount -t xfs -o "${remount_opt}prjquota" "${img_path}" "${mount_path}"

          # Keep the container running so the DaemonSet Pod stays up.
          sleep infinity
        securityContext:
          privileged: true
        volumeMounts:
        - name: hostfs
          mountPath: /host
          mountPropagation: Bidirectional
      volumes:
      - name: hostfs
        hostPath:
          path: /
```

You can then use https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner or any other local volume "provisioner" as usual. The above DaemonSet creates a sparse file by default. To instead reserve the amount of space specified, try a syntax like

I benchmarked this on
and here are the results:
The block storage benchmarks match what is currently listed on the Limits page (7,500 IOPS × 8 KB block size = 60 MB/s).
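Spelled out, the throughput ceiling is the IOPS limit multiplied by the block size. The 60 MB/s figure uses decimal kilobytes; with binary units the same limit works out to about 58.6 MiB/s:

$$
7500\ \tfrac{\text{IO}}{\text{s}} \times 8\ \text{kB} = 60\,000\ \text{kB/s} = 60\ \text{MB/s},
\qquad
7500\ \tfrac{\text{IO}}{\text{s}} \times 8\ \text{KiB} = 61\,440\,000\ \text{B/s} \approx 58.6\ \text{MiB/s}
$$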
Not OP, but I'm interested in this for use with CloudNative-PG as an alternative to Managed Databases (we have different RPO requirements). For what it's worth, here are our rudimentary pgbench results on CloudNative-PG using the above local file container vs. a managed database:
Hi!
We're looking for an automated way to provision `PersistentVolumeClaim`s against locally mounted NVMe drives on DigitalOcean: https://www.digitalocean.com/blog/introducing-storage-optimized-droplets-with-nvme-ssds/

We've tried the `local` StorageClass (https://kubernetes.io/docs/concepts/storage/storage-classes/#local). It does work, but it is not automated at all, unlike DO Block Storage in k8s:

- Each `PersistentVolume` has to be constrained to a particular node with `nodeAffinity`.
- Each `PersistentVolume` has to have its capacity defined manually, yet that capacity does not act as a limit, since the NVMe storage is mounted as the root `/` filesystem on Premium and Storage Optimized Droplets with NVMe.
- Each `PersistentVolume` must have only one associated `PersistentVolumeClaim`; otherwise, Pods using it will not be scheduled.

We're looking into CSI implementations like https://github.com/minio/direct-csi; however, a major blocker there is that it only works with additional (non-root `/`) disks, while DigitalOcean Premium Droplets use the NVMe drive as the root `/` filesystem.

The question is: could you consider adding support for DigitalOcean NVMe drives to csi-digitalocean, please? :)
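For comparison, a minimal sketch of the manual `local` setup from the list above, using the built-in `kubernetes.io/no-provisioner` static pattern. The class name `local-storage`, PV name `nvme-pv-1`, path `/mnt/nvme/vol1`, node name `pool-nvme-abc12` and capacity are hypothetical. It shows the pain points listed: the mandatory `nodeAffinity` pin, a hand-written `capacity` that is not enforced on a shared root filesystem, and one PV per claim.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                      # hypothetical class name
provisioner: kubernetes.io/no-provisioner  # static: PVs are created by hand
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nvme-pv-1                          # hypothetical name
spec:
  capacity:
    storage: 100Gi                         # declared by hand; not enforced on the root filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/nvme/vol1                   # hypothetical directory on the node's NVMe root filesystem
  nodeAffinity:                            # required for local volumes: pins the PV to one node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - pool-nvme-abc12                # hypothetical node name
```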
Thanks!