Storing a workflow output on Kubernetes
Overview
Teaching: 5 min
Exercises: 30 minQuestions
How to set up a common storage on a cloud cluster
Objectives
Understand how to set up shared storage and use it in a workflow
Kubernetes Cluster - Storage Volume
When running applications or workflows, it is often necessary to allocate disk space to store the resulting data. Kubernetes provides various options for managing storage volumes, allowing you to efficiently store and access data within your cluster.
In many applications or workflows, there is a need for disk space to store and persist results. While Google Kubernetes Engine (GKE) provides persistent disks as default storage for applications running in the cluster, local machines do not have such built-in persistent storage capabilities. To address this, we can create a persistent volume using an NFS disk, allowing us to store and access data from our local machines. It’s important to note that persistent volumes are not namespaced and are available to the entire Kubernetes cluster.
Create the volume (disk) we are going to use:
gcloud compute disks create --size=100GB --zone=europe-west1-b gce-nfs-disk-1
Set up an nfs server for this disk:
wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/GKE/001-nfs-server.yaml
kubectl apply -n argo -f 001-nfs-server.yaml
Set up a nfs service, so we can access the server:
wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/GKE/002-nfs-server-service.yaml
kubectl apply -n argo -f 002-nfs-server-service.yaml
Let’s find out the IP of the nfs server:
kubectl get -n argo svc nfs-server | grep ClusterIP | awk '{ print $3; }'
Let’s create a persistent volume out of this nfs disk. Note that persistent volumes are not namespaced they are available to the whole cluster.
We need to write that IP number above into the appropriate place in this file:
wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/GKE/003-pv.yaml
vim 003-pv.yaml
Deploy:
kubectl apply -f 003-pv.yaml
Check:
kubectl get pv
Apps can claim persistent volumes through persistent volume claims (pvc). Let’s create a pvc:
wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/GKE/003-pvc.yaml
kubectl apply -n argo -f 003-pvc.yaml
Check:
kubectl get pvc -n argo
Access the disk through a pod
To get files from the persistent volume, we can define a container, a “storage pod” and mount the volume there so that we can get access to it.
Get the pv-pod.yaml file with:
wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/GKE/pv-pod.yaml
This file has the following content:
YAML File
# pv-pod.yaml apiVersion: v1 kind: Pod metadata: name: pv-pod spec: volumes: - name: task-pv-storage persistentVolumeClaim: claimName: nfs-1 containers: - name: pv-container image: busybox command: ["tail", "-f", "/dev/null"] volumeMounts: - mountPath: /mnt/data name: task-pv-storage
Create the storage pod and check the contents of the persistent volume
kubectl apply -f pv-pod.yaml -n argo
kubectl exec pv-pod -n argo -- ls /mnt/data
Once you have files in the volume, you could use
kubectl cp pv-pod:/mnt/data /tmp/poddata -n argo
to get them to a local /tmp/poddata
directory.
Key Points
Persistant volume is used to store the workflow output