This lesson is in the early stages of development (Alpha version)

Kubectl and additional tools and services

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • What is kubectl?

  • What is Argo workflows?

  • What kind of services/resources will I need to instantiate in my cluster?

Objectives
  • Lear what the kubectl command can do

  • Appreciate the necessity for the Argo workflows tool (or similar)

  • Lear how to set up different services/resources to get the most of your cluster

The kubectl command

Just as gcloud is the one command to rule them all for the GCP, the kubectl command is the main tool for interacting with your K8s cluster. You will use it to do essentially anything in the cluster. Here is the official cheatsheet, which is very useful but already very long.

Let’s run a few examples.

kubectl get nodes  
kubectl cluster-info  # Display addresses of the master and services

Let’s list some kubernetes components:

kubectl get pod
kubectl get services

We don’t have much going on. Let’s create some components.

kubectl create -h

Note there is no pod on the list, so in K8s you don’t create pods but deployments. These will create pods, which will run under the hood.

kubectl create deployment mynginx-depl --image=nginx

The nginx image will be pulled down from the Docker Hub. This is the most minimalist way of creating a deployment.

kubectl get deployment
kubectl get pod

Yaml Files

Another way of creating components in a K8s cluster is trough yaml files. These are intuitive, logical and configurable, although very picky about identation.

Let’s take a look at one of these files, nginx-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.16
        ports:
        - containerPort: 8080

Let’s delete our previous deployment and deploy using the yaml file:

kubectl delete deployment mynginx-depl
kubectl apply -f nginx-deployment.yaml

Namespaces

Namespaces are a kind of reservations in your K8s cluster. Let’s create one for the Argo workflow we will user

kubectl create ns <NAMESPACE>

Argo

While jobs can also be run manually, a workflow engine makes defining and submitting jobs easier. In this tutorial, we use argo quick start page to install it:

kubectl create clusterrolebinding YOURNAME-cluster-admin-binding --clusterrole=cluster-admin --user=YOUREMAIL@gmail.com
kubectl create ns argo
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/stable/manifests/quick-start-postgres.yaml

Download argo CLI:

# Download the binary
curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v3.1.2/argo-linux-amd64.gz

# Unzip
gunzip argo-linux-amd64.gz

# Make binary executable
chmod +x argo-linux-amd64

# Move binary to path
sudo mv ./argo-linux-amd64 /usr/local/bin/argo

# Test installation
argo version

Run a simple test flow:

argo submit -n argo --watch https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/hello-world.yaml
argo list -n argo
argo get -n argo @latest
argo logs -n argo @latest

Storage volumes

If we run some application or workflow, we usually require a disk space where to dump our results. There is no persistent disk by default, we have to create it.

You could create a disk clicking on the web interface above, but lets do it faster in the command line.

Create the volume (disk) we are going to use

gcloud compute disks create --size=100GB --zone=us-central1-c gce-nfs-disk-1

Set up an nfs server for this disk:

wget https://cms-opendata-workshop.github.io/workshop2021-lesson-cloud/files/001-nfs-server.yaml
kubectl apply -n argo -f 001-nfs-server.yaml

Set up a nfs service, so we can access the server:

wget https://cms-opendata-workshop.github.io/workshop2021-lesson-cloud/files/002-nfs-server-service.yaml
kubectl apply -n argo -f 002-nfs-server-service.yaml

Let’s find out the IP of the nfs server:

kubectl get -n argo svc nfs-server |grep ClusterIP | awk '{ print $3; }'

Let’s create a persisten volume out of this nfs disk. Note that persisten volumes are not namespaced they are available to the whole cluster.

We need to write that IP number above into the appropriate place in this file:

wget https://cms-opendata-workshop.github.io/workshop2021-lesson-cloud/files/003-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-1
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <Add IP here>
    path: "/"

Deploy:

kubectl apply -f 003-pv.yaml

Check:

kubectl get pv

Apps can claim persistent volumes through persistent volume claims (pvc). Let’s create a pvc:

wget https://cms-opendata-workshop.github.io/workshop2021-lesson-cloud/files/003-pvc.yaml
kubectl apply -n argo -f 003-pvc.yaml

Check:

kubectl get pvc -n argo

Now an argo workflow coul claim and access this volume with a configuration like:

# argo-wf-volume.ysml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-hostpath-
spec:
  entrypoint: test-hostpath
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: nfs-<NUMBER>
  templates:
  - name: test-hostpath
    script:
      image: alpine:latest
      command: [sh]
      source: |
        echo "This is the ouput" > /mnt/vol/test.txt
        echo ls -l /mnt/vol: `ls -l /mnt/vol`
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol

Now the workflow could access the disk. However is rather cumbersome to get the ouput outside the cluster. One of he best ways to do that is to follow similar procedures to set up an http web server for example.

As you can see all these tasks take time. The good news is that there are tools to make the cluster creation and the setting up of volumes, pvs, pvcs, servers, services, etc., automatic. This is what we will be doing for you (using Terraform) for the upcoming hands-on part of this lesson.

You will continue learning about this, hands-on, tomorrow, but concentrating on analysis workflows.

Key Points

  • kubectl is the ruler of GKE

  • Argo is a very useful tool for running workflows and parallel jobs

  • To be able to write, read and extract data, a few services/resources need to be set up on the GCP