Introduction


  • Public cloud providers are companies that offer pay-as-you-go computing resources and services over the internet to multiple users or organizations.
  • Terraform is an open-source tool to provision and delete computing infrastructure.
  • Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications across clusters of hosts.
  • Argo Workflows is an open-source tool for orchestrating sequential and parallel jobs on Kubernetes.

Persistent storage


  • A Google Cloud Storage bucket can be used to store the output files.
  • The storage cost depends on the volume of data stored and the storage duration; for this type of processing it is very small.
  • Downloading large output files from the bucket can be costly because of network egress charges.
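The cost trade-off above can be sketched as simple arithmetic: storage is charged per GB and per unit of time, while downloads are charged per GB transferred. This is a minimal sketch; the unit prices are hypothetical placeholders, not actual Google Cloud rates, which vary by region, storage class, and destination.

```python
# Rough storage vs. download cost estimate for output files kept in a bucket.
# ASSUMPTION: both prices are hypothetical placeholders, not real GCS rates.
HYPOTHETICAL_STORAGE_PRICE_PER_GB_MONTH = 0.02  # USD per GB per month
HYPOTHETICAL_EGRESS_PRICE_PER_GB = 0.12         # USD per GB downloaded


def storage_cost(size_gb: float, months: float,
                 price_per_gb_month: float = HYPOTHETICAL_STORAGE_PRICE_PER_GB_MONTH) -> float:
    """Storage cost grows with both the stored volume and the storage duration."""
    return size_gb * months * price_per_gb_month


def download_cost(size_gb: float,
                  price_per_gb: float = HYPOTHETICAL_EGRESS_PRICE_PER_GB) -> float:
    """Egress is charged per GB transferred, regardless of how long data was stored."""
    return size_gb * price_per_gb


if __name__ == "__main__":
    size = 50.0  # hypothetical output volume in GB
    print(f"storage for 1 month: ${storage_cost(size, 1):.2f}")  # $1.00
    print(f"one full download:   ${download_cost(size):.2f}")    # $6.00
```

With placeholder prices of this order, a single full download of the output can cost more than keeping it in the bucket for a month, which is why inspecting results in place is usually preferable.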

Disk image


  • A secondary boot disk with the container image preloaded can speed up workflow start-up, because new nodes do not have to pull the image from the registry.

Kubernetes cluster


  • The Kubernetes cluster can be created with Terraform configuration files.
  • kubectl is the command-line tool for interacting with the cluster.
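Before deploying anything on a freshly created cluster, it can help to confirm that its nodes are ready. A minimal sketch, assuming kubectl is installed and already configured for the cluster: it parses the JSON returned by kubectl get nodes -o json and counts nodes whose Ready condition is True. The helper names count_ready_nodes and get_node_list are hypothetical.

```python
import json
import subprocess


def count_ready_nodes(node_list: dict) -> int:
    """Count nodes whose 'Ready' condition has status 'True' in a NodeList object."""
    ready = 0
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready += 1
    return ready


def get_node_list() -> dict:
    """Run kubectl and return the parsed NodeList (needs a configured kubectl)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

With kubectl pointed at the new cluster, count_ready_nodes(get_node_list()) returns the number of nodes ready to accept pods.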

Set up workflow


  • Once the cluster is up, you will first deploy the Argo Workflows services using kubectl.
  • You will submit and monitor the workflow with the argo command-line tool.
  • You can see the output in the bucket with gcloud commands or in the Google Cloud Console web UI.
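An Argo workflow is a Kubernetes resource that argo submit reads from a YAML file; since JSON is valid YAML, a manifest generated as JSON should also be accepted. The sketch below builds a minimal Workflow in Python that fans out a number of parallel jobs with withSequence, each requesting one vCPU. It is a hypothetical illustration, not the actual workflow: busybox and the echo command stand in for the real processing image and entry point, and the template names are made up.

```python
import json


def build_workflow(image: str, n_jobs: int, cpu_request: str = "1") -> dict:
    """Return an Argo Workflow manifest that runs n_jobs parallel pods.

    ASSUMPTION: image, command, and template names are hypothetical placeholders.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "process-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    # One step group; withSequence expands it into n_jobs parallel steps.
                    "steps": [[{
                        "name": "process",
                        "template": "process-files",
                        "arguments": {"parameters": [{"name": "job-index", "value": "{{item}}"}]},
                        "withSequence": {"count": str(n_jobs)},
                    }]],
                },
                {
                    "name": "process-files",
                    "inputs": {"parameters": [{"name": "job-index"}]},
                    "container": {
                        "image": image,
                        # Placeholder command; the real job would process its share of files.
                        "command": ["sh", "-c"],
                        "args": ["echo processing job {{inputs.parameters.job-index}}"],
                        # Request one vCPU per job so the scheduler places one job per vCPU.
                        "resources": {"requests": {"cpu": cpu_request}},
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow("busybox", n_jobs=4), indent=2))
```

Assuming the script is saved as make_workflow.py and Argo Workflows was installed in the argo namespace, the manifest could be submitted and followed with python make_workflow.py > workflow.json and then argo submit -n argo --watch workflow.json.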

Scale up


  • The CPU resource request should be set so that each job runs on one vCPU.
  • The basic command kubectl top pods can be used to inspect resource consumption during a test job.
  • The optimal number of nodes in a cluster depends on the number of files in the dataset; it should be chosen so that every job processes the same number of files.
  • A large cluster running for a short time was found to be the most convenient.
  • Autoscaling can reduce costs because it removes nodes once all jobs running on them have finished.
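The sizing rule above can be sketched as a small calculation. Assuming each node runs a fixed number of jobs (one per usable vCPU; on GKE part of each node's CPU is reserved for system components, so this may be slightly below the machine's vCPU count), the number of parallel jobs is nodes × jobs per node, and an even split requires the file count to be divisible by that product. The function names and example numbers are hypothetical.

```python
def even_node_counts(n_files: int, jobs_per_node: int, max_nodes: int) -> list[tuple[int, int]]:
    """List (nodes, files_per_job) pairs where every job gets the same number of files.

    ASSUMPTION: each node runs exactly `jobs_per_node` jobs in parallel.
    """
    options = []
    for nodes in range(1, max_nodes + 1):
        n_jobs = nodes * jobs_per_node
        if n_files % n_jobs == 0:
            options.append((nodes, n_files // n_jobs))
    return options


def node_hours(nodes: int, hours: float) -> float:
    """Node-hours consumed; with ideal scaling, doubling nodes and halving runtime keeps this equal."""
    return nodes * hours


if __name__ == "__main__":
    # Hypothetical dataset: 480 files, 4 jobs per node, at most 40 nodes.
    for nodes, per_job in even_node_counts(480, jobs_per_node=4, max_nodes=40):
        print(f"{nodes:3d} nodes -> {per_job} files per job")
```

The node_hours helper reflects why a large cluster running for a short time need not cost more: if the work parallelizes well, the total node-hours stay roughly the same while the wall-clock time drops.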

Discussion


  • Technically, deploying the resources, setting up the workflow, and running the processing on Google Cloud Platform went very smoothly.
  • Final testing on a new Google Cloud account revealed problems with resource quota increase requests, which depend on the willingness of Google support to help small customers.