Introduction


  • Public cloud providers are companies that offer pay-as-you-go computing resources and services over the internet to multiple users or organizations.
  • Terraform is an open-source tool to provision and delete computing infrastructure.
  • Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications and their associated workflows across clusters of hosts.
  • Argo Workflows is an open-source tool for orchestrating sequential and parallel jobs on Kubernetes.

Persistent storage


  • A Google Cloud Storage bucket can be used to store the output files.
  • The storage cost depends on the volume stored and is very small for this type of processing.
  • Downloading large output files from the bucket can be costly.
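
The difference between storing and downloading can be sketched with simple arithmetic. Both prices below are hypothetical placeholders chosen only to illustrate the calculation, not actual Google Cloud rates; check the current pricing for the bucket's region and storage class before relying on any number.

```python
# Rough cost sketch. Both prices are HYPOTHETICAL placeholders, not real rates.
STORAGE_PRICE_PER_GB_MONTH = 0.02  # assumed USD per GB stored per month
EGRESS_PRICE_PER_GB = 0.12         # assumed USD per GB downloaded to the internet


def storage_cost(volume_gb, months=1.0):
    """Cost of keeping volume_gb in the bucket for the given number of months."""
    return volume_gb * months * STORAGE_PRICE_PER_GB_MONTH


def download_cost(volume_gb):
    """Cost of downloading volume_gb out of the bucket once."""
    return volume_gb * EGRESS_PRICE_PER_GB


if __name__ == "__main__":
    size_gb = 50.0  # hypothetical total size of the output files
    print(f"store for one month: {storage_cost(size_gb):.2f} USD")
    print(f"download once:       {download_cost(size_gb):.2f} USD")
```

With these assumed rates, a single download costs several times more than a month of storage, which is why it can pay off to inspect or reduce the outputs in the cloud before downloading them.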

Disk image


  • A secondary boot disk with the container image preloaded can speed up workflow startup.

Kubernetes cluster


  • Kubernetes clusters can be created with Terraform scripts.
  • kubectl is the command-line tool for interacting with the cluster.
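
The cluster lifecycle can be sketched as a short sequence of standard Terraform and kubectl subcommands. The Python wrapper below is only an illustration: the helper `run_all` and the working directory are hypothetical, and the Terraform scripts themselves are assumed to exist already in that directory.

```python
import shutil
import subprocess

# Standard subcommands, grouped by purpose.
CREATE_CLUSTER = [
    ["terraform", "init"],   # download providers, prepare the working directory
    ["terraform", "plan"],   # preview the resources that would be created
    ["terraform", "apply"],  # create them (asks for confirmation interactively)
]
CHECK_CLUSTER = [
    ["kubectl", "get", "nodes"],                     # are the nodes ready?
    ["kubectl", "get", "pods", "--all-namespaces"],  # what is running?
]
DELETE_CLUSTER = [
    ["terraform", "destroy"],  # remove the resources to stop incurring costs
]


def run_all(commands, workdir="."):
    """Run each command in order inside workdir; stop at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, cwd=workdir, check=True)
```

A call such as `run_all(CREATE_CLUSTER, workdir="my-terraform-dir")` followed by `run_all(CHECK_CLUSTER)` would mirror the manual steps, where `my-terraform-dir` stands for whichever directory holds the Terraform scripts. Running `run_all(DELETE_CLUSTER, ...)` at the end removes the cluster.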

Set up workflow


  • Once the cluster is up, you will first deploy the Argo Workflows services using kubectl.
  • You will submit and monitor the workflow with the argo command-line tool.
  • You can see the output in the bucket with gcloud commands or in the Google Cloud Console web UI.
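
The steps above map onto a handful of commands, sketched here as small Python helpers that only build and print the command lines. The namespace `argo`, the manifest and workflow file names, and the bucket name are all assumptions made for illustration; substitute the values used in your own setup.

```python
import shlex


def create_namespace_cmd(namespace="argo"):
    # The namespace name "argo" is an assumption, not a requirement.
    return ["kubectl", "create", "namespace", namespace]


def deploy_argo_cmd(manifest, namespace="argo"):
    # manifest: path or URL of the Argo Workflows install manifest you use
    return ["kubectl", "apply", "-n", namespace, "-f", manifest]


def submit_cmd(workflow_file, namespace="argo"):
    return ["argo", "submit", "-n", namespace, workflow_file]


def monitor_cmd(namespace="argo"):
    # "@latest" refers to the most recently submitted workflow
    return ["argo", "get", "-n", namespace, "@latest"]


def list_outputs_cmd(bucket):
    return ["gcloud", "storage", "ls", f"gs://{bucket}/"]


if __name__ == "__main__":
    # All file and bucket names below are hypothetical placeholders.
    steps = [
        create_namespace_cmd(),
        deploy_argo_cmd("install.yaml"),
        submit_cmd("workflow.yaml"),
        monitor_cmd(),
        list_outputs_cmd("my-output-bucket"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; each printed line can be pasted into a terminal once the placeholders are replaced.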

Scale up