Introduction
- Public cloud providers are companies that offer pay-as-you-go computing resources and services over the internet to multiple users or organizations.
- Terraform is an open-source tool to provision and delete computing infrastructure.
- Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications and their associated workflows across clusters of hosts.
- Argo Workflows is an open-source tool for orchestrating sequential and parallel jobs on Kubernetes.
Persistent storage
- A Google Cloud Storage bucket can be used to store the output files.
- The storage cost depends on the volume stored and the duration of storage, and is very small for this type of processing.
- Downloading big output files from the bucket can be costly.
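The difference between the two costs can be sketched with a small calculation. This is only an illustration: the function names, the 30-day month and the price values are assumptions made for the example and are not actual Google Cloud prices, so substitute the current rates for your bucket's storage class and region.

```python
def storage_cost(volume_gb: float, days: float, price_per_gb_month: float) -> float:
    """Cost of keeping volume_gb in the bucket for the given number of days.

    Assumes a 30-day billing month, which is a simplification.
    """
    return volume_gb * price_per_gb_month * days / 30.0


def download_cost(volume_gb: float, price_per_gb: float) -> float:
    """One-off cost of downloading volume_gb out of the bucket."""
    return volume_gb * price_per_gb


if __name__ == "__main__":
    # Placeholder figures for illustration only, not actual Google Cloud prices.
    volume_gb = 100.0
    print(f"storage for 3 days: {storage_cost(volume_gb, 3, 0.02):.2f}")
    print(f"one download:       {download_cost(volume_gb, 0.10):.2f}")
```

Storage is charged in proportion to the time the files are kept, while a download is charged on the full volume each time, which is why short-lived outputs are cheap to store but can be costly to retrieve.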
Disk image
- A secondary boot disk with the container image preloaded can speed up the workflow start.
Kubernetes cluster
- Kubernetes clusters can be created with Terraform scripts.
- kubectl is the tool to interact with the cluster.
Set up workflow
- Once the cluster is up, you will first deploy the Argo Workflows services using kubectl.
- You will submit and monitor the workflow with argo.
- You can see the output in the bucket with gcloud commands or on the Google Cloud Console Web UI.
Scale up
- The resource request should be set so that one job runs on one vCPU.
- The basic command kubectl top pods can be used to inspect the resource consumption during a test job.
- The optimal number of nodes in a cluster depends on the number of files in the dataset, and it should be chosen so that each job has the same number of files.
- A large cluster running for a short time was found to be the most convenient.
- Autoscaling can reduce the cost as it shuts down and deletes the nodes when all jobs on the node have finished.
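The sizing rule above (one job per vCPU, the same number of files in every job) can be explored with a short script. This is a minimal sketch: the function names, the example dataset size and the number of usable vCPUs per node are hypothetical, and in practice part of each node's vCPU capacity is taken by system pods.

```python
def even_splits(n_files: int, max_jobs: int) -> list[tuple[int, int]]:
    """Return (n_jobs, files_per_job) pairs that split n_files with no remainder."""
    return [
        (jobs, n_files // jobs)
        for jobs in range(1, max_jobs + 1)
        if n_files % jobs == 0
    ]


def nodes_needed(n_jobs: int, usable_vcpus_per_node: int) -> int:
    """Number of nodes needed to run n_jobs at once, one job per vCPU."""
    # Ceiling division without importing math.
    return -(-n_jobs // usable_vcpus_per_node)


if __name__ == "__main__":
    # Hypothetical dataset of 120 files, at most 48 parallel jobs,
    # and 4 usable vCPUs per node.
    for jobs, files_per_job in even_splits(120, 48):
        nodes = nodes_needed(jobs, 4)
        print(f"{jobs:3d} jobs x {files_per_job:3d} files -> {nodes:2d} nodes")
```

Picking a job count from this list avoids a last, partly filled job that would keep a node alive while the others sit idle.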
Discussion
- Technically, deploying the resources, setting up the workflow and running the processing on Google Cloud Platform was very smooth.
- Final testing on a new Google Cloud account revealed problems with requests to increase resource quotas, i.e. with the willingness of Google support to help small customers.