Scale up
Last updated on 2024-10-21 | Edit this page
Estimated time: 15 minutes
Overview
Questions
- How to process a full dataset?
- What is an optimal cluster setup?
- What is an optimal job configuration?
Objectives
- Optimize the cluster setup for a full dataset processing.
- Learn about job configuration.
Resource needs
Create a small cluster and connect
If you do not have the cluster from the previous sections, create a
new one. Set the number of nodes gke_num_nodes
to 2 in the
terraform.tfvars
file and create the resources, run
and confirm “yes”.
Connect to it with
Run a test job
Run a small test as in the previous section, but set the number of step to one. While the job is running, observe the resources usage with
The big processing step - with runpfnano
in the name -
defines the resource needs. The output indicates how much CPU (in units
of 1/1000 of a CPU) and memory the process consumes. Follow the resource
consumption during the job to see how it evolves.
The job configuration, i.e. how many jobs will be submitted in a node, is defined by the resouce requests in the workflow definition.
In this example, we have defined them as
resources:
requests:
cpu: "780m"
memory: "1.8Gi"
ephemeral-storage: "5Gi"
and the main constraint here is the CPU. This request will guarantee that only 1 jobs will run on a CPU.
Cluster configuration
Input data
The optimal cluster configuration depends on the input dataset. Datasets consist of files, and the number of files can vary. In practical terms, the input to the parallel processing steps is a list of files. Dividing events from input files to different processing steps can be done, but would require a filtering list as an input to the processing.
In an ideal case, the parallel steps should take the same amount of time to complete. However, this is usually not the case because
- input files are not equal in size
- processing time per events can vary.
However, the best approach is to have the same amount of files in each parallel step. Eventually, the files could be sorted according to their size and their share to the nodes could be optimized.
In the example case, we have used the MuonEG MiniAOD dataset with 353 input files. For that number of files, we deployed a cluster with 90 nodes 4-vCPU nodes, providing a total of 360 vCPUs. We can therefore define a workflow with 353 parallel jobs and have a close to full occupation of the cluster.
The relevant Terraform input variables fur such cluster are
project_id = "<PROJECT_ID>"
region = "europe-west4"
gke_num_nodes = 30
Note that as the “zone” (a, b or c in the location name) is not defined, the cluster will have 30 nodes in each zone, in total 90.
Alternatively, a smaller cluster would be less expensive per unit time but the processing takes a longer. In the benchmarking, a large cluster was found to be practical.
GCP sets quotas to resources, and you can either increase them - to a certain limit - or request a quota increase.
You will notice when you go beyond the predefined quota from this type of message during the cluster creation:
[...]
Error: error creating NodePool: googleapi: Error 403:
Error waiting for creating GKE cluster: Insufficient quota to satisfy the request:
If that happens, go to the Quotas & System Limits page at the Google Cloud Console. The link is also in the error message. Search for quotas that you need to increase.
For a cluster with a big number of nodes, you must increase the quotas for “CPUs” and “In-use regional external IPv4 addresses”.
Once you find the quota line, click on the three vertical dots and choose “Edit quota”.
If you can’t increase them to desired value, submit a quota increase request through this form. You will receive an email with increase request approved (or rarely denied if the location is down in resources). It is usually immediate, but takes some minutes to propagate.
Autoscaling
The Terraform script gke.tf
has the autoscaling
activated. This makes the cluster scale up or down according to
resources in use. This reduces the cost in particular for a cluster with
a big amount of nodes. It often happens that some jobs get longer than
the other, and in that case the cluster lifetime (and the cost) is
defined by the longest job. Autoscaling removes the nodes once they do
not have active processes running.
Costs
Cluster management fee
For the GKE “Standard” cluster, there’s a cluster management fee of $0.10 per hour.