CMS Open Data using Kubernetes

A physics analysis usually involves running over hundreds of gigabytes of data. At CMS, this is typically done using high-throughput batch systems, such as the HTCondor installation at CERN and at other research institutions, as well as the Worldwide LHC Computing Grid (WLCG). Not everyone will have these resources available at their own institution, but nowadays anyone can get access to computing resources via public cloud vendors. This lesson will give you a first taste of running realistic physics analyses “in the cloud” using Kubernetes, as well as a brief introduction to Kubernetes itself.

Prerequisites

We expect you to have followed most of the CMS Open Data Workshop for Theorists lessons. In particular, you should be familiar with Docker by now.

Schedule

Setup    Download files required for the lesson

00:00    1. Introduction
         What is Kubernetes?
         Why would I use Kubernetes?

00:30    2. Getting started with Google Kubernetes Engine
         How to create a cluster on Google Cloud Platform?
         How to set up Google Kubernetes Engine?
         How to set up a workflow engine to submit jobs?
         How to run a simple job?

01:00    3. Storing workflow output on Google Kubernetes Engine
         How can I set up shared storage for my workflows?
         How to run a simple job and get the output?
         How to run a basic CMS Open Data workflow and get the output files?

01:30    4. Downloading data using the cernopendata-client
         How can I download data from the Open Data portal?
         Should I stream the data from CERN or have them available locally?

01:55    5. Running large-scale workflows
         How can I run more than a toy workflow?

02:15    6. Building a Docker image on GCP
         How do I push to a private registry on GCP?

02:55    7. Getting real
         How can I now use this for real?

03:20    8. Cleaning up
         How can I delete my cluster and disks?

03:25    Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
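
To give a first flavour of the tools that appear in the episodes above, a few short command sketches follow. They are illustrative rather than authoritative: all names (cluster, zone, project, record IDs) are placeholders, and the episodes themselves give the full, tested instructions. First, a minimal sketch of episode 2, creating a small cluster on Google Kubernetes Engine and running a toy job, assuming the gcloud and kubectl command-line tools are installed and a GCP project is configured:

    # Create a small cluster (cluster name and zone are placeholders)
    gcloud container clusters create opendata-cluster \
        --zone europe-west1-b --num-nodes 2

    # Point kubectl at the new cluster
    gcloud container clusters get-credentials opendata-cluster \
        --zone europe-west1-b

    # Check that the nodes are up
    kubectl get nodes

    # Run a toy job: the classic "compute pi" Kubernetes example
    kubectl create job pi --image=perl \
        -- perl -Mbignum=bpi -wle 'print bpi(2000)'
    kubectl wait --for=condition=complete job/pi --timeout=120s
    kubectl logs job/pi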
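
For shared workflow output (episode 3), Kubernetes generally uses a PersistentVolumeClaim that jobs can mount. A sketch of the general mechanism, assuming a default storage class is available on the cluster; the claim name and size are placeholders:

    # Request a 10 GiB volume for workflow output (name and size are placeholders)
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: workflow-output
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    EOF

    # Confirm the claim has been bound
    kubectl get pvc workflow-output

ReadWriteOnce is the simplest access mode; workflows with several parallel jobs writing to the same volume may need a storage class that supports ReadWriteMany, which the episode discusses.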
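
Episode 4 introduces the cernopendata-client. A sketch of inspecting and downloading a record from the CERN Open Data portal, assuming the client is installed; the record ID is only an example:

    # Look up a record's metadata and file locations (record ID is an example)
    cernopendata-client get-metadata --recid 5500
    cernopendata-client get-file-locations --recid 5500

    # Download the record's files to the local disk
    cernopendata-client download-files --recid 5500

Whether to download like this or stream the files directly from CERN is exactly the trade-off the episode examines.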
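
For episode 6, one way to build an image and push it to a registry on GCP is Cloud Build; a one-line sketch, assuming a Dockerfile in the current directory and a placeholder project ID my-project:

    # Build the image remotely with Cloud Build and push it to the project registry
    gcloud builds submit --tag gcr.io/my-project/my-analysis:latest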
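
Finally, episode 8 covers cleanup, which matters on a paid cloud platform: deleting the cluster stops the largest cost, but persistent disks can outlive it. A sketch, with the same placeholder names as above (DISK_NAME stands for whatever the listing shows):

    # Delete the cluster itself
    gcloud container clusters delete opendata-cluster --zone europe-west1-b

    # List and delete any leftover persistent disks
    gcloud compute disks list
    gcloud compute disks delete DISK_NAME --zone europe-west1-b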