CMS Open Data using Kubernetes: Glossary

Key Points

Introduction
  • Kubernetes is a powerful tool for scheduling containers.

Getting started with Google Kubernetes Engine
  • With Kubernetes, one can run workflows much as on a batch system.

Storing workflow output on Google Kubernetes Engine
  • CMS Open Data workflows can be run in a commercial cloud environment using modern tools.

Downloading data using the cernopendata-client
  • It is usually advantageous to have the data where the CPU cores are.

Running large-scale workflows
  • Argo Workflows on Kubernetes are very powerful.

Building a Docker image on GCP
  • GCP allows you to store your Docker images in your own private container registry.

Getting real
  • Argo is a powerful tool for running parallel workflows.

Cleaning up
  • The cluster and disks should be deleted when they are no longer needed.
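
    As a rough sketch of what this cleanup could look like with the gcloud
    command line tool: the cluster name, disk name, and zone below are
    hypothetical placeholders, not values from this lesson, and the `run`
    helper is only an illustrative dry-run wrapper. By default the script
    prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace them with your own resources.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
DISK_NAME="${DISK_NAME:-my-disk}"
ZONE="${ZONE:-europe-west1-b}"

# Dry-run is on by default so nothing is deleted by accident.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete the Kubernetes cluster first, then the persistent disk.
cleanup() {
  run gcloud container clusters delete "$CLUSTER_NAME" --zone "$ZONE" --quiet
  run gcloud compute disks delete "$DISK_NAME" --zone "$ZONE" --quiet
}

cleanup
```

    Running the script with DRY_RUN=0 would actually issue the deletions;
    `gcloud container clusters list` and `gcloud compute disks list` can be
    used beforehand to check which resources exist.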

Glossary

FIXME