Docker pre-exercises

Introduction

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What is Docker?

  • What is the point of these exercises?

Objectives
  • Learn about Docker and why we’re using it

Let’s learn about Docker and why we’re using it!

Regardless of what you encounter in this lesson, the definitive guide is any official documentation provided by Docker.

What is Docker?

From the Docker website

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

In short, Docker allows a user to work in a computing environment that has been frozen with respect to interdependent libraries and code and related tools. This means that you can use the same software that analysts were using 10 years ago (for example) without downloading all the relevant 10-year-old libraries. :)

What can I learn here?

As much as we’d like to, we can’t give you a complete overview of Docker. However, we do hope to explain why we run Docker the way we do, so that you gain some understanding. More specifically, we’ll show you how to set up Docker not just for this workshop, but for interfacing with the CMS open data in general.

Key Points

  • Docker is an implementation of a tool called a container that gives us a self-consistent computing environment

  • Docker is widely used these days in both industry and academic research

  • Docker is one way that you can interface with CMS data using the same computing tools as CMS collaborators


Installing Docker

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • What equipment do I need?

  • How do I install Docker?

  • How do I test my installation?

  • What are the main Docker concepts and commands I need to know?

Objectives
  • Install Docker on your machine

  • Understand the most basic concepts about images and containers

Installing Docker is relatively straightforward, particularly because of the excellent documentation that Docker provides. Still, you should set aside some time to do it properly and test it out.

Installing

Go to the official Docker site and follow its installation instructions to install Docker for your operating system.

We see no need to go beyond the documentation they provide so we leave it up to you to follow their installation procedure.

In the episodes of this lesson that follow, we assume that Windows users have WSL2 activated with a Linux bash shell (e.g. Ubuntu) and Docker Desktop installed. All commands indicated with “bash” are expected to be typed in this Linux shell.

Note that WSL2 can take around an hour to install.

If you are new to the Linux bash shell, you should first follow the tutorial on shell environment. Make sure that you are familiar with the directory structure and that you know how to create and remove directories and how to create files and save them to a specific directory. Also make sure that you have an editor with which you are comfortable. A common choice is VS Code, but you can use any other editor.

Testing

As you walk through their documentation, you will eventually come to a point where you will run a very simple test, usually involving their hello-world container.

You can find their documentation for this step here.

Testing their code can be summed up by the ability to run (without generating any errors) the following commands.

docker --version
docker run hello-world
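
The first check can be wrapped in a small guard so that a missing installation produces a readable message instead of a shell error. This is only a sketch: the helper name check_tool is made up for this example and is not part of Docker.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Only ask for the version if the docker client is actually installed.
if check_tool docker; then
    docker --version
fi
```

If this prints a version, docker run hello-world is the next thing to try; that step also needs the Docker daemon (Docker Desktop on Windows and MacOS) to be running.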

Images and Containers

As mentioned above, the official Docker sites provide ample documentation. However, two concepts are crucial for using container technology with CMS open data: container images and containers.

One can think of the container image as the main ingredients for preparing a dish, and the final dish as the container itself. You can prepare many dishes (containers) based on the same ingredients (container image). Images can exist without containers, whereas a container needs to run an image to exist. Therefore, containers are dependent on images and use them to construct a run-time environment and run an application.

The final dish, for us, is a container that can be thought of as an isolated machine (running on the host machine) with mostly its own operating system and the adequate software and run-time environment to process CMS open data.

Docker provides the ability to create, build and/or modify images, which can then be used to create containers. We will not use this aspect of the technology because, as you will see later, we will use an already-built and ready-to-use image in order to create our needed container.

Commands Cheatsheet

There are many Docker commands that can be executed for different tasks. However, the most useful for our purposes are the following. We will show some usage examples for some of these commands later. Feel free to explore other commands.
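
As a compact reference, the commands that appear in the rest of this lesson can be printed from a small helper. This is a sketch rather than an official list: the function name docker_cheatsheet is made up for illustration, NAME and IMAGE are placeholders, and docker images is a standard command included here as an assumption about what is useful, since the lesson itself does not use it.

```shell
# Hypothetical helper that prints a short reference of standard docker commands.
docker_cheatsheet() {
    cat <<'EOF'
docker run -it --name NAME IMAGE   create a container from IMAGE and start it
docker start -i NAME               re-enter an existing container
docker stop NAME                   stop a running container
docker ps                          list running containers
docker ps -a                       list all containers, including stopped ones
docker rm NAME                     remove a stopped container
docker images                      list downloaded images
EOF
}

docker_cheatsheet
```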

Key Points

  • For up-to-date details for installing Docker, the official documentation is the best bet.

  • Make sure you were able to download and run Docker’s hello-world example.

  • The concepts of image and container, plus knowledge of a few Docker commands, are all that is needed to start using CMS open data.


Using Docker with the CMS open data

Overview

Teaching: Self-guided min
Exercises: 40 min
Questions
  • How do I use docker to effectively interface with the CMS open data?

  • What container images are available for my work with the CMS open data?

Objectives
  • Download the CMSSW open data docker image

  • Open your own CMSSW open data container and check that graphical windows open

  • Download the ROOT and python images and build your own container

  • Restart an existing container

  • Delete and rebuild containers

Overview

This exercise will walk you through setting up and familiarizing yourself with Docker, so that you can effectively use it to interface with the CMS open data. It is not meant to completely cover containers and everything you can do with Docker.

Three types of container images are provided: one with the CMS software (CMSSW) compatible with the released data, and two others with ROOT and python libraries needed in this workshop. The CMSSW container is mandatory if you want to access the CMS data in AOD and MiniAOD formats (you will learn about them later), as you will not be able to install CMSSW software on your own computer. The two others are provided to make setting up and using ROOT and/or python libraries easier for you for this tutorial, but if you wish, you can also install them on your computer.

All container images come with VNC for the graphical user interface. It opens directly in a browser window. Optionally, you can also connect to the VNC server of the container using a VNC viewer (TigerVNC, RealVNC, TightVNC, OSX built-in VNC viewer, etc.) installed on your local machine, but these instructions describe only the browser option, which needs no additional tools. On native Linux, you can also use X11-forwarding.

For different CMSSW container images, some guidance can be found on the Open Data Portal introduction to Docker. In this tutorial, we will use the container image needed for the CMS open data from 2015. The use of graphical interfaces, such as the graphics window from ROOT, depends on the operating system of your computer. Therefore, in the following, separate instructions are given for Windows WSL, Linux and MacOS.

Note that the container images are large (the compressed download size is 6.6GB for the CMSSW container, and of the order of 1GB for the ROOT and python containers). Make sure that you leave enough time to download them and work through the exercises before the workshop.

Download the docker image for CMSSW open data and start a container

The first time you start a container, a docker image file gets downloaded from an image registry. The CMSSW open data image is large (6.6GB) and it may take very long to download, depending on the speed of your internet connection. After the download, a container created from that image starts. The image download needs to be done only once. Afterwards, when starting a container, it will find the downloaded image on your computer, and it will be much faster.

The containers do not have modern editors and it is expected that you mount your working directory from the local computer into the container, and use your normal editor for editing the files. Note that all your compiling and executing still has to be done in the Docker container!

First, before you start up your container, create a local directory where you will be doing your code development. In the example below, it is called cms_open_data_work and it will live in the $HOME directory. You may choose a different location and a shorter directory name if you like. :)

Local machine

cd # This is to make sure I'm in my home directory
mkdir cms_open_data_work

Warning!

If you do not create the directory on your local computer before creating the container, the directory is created automatically but with the wrong user/group. When starting the container, you will get a message cannot make directory CMSSW_7_6_7 Permission denied. In that case, delete the directory with rm -rf cms_open_data_work/, and remove the failing container with docker rm <container-name> so that you can use the same name. In the following, we will use my_od as the container name. And then, remember to create the directory before creating the container!
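
A quick way to avoid this situation is to create the directory and check that you can write to it before running docker run. The sketch below assumes the directory name used in this lesson; the helper name ensure_workdir is hypothetical.

```shell
# Hypothetical helper: create the working directory if needed and check
# that the current user can write to it.
ensure_workdir() {
    mkdir -p "$1" || return 1
    if [ -w "$1" ]; then
        echo "ok: $1 is writable"
    else
        echo "warning: $1 is not writable by $(id -un)" >&2
        return 1
    fi
}

ensure_workdir "$HOME/cms_open_data_work" || echo "fix the directory before running docker run"
```

If the warning appears, the directory was most likely created by Docker with the wrong owner, and the clean-up steps described above apply.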

Start the container following the instructions below depending on the operating system you are using.

Linux

We will use the docker run command to create the container (downloading the appropriate image if it is the first time) and start it right away.

docker run -it --name my_od --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw  -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 /bin/bash
Setting up CMSSW_7_6_7
CMSSW should now be available.
This is a standalone image for CMSSW_7_6_7 slc6_amd64_gcc493.
(/code/CMSSW_7_6_7/src)

This is now a bash shell in the CMS open data environment in which you have access to a complete CMS software release that is appropriate for interfacing with the 2015 13 TeV datasets.

As there are rate limits for pulls from Docker Hub, you may get the following error message: docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.. In that case, try later (the limit is per 6 hours) or use the mirror gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 instead of cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493.
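
Switching between the two image locations is easier if both are kept in variables. The following is a minimal sketch: the two image references are the ones quoted above, while the variable names and the pick_image helper are made up for this example.

```shell
# The two locations of the same image, as given in the text above.
HUB_IMAGE="cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493"
MIRROR_IMAGE="gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493"

# Hypothetical helper: print the image reference for the chosen source.
pick_image() {
    case "${1:-hub}" in
        mirror) echo "$MIRROR_IMAGE" ;;
        *)      echo "$HUB_IMAGE" ;;
    esac
}

echo "image to use: $(pick_image mirror)"
```

The result can then take the place of the image name at the end of the docker run command, for example "$(pick_image mirror)" /bin/bash.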

Now let’s understand the options that were used for the docker run command.

  • First, the -it (or -i) option means to start the container in interactive mode. Essentially, it means that you will end up inside the running container.
  • We assign a name to the container using the --name switch, so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than my_od.
  • The --net=host switch will allow you to use the host network (Internet access) in the container.
  • The --env switch will forward the appropriate DISPLAY environment variable from the host machine to the container so that X11-forwarding (the ability to open graphical windows from inside the container) can be achieved.
  • For X11-forwarding to be functional, your local $HOME/.Xauthority file needs to be mounted as the /home/cmsusr/.Xauthority file inside the container. We do this using the --volume (or -v) switch. Note that the colon (:) symbol separates the source and destination points for the mounting procedure. In addition, the rw tag is given (also separated by :) so that the file can be read and written if necessary.
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 is the name of the image we will use. If no label is prepended, Docker assumes that it resides in Docker Hub, the official image repository of Docker.
  • Finally, the /bin/bash option will throw the container into a bash shell when running interactively.
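
To see how the pieces fit together, the same command can be assembled one option at a time and printed rather than executed. This is an illustrative sketch only: the helper name print_run_x11 is made up, and the options are exactly those explained in the list above.

```shell
# Hypothetical helper: assemble the docker run command flag by flag and
# print it instead of running it, so each piece can be inspected.
print_run_x11() {
    set -- docker run
    set -- "$@" -it                                                # interactive mode
    set -- "$@" --name my_od                                       # container name
    set -- "$@" --net=host                                         # use the host network
    set -- "$@" --env=DISPLAY                                      # forward DISPLAY
    set -- "$@" -v "$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" # X11 credentials
    set -- "$@" -v "$HOME/cms_open_data_work:/code"                # working directory
    set -- "$@" cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493          # image
    set -- "$@" /bin/bash                                          # command to run
    echo "$@"
}

print_run_x11
```

Printing the command first makes it easy to spot a wrong path or name before a container is created under that name.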

For a more complete listing of options, see the official Docker documentation on the docker run command.

To test that X11-forwarding works, start the ROOT program by typing root in the container prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the option from the TBrowser window or by typing .q in the ROOT prompt.

Make sure that you can copy instructions from a browser page to the container terminal. One thing you can try is Shift+Ctrl+V, rather than Ctrl+V, when pasting into your container terminal. That will sometimes work. If not, you will see later in these instructions how to pass files from your local computer to the container.

Then type exit to leave the container.

If you find that X11 forwarding is not working and the ROOT graphical window does not open, try typing the following before starting your Docker container.

xhost local:root

If everything works fine, you are ready to continue with the lesson.
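
If it does not, it can help to check on your local machine the two things that the docker run command above relies on: the DISPLAY variable and the $HOME/.Xauthority file. The sketch below is a hedged example; the helper name x11_preflight is hypothetical and the check does not guarantee that X11 forwarding will work.

```shell
# Hypothetical helper: check the host-side prerequisites that the
# docker run command passes to the container for X11 forwarding.
x11_preflight() {
    status=0
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
        status=1
    fi
    if [ ! -f "$HOME/.Xauthority" ]; then
        echo "$HOME/.Xauthority is missing"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "X11 prerequisites look fine"
    return "$status"
}

x11_preflight || echo "fix the items above before starting the container"
```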

If you still have problems with X11 forwarding

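If you are having problems with X11 forwarding, you can instead use the VNC application installed in the container image. If a container named my_od already exists from the earlier attempt, remove it first with docker rm my_od so that the name can be reused: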

docker run -it --name my_od -P -p 5901:5901  -p 6080:6080 -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493:latest /bin/bash

This container has a VNC application installed to allow opening graphical windows on a remote machine (seen from the container, your own computer is a remote machine). Start the application with start_vnc from your container prompt. You will need to start it every time you use the container (if you want to open graphics windows).

start_vnc
xauth:  file /home/cmsusr/.Xauthority does not exist

New 'myvnc:1' desktop is 1df549a6f098:1

Starting applications specified in /home/cmsusr/.vnc/xstartup
Log file is /home/cmsusr/.vnc/1df549a6f098:1.log

[1] 144
VNC connection points:
        VNC viewer address: 127.0.0.1:5901
        HTTP access: http://127.0.0.1:6080/vnc.html
To kill the vncserver enter 'vncserver -kill :1'

Open a browser window at the HTTP address given in the start message and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there.

To test, start ROOT by typing root in the container terminal prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the “Quit Root” option from the Browser menu of the TBrowser window or by typing .q in the ROOT prompt.

Importantly, stop the VNC server before exiting the container. If you don’t do it, you will need to do some cleaning before being able to open the graphics window next time you use the same container. Do the following:

 stop_vnc
 exit

Windows WSL2

We will use the docker run command to create the container (downloading the appropriate image if it is the first time) and start it right away.

docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 /bin/bash
Setting up CMSSW_7_6_7
CMSSW should now be available.
This is a standalone image for CMSSW_7_6_7 slc6_amd64_gcc493.
(/code/CMSSW_7_6_7/src)

This is now a bash shell in the CMS open data environment in which you have access to a complete CMS software release that is appropriate for interfacing with the 2015 13 TeV datasets.

As there are rate limits for pulls from Docker Hub, you may get the following error message: docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.. In that case, try later (the limit is per 6 hours) or use the mirror gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 instead of cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493.

If the docker command exits without giving you the output above, see this post in the CERN Open Data forum. Note in particular that the .wslconfig file that you need to add must not have a file extension; if Windows adds one automatically, rename the file.

Now let’s understand the options that were used for the docker run command.

  • First, the -it (or -i) option means to start the container in interactive mode. Essentially, it means that you will end up inside the running container.
  • We assign a name to the container using the --name switch, so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than my_od.
  • The options -P -p 5901:5901 -p 6080:6080 open/publish ports from the container to the local host, which is needed for the graphical windows.
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 is the name of the image we will use. If no label is prepended, Docker assumes that it resides in Docker Hub, the official image repository of Docker.
  • Finally, the /bin/bash option will throw the container into a bash shell when running interactively.
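
In each -p HOST:CONTAINER pair, the number before the colon is the port on your computer and the number after it is the port inside the container. The sketch below builds such flags from a list of port numbers, using the same number on both sides as this lesson does; the helper name publish_flags is made up for illustration.

```shell
# Hypothetical helper: turn a list of port numbers into -p HOST:CONTAINER
# flags, with the same port number on both sides.
publish_flags() {
    flags=""
    for port in "$@"; do
        flags="$flags -p $port:$port"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

publish_flags 5901 6080
```

Port 5901 is the one a VNC viewer connects to and 6080 is the one used by the browser, as the start_vnc message in this lesson shows.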

For a more complete listing of options, see the official Docker documentation on the docker run command.

Now, first make sure that you can copy instructions from a browser page to the container terminal. It works in the same manner as the local WSL linux terminal, i.e. you can usually copy from other sources with Ctrl+C and then paste into your container terminal with mouse right click. Copy from the terminal itself by selecting the text to be copied. If this does not work, you will see later in these instructions how to pass files from your local computer to the container.

This container has a VNC application installed to allow opening graphical windows on a remote machine (seen from the container, your own computer is a remote machine). Start the application with start_vnc from your container prompt. You will need to start it every time you use the container (if you want to open graphics windows).

 start_vnc
xauth:  file /home/cmsusr/.Xauthority does not exist

New 'myvnc:1' desktop is 1df549a6f098:1

Starting applications specified in /home/cmsusr/.vnc/xstartup
Log file is /home/cmsusr/.vnc/1df549a6f098:1.log

[1] 144
VNC connection points:
        VNC viewer address: 127.0.0.1:5901
        HTTP access: http://127.0.0.1:6080/vnc.html
To kill the vncserver enter 'vncserver -kill :1'

Open a browser window at the HTTP address given in the start message and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there. If it does not open, it may be that the Windows firewall is blocking it. In that case, check these instructions.

To test, start ROOT by typing root in the container terminal prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the “Quit Root” option from the Browser menu of the TBrowser window or by typing .q in the ROOT prompt.

Importantly, stop the VNC server before exiting the container. If you don’t do it, you will need to do some cleaning before being able to open the graphics window next time you use the same container. Do the following:

 stop_vnc
 exit

MacOS

We will use the docker run command to create the container (downloading the appropriate image if it is the first time) and start it right away.

docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 /bin/bash
Setting up CMSSW_7_6_7
CMSSW should now be available.
This is a standalone image for CMSSW_7_6_7 slc6_amd64_gcc493.
(/code/CMSSW_7_6_7/src)

This is now a bash shell in the CMS open data environment in which you have access to a complete CMS software release that is appropriate for interfacing with the 2015 13 TeV datasets.

Problems have been reported running amd64-based containers such as this on MacOS with M1 chip. Increasing the memory available to Docker may help. Please check the possible solutions in this post in the CERN Open Data forum. Note, however, that this may help you to open the container, but it is very likely that problems remain when you try to compile code and run jobs in it.

For the CMS open data workshop, we provide a temporary solution which gives a docker environment in the browser. You can use it for the CMSSW container during the lessons, if needed. Note the following:

  • in the "Play with docker" terminal, after having created the working directory and before starting the container, change the permission of the working directory with chmod 777 cms_open_data_work
  • if you use the editor that comes with "Play with docker", the owner of the edited file needs to be changed back in the container with sudo chown $USER file-name
  • for the vnc in browser (see below), open it by clicking "Open port", give 6080, and then add vnc.html to the URL of the tab that opens.
The other containers used in this workshop should run fine on MacOS with M1 chip.
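
To find out which case applies to your machine, the machine type reported by uname -m can be compared with the amd64 architecture in the CMSSW image name. This is a rough sketch under the assumption that arm64 and aarch64 machines need emulation for amd64 images; the helper name arch_note is hypothetical.

```shell
# Hypothetical helper: say whether an amd64 image runs natively on a given
# machine type, as reported by `uname -m`.
arch_note() {
    case "$1" in
        x86_64|amd64)  echo "native: amd64 images run directly" ;;
        arm64|aarch64) echo "emulated: amd64 images need emulation and may be slow or fail" ;;
        *)             echo "unknown machine type: $1" ;;
    esac
}

arch_note "$(uname -m)"
```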

As there are rate limits for pulls from Docker Hub, you may get the following error message: docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.. In that case, try later (the limit is per 6 hours) or use the mirror gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 instead of cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493.

Now let’s understand the options that were used for the docker run command.

  • First, the -it (or -i) option means to start the container in interactive mode. Essentially, it means that you will end up inside the running container.
  • We assign a name to the container using the --name switch, so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than my_od.
  • The options -P -p 5901:5901 -p 6080:6080 open/publish ports from the container to the local host, which is needed for the graphical windows.
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 is the name of the image we will use. If no label is prepended, Docker assumes that it resides in Docker Hub, the official image repository of Docker.
  • Finally, the /bin/bash option will throw the container into a bash shell when running interactively.

For a more complete listing of options, see the official Docker documentation on the docker run command.

This container has a VNC application installed to allow opening graphical windows on a remote machine (seen from the container, your own computer is a remote machine). Start the application with start_vnc from your container prompt. You will need to start it every time you use the container (if you want to open graphics windows).

start_vnc
xauth:  file /home/cmsusr/.Xauthority does not exist

New 'myvnc:1' desktop is 1df549a6f098:1

Starting applications specified in /home/cmsusr/.vnc/xstartup
Log file is /home/cmsusr/.vnc/1df549a6f098:1.log

[1] 144
VNC connection points:
        VNC viewer address: 127.0.0.1:5901
        HTTP access: http://127.0.0.1:6080/vnc.html
To kill the vncserver enter 'vncserver -kill :1'

Open a browser window at the HTTP address given in the start message and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there.

To test, start ROOT by typing root in the container terminal prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the “Quit Root” option from the Browser menu of the TBrowser window or by typing .q in the ROOT prompt.

Importantly, stop the VNC server before exiting the container. If you don’t do it, you will need to do some cleaning before being able to open the graphics window next time you use the same container. Do the following:

 stop_vnc
 exit

Download the docker images for ROOT and python tools and start container

Containers with ROOT and python libraries installed are provided for your convenience. These containers can be used in the C++, ROOT and python tools lesson and later on for your work with CMS open data.

ROOT container

ROOT is included in the CMSSW container, but it is an old version because it needs to be compatible with the environment needed to access CMS open data AOD and MiniAOD files. In this tutorial, and in your work with CMS open data, you will often work on data that have been derived from the AOD or MiniAOD files and are not tied to a specific ROOT version. Therefore, a container with more recent ROOT version is provided.

First, create a working directory on your local computer:

cd
mkdir cms_open_data_root

Then, download the ROOT container and start it with the docker run command.

If you are on native Linux and want to use X11-forwarding, use

docker run -it --name my_root --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw -v ${HOME}/cms_open_data_root:/code gitlab-registry.cern.ch/cms-cloud/root-vnc:latest

On MacOS and Windows WSL2 (and on native Linux if you do not want to use X11-forwarding), use

docker run -it --name my_root -P -p 5901:5901 -p 6080:6080 -v ${HOME}/cms_open_data_root:/code gitlab-registry.cern.ch/cms-cloud/root-vnc:latest

This opens a bash shell where you can type your commands. Edit files in the cms_open_data_root directory on your local computer, but run the commands in the container.

For graphics, on native Linux, use X11-forwarding. On other systems, use the VNC that is installed in the container and start it with start_vnc. Open a browser window at the address given in the start message (http://127.0.0.1:6080/vnc.html) and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there.

Type exit to leave the container, and if you have started VNC, stop it first:

stop_vnc
exit

Python tools container

ROOT is not the only option for analysis of CMS open data. A container image is provided with all python libraries that will be needed in this tutorial.

First, create a working directory on your local computer:

cd
mkdir cms_open_data_python

Then, download the python container and start it with the docker run command.

If you are on native Linux and want to use X11-forwarding, use

docker run -it --name my_python -P -p 8888:8888 --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw -v ${HOME}/cms_open_data_python:/code gitlab-registry.cern.ch/cms-cloud/python-vnc:latest

On MacOS and Windows WSL2 (and on native Linux if you do not want to use X11-forwarding), use

docker run -it --name my_python -P -p 5901:5901 -p 6080:6080 -p 8888:8888 -v ${HOME}/cms_open_data_python:/code gitlab-registry.cern.ch/cms-cloud/python-vnc:latest

This opens a bash shell where you can type your commands. Edit files in the cms_open_data_python directory on your local computer, but run the commands in the container.

You can run jupyter notebooks in this container by typing in the container prompt

jupyter-lab --ip=0.0.0.0 --no-browser

and opening the link in the message on your browser.

If you see Permission denied when you try to open a new notebook, you most likely forgot to create the local working directory before creating the container. In that case, the directory was created automatically but with the wrong user/group. Exit from the container with exit. Then remove the container, remove the working directory, create it again:

docker rm my_python
rm -rf cms_open_data_python
mkdir cms_open_data_python

and create the container again with the docker run ... command above.

For other graphics, on native Linux, use X11-forwarding. On other systems, use the VNC that is installed in the container and start it with start_vnc. Open a browser window at the address given in the start message (http://127.0.0.1:6080/vnc.html) and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there.

Type exit to leave the container, and if you have started VNC, stop it first:

stop_vnc
exit

Coming back to the same container

You can come back to the same container you’ve used earlier with the docker start ... command.

docker start -i my_od

Note that running the docker run ... command again would create a new container from the image you’ve downloaded (under a new name, since a container name can be used only once). This would be a new environment. However, as we are mounting the directory from the local computer into the container, you will see the files from your earlier container even in your new container. Most often, you do not want to create a new container; instead, you reuse the existing one to return to the same working area with all your files and code saved.
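
Whether to use docker start or docker run depends only on whether a container with that name already exists. The sketch below makes that decision from a list of container names; the helper name start_or_run is made up for this example, and docker ps -a --format '{{.Names}}' is the standard way to print one container name per line.

```shell
# Hypothetical helper: read existing container names on stdin (one per line)
# and print the command that fits the given name.
start_or_run() {
    if grep -qxF -- "$1"; then
        echo "docker start -i $1"
    else
        echo "no container named $1 yet: create it with docker run"
    fi
}

# Feed it the real list only when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    docker ps -a --format '{{.Names}}' | start_or_run my_od
fi
```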

CHALLENGE! Test persistence

Go into the container and, in the /code/CMSSW_7_6_7/src directory, create a test file using some simple shell commands. Type the following exactly as you see it. It will dump some text into a file and then print the contents of the file to the screen.

echo "I am still here" > test.tmp
cat test.tmp

After you’ve done this, check if you see the file test.tmp in your local computer in the cms_open_data_work/CMSSW_7_6_7/src directory. If you did it correctly, you should be able to list the contents of the directory with ls -l and see your file from before! If not, check that you followed all the instructions above correctly or contact the facilitators.

Now, exit from the container and remove it with

docker rm my_od

Create a new container with the docker run command that you used in the first place. Check if you see the file that you created before.

Note that with the volume mount, your files will not disappear when you remove the container because they are stored in a directory on the local computer. If you really want to get rid of them, you will have to delete them either in the container or on your local computer.

You can make use of this, for example, when you have forgotten to stop VNC with stop_vnc when you exit the container. Probably the quickest way to clean is to remove and recreate the container. When you do it, the files that block the VNC from starting will be removed and as they are not located in the mounted directory, they will not be present when you create a new container. But the files in your working area (as test.tmp above) will be there again.

Stopping and removing containers

As you are learning how to use Docker, you may find yourself with multiple containers. Or maybe you started a container with your favourite name and some set of flags, and now you want to use that same name with new flags. In that case, you will want to stop the container and remove it.

A container stops when you type the exit command at the container prompt. If you accidentally close the terminal where the container is running, the container will not stop; it keeps running. You can list the running containers with docker ps. You can either return to the container using its name (here “my_od”) with the start command on your local machine and then exit normally from the container prompt:

docker start -i my_od
exit

or stop the container with

docker stop my_od

To stop all running containers:

docker stop $(docker ps -q)

To remove the container “my_od”, you would type the following. Note that this will delete the container and all files stored inside it, but the files in the CMSSW_7_6_7/src directory, which is shared with your local computer, will remain in your local computer’s directory.

docker rm my_od

To remove all containers:

docker rm $(docker ps -aq)

Don’t worry!

Note that these commands will not remove the actual Docker image that you downloaded and may have taken quite some time to download! Whew!
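
Whether you should stop, restart or create a container depends on its current state. The following is a minimal sketch in Python, not an official Docker tool: container_state and list_containers are hypothetical helper names, and the sketch assumes that the status text printed by docker ps starts with “Up” for a running container.

```python
import subprocess


def container_state(ps_output, name):
    """Classify container `name` as "running", "stopped" or "absent".

    `ps_output` is expected to hold one tab-separated "name<TAB>status"
    pair per line, as produced by list_containers() below.
    Assumption: the status of a running container starts with "Up".
    """
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue
        found, status = line.split("\t", 1)
        if found == name:
            return "running" if status.startswith("Up") else "stopped"
    return "absent"


def list_containers():
    """Return names and statuses of all containers (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

With container_state(list_containers(), "my_od"), a result of “running” means you can use docker stop my_od, “stopped” means you can use docker start -i my_od or docker rm my_od, and “absent” means you need docker run to create the container.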

Key Points

  • You have now set up a Docker container as a working environment for CMS open data. You know how to open a graphical window in it and how to pass files between your own computer and the container.


Test and validate the CMS open data environment

Overview

Teaching: 10 min
Exercises: 30 min
Questions
  • What is in the CMSSW Docker image?

  • How do I test and validate my CMSSW Docker container?

Objectives
  • Learn about the details of the CMS Docker container

  • Test and validate the CMS Docker image by running a CMSSW job.

The CMS open data containers

On the previous page, you downloaded the three different containers that will be used in this tutorial: the CMSSW container, the ROOT container and the Python container. You’ve tested that you can open the graphical user interface. The CMSSW container is required to access the CMS open data files. Therefore, in this section, you’ll make sure that you can run a CMSSW job and access the data files.

Know your Docker image

The Docker container we just created provides the CMS computing environment to be used with the 2015 CMS open data. The container uses Scientific Linux CERN. As mentioned before, it comes equipped with the ROOT framework and the version of the CMS software (CMSSW) that is compatible with the CMS open data.

Access to the data is through the XRootD protocol.

The working directory

When your Docker container starts up with the volume mount option -v ${HOME}/cms_open_data_work:/home/cmsusr, everything in the container’s /home/cmsusr directory is also visible in ${HOME}/cms_open_data_work on your local computer. You will therefore see CMSSW_7_6_7/src on your local computer, and you can edit files there. The changes also take effect on the files in the container.

Remember that whatever you have in the local directory cms_open_data_work will be visible in the container. If you need to create a new container, make sure that you pass a fresh directory, or that you are sure that the old files are those that you want to pass to your new container.
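
The mapping set up by the volume mount can be written down explicitly. The sketch below is purely illustrative: host_path is a hypothetical helper, not part of Docker, and /home/me stands in for your own home directory. It assumes the mount point /home/cmsusr from the -v option above.

```python
from pathlib import PurePosixPath


def host_path(container_path, host_dir, mount_point="/home/cmsusr"):
    """Translate a path inside the container to the matching local path.

    Returns None for paths outside the mount point: such files live only
    in the container and disappear when the container is removed.
    """
    try:
        relative = PurePosixPath(container_path).relative_to(mount_point)
    except ValueError:
        return None
    return str(PurePosixPath(host_dir) / relative)
```

For example, host_path("/home/cmsusr/CMSSW_7_6_7/src/test.tmp", "/home/me/cms_open_data_work") gives the path of the same file under /home/me/cms_open_data_work, while a file such as /tmp/scratch.txt has no local counterpart and the function returns None.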

Warning!

If you did not create the directory on your local computer before creating the container, Docker creates it automatically, but with the wrong user/group. When the container starts, you will then get the message “cannot make directory CMSSW_7_6_7 Permission denied”.
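
One way to avoid the permission problem described in this warning is to create the working directory yourself before the first docker run. The following is a minimal sketch in Python, roughly equivalent to mkdir -p followed by an ownership check. The helper name ensure_workdir is hypothetical, and the directory name is taken from the volume mount option above.

```python
import os
from pathlib import Path


def ensure_workdir(path):
    """Create the directory if needed and check that the current user owns it.

    Returns the resolved Path. Raises PermissionError if the directory
    exists but belongs to another user (for example, root).
    """
    workdir = Path(path).expanduser()
    workdir.mkdir(parents=True, exist_ok=True)
    # os.getuid is POSIX-only; the ownership check is skipped elsewhere.
    if hasattr(os, "getuid") and workdir.stat().st_uid != os.getuid():
        raise PermissionError(
            "%s is not owned by the current user; remove it or fix its "
            "ownership before starting the container" % workdir
        )
    return workdir.resolve()
```

Calling ensure_workdir("~/cms_open_data_work") on your local computer before creating the container should leave you with a directory that you own.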

Run a simple demo for testing and validating

The validation procedure tests that the CMS environment is installed and operational on your Docker container, and that you have access to the CMS Open Data files. These steps also give you a quick introduction to the CMS environment.

First verify that you are in the ~/CMSSW_7_6_7/src directory in your container. You can check that with the pwd command.

Create a working directory for the demo analyzer, change to that directory and create a skeleton for the analyzer:

mkdir Demo
cd Demo
mkedanlzr DemoAnalyzer

Compile the code:

scram b

and you will get output similar to this:

$ scram b
>> Local Products Rules ..... started
>> Local Products Rules ..... done
>> Entering Package Demo/DemoAnalyzer
>> Creating project symlinks
  src/Demo/DemoAnalyzer/python -> python/Demo/DemoAnalyzer
Entering library rule at src/Demo/DemoAnalyzer/plugins
>> Compiling edm plugin /home/cmsusr/CMSSW_7_6_7/src/Demo/DemoAnalyzer/plugins/DemoAnalyzer.cc
>> Building edm plugin tmp/slc6_amd64_gcc493/src/Demo/DemoAnalyzer/plugins/DemoDemoAnalyzerAuto/libDemoDemoAnalyzerAuto.so
Leaving library rule at src/Demo/DemoAnalyzer/plugins
@@@@ Running edmWriteConfigs for DemoDemoAnalyzerAuto
--- Registered EDM Plugin: DemoDemoAnalyzerAuto
>> Leaving Package Demo/DemoAnalyzer
>> Package Demo/DemoAnalyzer built
>> Subsystem Demo built
>> Local Products Rules ..... started
>> Local Products Rules ..... done
gmake[1]: Entering directory `/code/CMSSW_7_6_7'
>> Creating project symlinks
  src/Demo/DemoAnalyzer/python -> python/Demo/DemoAnalyzer
>> Done python_symlink
>> Compiling python modules python
>> Compiling python modules src/Demo/DemoAnalyzer/python
>> All python modules compiled
@@@@ Refreshing Plugins:edmPluginRefresh
>> Pluging of all type refreshed.
>> Done generating edm plugin poisoned information
gmake[1]: Leaving directory `/code/CMSSW_7_6_7'

Before launching the job, let’s modify the configuration file (do not worry, you will learn about all this stuff in a different lesson) so that it will read a CMS open data file.

Open the file ConfFile_cfg.py in the Demo/DemoAnalyzer/python directory with your normal editor on your local computer. You will find the Demo area under the cms_open_data_work/CMSSW_7_6_7/src directory on your local computer. As the working directory has been mounted into the container, all changes take effect there as well.

Replace file:myfile.root with root://eospublic.cern.ch//eos/opendata/cms/Run2015D/SingleElectron/MINIAOD/08Jun2016-v1/10000/001A703B-B52E-E611-BA13-0025905A60B6.root to point to an example file.

Also change the maximum number of events to 10, i.e., change -1 to 10 in process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1)).

Take a look at the final validation config file

At the end, the config file should look like this:

import FWCore.ParameterSet.Config as cms
process = cms.Process("Demo")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.source = cms.Source("PoolSource",
# replace 'myfile.root' with the source file you want to use
   fileNames = cms.untracked.vstring(
       'root://eospublic.cern.ch//eos/opendata/cms/Run2015D/SingleElectron/MINIAOD/08Jun2016-v1/10000/001A703B-B52E-E611-BA13-0025905A60B6.root'
   )
)

process.demo = cms.EDAnalyzer('DemoAnalyzer'
)

process.p = cms.Path(process.demo)
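
The two edits to the configuration file can also be applied with a short script instead of by hand. The following is a minimal sketch, assuming that the generated ConfFile_cfg.py contains the literal strings file:myfile.root and cms.untracked.int32(-1) as described above. The function name patch_config is hypothetical and not part of CMSSW.

```python
import re

# Example input file named in the text above.
EXAMPLE_FILE = (
    "root://eospublic.cern.ch//eos/opendata/cms/Run2015D/SingleElectron/"
    "MINIAOD/08Jun2016-v1/10000/001A703B-B52E-E611-BA13-0025905A60B6.root"
)


def patch_config(text, input_file=EXAMPLE_FILE, max_events=10):
    """Return the config text with the input file and event limit replaced.

    Assumes the skeleton contains 'file:myfile.root' and
    'cms.untracked.int32(-1)'; raises ValueError if either is missing.
    """
    if "file:myfile.root" not in text:
        raise ValueError("placeholder 'file:myfile.root' not found")
    text = text.replace("file:myfile.root", input_file)

    # Tolerate optional spaces inside the parentheses.
    pattern = r"cms\.untracked\.int32\(\s*-1\s*\)"
    replacement = "cms.untracked.int32(%d)" % max_events
    text, count = re.subn(pattern, replacement, text)
    if count == 0:
        raise ValueError("'cms.untracked.int32(-1)' not found")
    return text
```

To use it on your local computer, read the contents of ConfFile_cfg.py, pass them to patch_config, and write the result back to the same file. Because the function raises an error when a placeholder is missing, running it on a file that was already edited does not silently change anything.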

Finally, run the CMSSW executable cmsRun with our configuration:

cmsRun DemoAnalyzer/python/ConfFile_cfg.py
10-Jul-2022 18:44:42 CEST  Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2015D/SingleElectron/MINIAOD/08Jun2016-v1/10000/001A703B-B52E-E611-BA13-0025905A60B6.root
220710 18:44:42 570 secgsi_InitProxy: cannot access private key file: /home/cmsusr/.globus/userkey.pem
%MSG-w XrdAdaptor:  file_open 10-Jul-2022 18:44:43 CEST pre-events
Data is served from cern.ch instead of original site eospublic
%MSG
10-Jul-2022 18:44:44 CEST  Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2015D/SingleElectron/MINIAOD/08Jun2016-v1/10000/001A703B-B52E-E611-BA13-0025905A60B6.root
Begin processing the 1st record. Run 257645, Event 1184198851, LumiSection 776 at 10-Jul-2022 18:44:59.914 CEST
Begin processing the 2nd record. Run 257645, Event 1184202760, LumiSection 776 at 10-Jul-2022 18:44:59.916 CEST
Begin processing the 3rd record. Run 257645, Event 1183968519, LumiSection 776 at 10-Jul-2022 18:44:59.917 CEST
Begin processing the 4th record. Run 257645, Event 1183964627, LumiSection 776 at 10-Jul-2022 18:44:59.917 CEST
Begin processing the 5th record. Run 257645, Event 1184761030, LumiSection 776 at 10-Jul-2022 18:44:59.918 CEST
Begin processing the 6th record. Run 257645, Event 1184269130, LumiSection 776 at 10-Jul-2022 18:44:59.918 CEST
Begin processing the 7th record. Run 257645, Event 1184358918, LumiSection 776 at 10-Jul-2022 18:44:59.918 CEST
Begin processing the 8th record. Run 257645, Event 1183874827, LumiSection 776 at 10-Jul-2022 18:44:59.919 CEST
Begin processing the 9th record. Run 257645, Event 1184415529, LumiSection 776 at 10-Jul-2022 18:44:59.919 CEST
Begin processing the 10th record. Run 257645, Event 1184425291, LumiSection 776 at 10-Jul-2022 18:44:59.919 CEST
10-Jul-2022 18:44:59 CEST  Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2015D/SingleElectron/MINIAOD/08Jun2016-v1/10000/001A703B-B52E-E611-BA13-0025905A60B6.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 XrdAdaptor           -w file_open                              1        1
    2 fileAction           -s file_close                             1        1
    3 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 XrdAdaptor           pre-events
    2 fileAction           PostEndRun
    3 fileAction           pre-events       pre-events

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning                 1                   1
System                  3

Congratulations! You are all set with your Docker environment.
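
If you saved the cmsRun output to a file, you can also check it automatically. The following is a minimal sketch, assuming that the log contains “Successfully opened file” and “Begin processing the ... record” lines in the form shown in the output above. The function names are hypothetical helpers, not part of CMSSW.

```python
import re

# Matches lines such as
# "Begin processing the 1st record. Run 257645, Event 1184198851, LumiSection 776 at ..."
RECORD_LINE = re.compile(
    r"^Begin processing the (\d+)(?:st|nd|rd|th) record\. "
    r"Run (\d+), Event (\d+), LumiSection (\d+)"
)


def processed_records(log_text):
    """Return a list of (run, event, lumi_section) tuples found in the log."""
    records = []
    for line in log_text.splitlines():
        match = RECORD_LINE.match(line.strip())
        if match:
            _, run, event, lumi = match.groups()
            records.append((int(run), int(event), int(lumi)))
    return records


def job_looks_ok(log_text, expected_events=10):
    """True if a file was opened and the expected number of records was seen."""
    opened = "Successfully opened file" in log_text
    return opened and len(processed_records(log_text)) == expected_events
```

For the job above, job_looks_ok should return True when the log lists ten processed records, matching the maximum number of events set in the configuration file.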

Work assignment

Now, submit your assignment for this lesson. You will find a task in our assignment form; remember you must sign in and click on the submit button in order to save your work. You can go back to edit the form at any time.

Problems have been reported when running amd64-based containers such as this one on macOS with the M1 chip. Increasing the memory available to Docker may help. Please check the possible solutions in this post in the CERN Open Data forum. Note, however, that while this may help you to open the container, it is very likely that problems remain when you try to compile code and run jobs in it.

For the CMS open data workshop, we provide a temporary solution that gives you a Docker environment in the browser. You can use it for the CMSSW container during the lessons, if needed. Note the following:

  • in the "Play with docker" terminal, after having created the working directory and before starting the container, change the permissions of the working directory with chmod 777 cms_open_data_work
  • if you use the editor that comes with "Play with docker", the owner of the edited file needs to be changed back in the container with sudo chown $USER file-name
  • for VNC in the browser, open it by clicking "Open port", enter 6080 and then add vnc.html to the URL of the tab that opens.

The other containers used in this workshop should run fine on macOS with the M1 chip.

CMSSW jobs still not running in a container on macOS with the M1 chip?

If increasing the memory for Docker did not help, there’s not much we can do. This is not a show-stopper, though: you can still work with the CMS open data, just in a different way. You will not be able to run CMSSW jobs in the CMSSW open data container on your own laptop, but you can still use the container. We propose the following:

  • run the quick examples and tests as GitHub Actions using the CMSSW container (in which case your jobs run on GitHub “runners”) and download the output files as “artifacts” (an example is coming soon)
  • for any larger production, you would in any case use resources other than your own laptop; you will learn more about that in the cloud tutorial
  • you can still use the two other containers (for ROOT and python) to inspect the output of your jobs.

To run a short CMSSW example job as a GitHub workflow, first go to the example repository. The repository contains the example code generated above, with the two modifications in the configuration file for the file name and the number of events. To get your own version of it, click on the arrow to the right of Fork (top right) and choose “Create a new fork”.

In your new repository, go to the Actions tab, and click on “I understand my workflows, go ahead and enable them”. Choose the workflow “Test CMSSW on plain docker” and run the workflow by selecting branch docker-04 under “Run workflow”.

You can follow the job progress and output by clicking on “DemoAnalyzer test - plain docker” and expanding “Going to a container”. If the job finishes successfully, you will find the output under “Artifacts” in the workflow summary. In this example, it is the output log from the job above, but you could eventually produce some data files later on during this workshop and download them from the same place.

Note that every time the workflow runs, it takes several minutes to start, as it needs to download the container image. This is certainly not ideal for quick testing, but remember that this is a workaround for the case where you are not able to run jobs in the container locally.

The workflow is defined in .github/workflows/main.yaml and the commands that are passed into the container are in commands.sh in branch docker-04 of the repository.

Key Points

  • The CMS Docker image contains all the required ingredients to start analyzing CMS open data.

  • In order to test and validate the Docker container you can run a simple CMSSW job.