Using Docker with the CMS open data

Overview

Teaching: Self-guided min
Exercises: 40 min
Questions
  • How do I use docker to effectively interface with the CMS open data?

  • What container images are available for my work with the CMS open data?

Objectives
  • Download the CMSSW open data docker image

  • Open your own CMSSW open data container and check that graphical windows open

  • Download the ROOT and python images and build your own container

  • Restart an existing container

  • Delete and rebuild containers

Overview

This exercise will walk you through setting up and familiarizing yourself with Docker, so that you can effectively use it to interface with the CMS open data. It is not meant to completely cover containers and everything you can do with Docker.

Three types of container images are provided: one with the CMS software (CMSSW) compatible with the released data, and two others with ROOT and python libraries needed in this workshop. The CMSSW container is mandatory if you want to access the CMS data in AOD and MiniAOD formats (you will learn about them later), as you will not be able to install CMSSW software on your own computer. The two others are provided to make setting up and using ROOT and/or python libraries easier for you for this tutorial, but if you wish, you can also install them on your computer.

All container images come with VNC for the graphical use interface. It opens directly in a browser window. Optionally, you can also connect to the VNC server of the container using a VNC viewer (VNC viewer (TigerVNC, RealVNC, TightVNC, OSX built-in VNC viewer, etc.) installed on your local machine, but only the browser option for which no additional tools are needed is described in these instructions. On native Linux, you can also use X11-forwarding.

For different CMSSW container images, some guidance can be found on the Open Data Portal introduction to Docker. In this tutorial, we will use the container image needed for the CMS open data from 2015. The use of graphical interfaces, such the graphics window from ROOT, depends on the operating system of your computer. Therefore, in the following, separate instructions are given for Windows WSL, Linux and MacOS.

Note that the container images are large (the compressed download size is 6.6GB for the CMSSW container, and of order of 1GB for the ROOT and python containers). Make sure that you make it in time to download them and work through the exercises before the workshop.

Download the docker images for ROOT and python tools and start container

Containers with ROOT and python libraries installed are provided for your convenience. These containers can be used in the C++, ROOT and python tools lesson and later on for your work with CMS open data.

ROOT container

The first time you start a container, a docker image file gets downloaded from an image registry. In this tutorial, and in your work with CMS open data, you will often work on data that have been derived from the standard CMS AOD" orMiniAOD” files and are not tied to a specific ROOT version. Therefore, a container with recent ROOT version is provided.

First, create a working directory on your local computer:

cd
mkdir cms_open_data_root
chmod -R 777 cms_open_data_root

Then, download the ROOT container and start it with the docker run command.

If you are on native Linux and want to use X11-forwarding, use

docker run -it --name my_root --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw -v ${HOME}/cms_open_data_root:/code gitlab-registry.cern.ch/cms-cloud/root-vnc:latest

On MacOS and Windows WSL2 (and on native Linux if you do not want to use X11-forwarding), use

docker run -it --name my_root -P -p 5901:5901 -p 6080:6080 -v ${HOME}/cms_open_data_root:/code gitlab-registry.cern.ch/cms-cloud/root-vnc:latest

This opens a bash shell where you can type your commands. Edit files in the cms_open_data_root directory on your local computer, but run the commands in the container.

For graphics, on native Linux, use X11-forwarding. On other systems, use VNC that is installed in the container and start the graphics windows with start_vnc. Open the browser window in the address given at the start message (http://127.0.0.1:6080/vnc.html) with the default VNC password is cms.cern. It shows an empty screen to start with and all graphics will pop up there.

Type exit to leave the container, and if you have started VNC, stop it first:

stop_vnc
exit

Python tools container

ROOT is not the only option for analysis of CMS open data. A container image is provided with all python libraries that will be needed in this tutorial.

First, create a working directory on your local computer:

cd
mkdir cms_open_data_python
chmod -R 777 cms_open_data_python

Then, download the python container and start it with the docker run command.

If you are on native Linux and want to use X11-forwarding, use

docker run -it --name my_python -P -p 8888:8888 --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw -v ${HOME}/cms_open_data_python:/code gitlab-registry.cern.ch/cms-cloud/python-vnc:python3.10.5

On MacOS and Windows WSL2 (and on native Linux if you do not want to use X11-forwarding), use

docker run -it --name my_python -P -p 5901:5901 -p 6080:6080 -p 8888:8888 -v ${HOME}/cms_open_data_python:/code gitlab-registry.cern.ch/cms-cloud/python-vnc:python3.10.5

This opens a bash shell where you can type your commands. Edit files in the cms_open_data_python directory on your local computer, but run the commands in the container.

You can run jupyter notebooks in this container by typing in the container prompt

jupyter-lab --ip=0.0.0.0 --no-browser

and opening the link in the message on your browser.

If you see Permission denied when you try to open a new notebook, you most likely forgot to create the local working directory before creating the container. In that case, the directory was created automatically but with the wrong user/group. Exit from the container with exit. Then remove the container, remove the working directory, create it again:

docker rm my_python
rm -rf cms_open_data_python
mkdir cms_open_data_python
chmod -R 777 cms_open_data_python

and create the container again with the docker run ... command above.

For other graphics, on native Linux, use X11-forwarding. On other systems, use VNC that is installed in the container and start the graphics windows with vnc_start. Open the browser window in the address given at the start message (http://127.0.0.1:6080/vnc.html) with the default VNC password is cms.cern. It shows an empty screen to start with and all graphics will pop up there.

Type exit to leave the container, and if you have started VNC, stop it first:

stop_vnc
exit

Download the docker image for CMSSW open data and start a container

Mac users with Apple M1/M2 chips: stop here!

The CMSSW software in this container cannot be compiled for the new architecture of the M1/M2 chip. We do not recommend using up storage space on your device by downloading the image.

Don’t worry! :)

This will not prevent you from engaging in the vast majority of the workshop.

  • For the workshop, much of the data will already have been processed and you can then analyze it with the ROOT or Python container.
  • For large-scale analysis workflows, condor or cloud computing tasks can be run directly in docker containers on non-Mac worker nodes.

The CMSSW open data image is large (6.6GB) and it may take very long to download, depending on the speed of your internet connection. After the download, a container created from that image starts. The image download needs to be done only once. Afterwards, when starting a container, it will find the downloaded image on your computer, and it will be much faster.

The containers do not have modern editors and it is expected that you mount your working directory from the local computer into the container, and use your normal editor for editing the files. Note that all your compiling and executing still has to be done in the Docker container! ROOT is included in the CMSSW container, but it is an old version because it needs to be compatible with the environment needed to access CMS open data AOD and MiniAOD files.

First, before you start up your container, create a local directory where you will be doing your code development. In the example below, it is called cms_open_data_work and it will live in the $HOME directory. You may choose a different location and a shorter directory name if you like. :)

Local machine

# Make sure I'm in my home directory
cd 

# Create the directory
mkdir cms_open_data_work

# Change the permissions on the directory to give everyone access
chmod -R 777 cms_open_data_work

If you forget to do the last step of changing permissions with the chmod command, you may find that your docker container will not start in the next step.

Warning!

If you do not create the directory on your local computer before creating the container, the directory is created automatically but with the wrong user/group. When starting the container, you will get a message cannot make directory CMSSW_7_6_7 Permission denied. In that case, delete the directory with rm -rf cms_open_data_work/, and remove the failing container with docker rm <container-name> so that you can use the same name. In the following, we will use my_od as the container name. And then, remember to create the directory before creating the container!

Start the container following the instructions below depending on the operating system you are using.

We will use the docker run command to create the container (downloading the appropriate image if it is the first time) and start it right away.

docker run -it --name my_od --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw  -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 /bin/bash
Setting up CMSSW_7_6_7
CMSSW should now be available.
This is a standalone image for CMSSW_7_6_7 slc6_amd64_gcc493.
(/code/CMSSW_7_6_7/src)

This is now a bash shell in the CMS open data environment in which you have access to a complete CMS software release that is appropriate for interfacing with the 2015 13 TeV datasets.

As there are rate limits for pulls from Docker Hub, you may get the following error message: docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.. In that case, try later (the limit is per 6 hours) or use the mirror gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 instead of cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493.

Now let’s understand the options that were used for the docker run command.

  • First, the -it (or -i) option means to start the container in interactive mode. Essentially, it means that you will end up inside the running container.
  • We assign a name to the container using the --name switch, so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than my_od.
  • The --net=host switch will allow you to use the host network (Internet access) in the container.
  • The --env switch will forward the appropiate DISPLAY environmental variable from the host machine to the container so X11-forwarding (the ability to open graphical windows inside the container) can be achieved.
  • For X11-forwarding to be functional, your local $HOME/.Xauthority file needs to be mounted as the /home/cmsusr/.Xauthority file inside the container. We do this using the --volume (or -v) switch. Note that the colon (:) symbol separates the source and destination points for the mounting procedure. In addition, the rw tag is given (aslo separated by :) so it can be read and written if necessary.
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 is the name of the image we will use. If no label is prepended, Docker assumes that it resides in Docker Hub, the official image repository of Docker.
  • Finally, the /bin/bash option will throw the container into a bash shell when running interactively.

For a more complete listing of options, see the official Docker documentation on the docker run command.

To test that X11-forwarding works, start the ROOT program by typing root in the container prompt. In ROOT prompts , type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the option from the TBrowser window or by typing .q in the ROOT prompt.

Make sure that you can copy instructions from a browser page to the container terminal. One thing you can try is Shift+Ctrl+V when pasting into your container terminal, rather than Ctrl-V. That sometimes will work. If not, you will see later in these instructions how to pass files from your local computer to the container.

Then type exit to leave the container.

If you find that X11 forwarding is not working and the ROOT graphical window does not open, try typing the following before starting your Docker container.

xhost local:root

If everything works fine, you are ready to continue with the lesson.

If you still have problems with X11 forwarding

In the case you are having problems with X11 forwarding, there is the option of using a VNC application installed in the container image:

docker run -it --name my_od -P -p 5901:5901  -p 6080:6080 -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493:latest /bin/bash

This container has a VNC application installed to allow opening graphical windows on a remote machine (seen from the container, your own computer is a remote machine). Start the application with start_vnc from your container prompt. You will need to start it every time you use the container (if you want to open graphics windows).

start_vnc
xauth:  file /home/cmsusr/.Xauthority does not exist

New 'myvnc:1' desktop is 1df549a6f098:1

Starting applications specified in /home/cmsusr/.vnc/xstartup
Log file is /home/cmsusr/.vnc/1df549a6f098:1.log

[1] 144
VNC connection points:
        VNC viewer address: 127.0.0.1:5901
        HTTP access: http://127.0.0.1:6080/vnc.html
To kill the vncserver enter 'vncserver -kill :1'

Open the browser window in the http address given at the start message and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there.

To test, start ROOT by typing root in the container terminal prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the “Quit Root” option from Browser menu of the TBrowser window or by typing .q in the ROOT prompt.

Importantly, stop the VNC server before exiting the container. If you don’t do it, you will need to do some cleaning before being able to open the graphics window next time you use the same container. Do the following:

 stop_vnc
 exit

We will use the docker run command to create the container (downloading the appropriate image if it is the first time) and start it right away.

docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 /bin/bash
Setting up CMSSW_7_6_7
CMSSW should now be available.
This is a standalone image for CMSSW_7_6_7 slc6_amd64_gcc493.
(/code/CMSSW_7_6_7/src)

This is now a bash shell in the CMS open data environment in which you have access to a complete CMS software release that is appropriate for interfacing with the 2015 13 TeV datasets.

As there are rate limits for pulls from Docker Hub, you may get the following error message: docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.. In that case, try later (the limit is per 6 hours) or use the mirror gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 instead of cmsopendata/cmssw_7_6_7_vnc.

Windows WSL2 exiting with code 139?

If the docker command exits without giving you the output above, typically showing exit code 139 in the docker console, see this post in the CERN Open Data forum (note in particular that the .wslconfig file that you need to add must not have a file extension, if Windows adds it automatically, rename the file).

Now let’s understand the options that were used for the docker run command.

  • First, the -it (or -i) option means to start the container in interactive mode. Essentially, it means that you will end up inside the running container.
  • We assign a name to the container using the --name switch, so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than my_od.
  • The options -P -p 5901:5901 -p 6080:6080 open/publish ports from the container to the local host, needed for the graphical windows
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 is the name of the image we will use. If no label is prepended, Docker assumes that it resides in Docker Hub, the official image repository of Docker.
  • Finally, the /bin/bash option will throw the container into a bash shell when running interactively.

For a more complete listing of options, see the official Docker documentation on the docker run command.

Now, first make sure that you can copy instructions from a browser page to the container terminal. It works in the same manner as the local WSL linux terminal, i.e. you can usually copy from other sources with Ctrl+C and then paste into your container terminal with mouse right click. Copy from the terminal itself by selecting the text to be copied. If this does not work, you will see later in these instructions how to pass files from your local computer to the container.

This container has a VNC application installed to allow opening graphical windows on a remote machine (seen from the container, your own computer is a remote machine). Start the application with start_vnc from your container prompt. You will need to start it every time you use the container (if you want to open graphics windows).

 start_vnc
xauth:  file /home/cmsusr/.Xauthority does not exist

New 'myvnc:1' desktop is 1df549a6f098:1

Starting applications specified in /home/cmsusr/.vnc/xstartup
Log file is /home/cmsusr/.vnc/1df549a6f098:1.log

[1] 144
VNC connection points:
        VNC viewer address: 127.0.0.1:5901
        HTTP access: http://127.0.0.1:6080/vnc.html
To kill the vncserver enter 'vncserver -kill :1'

Open the browser window in the http address given at the start message and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there. If it does not open, it may be that the Windows firewall is blocking it. In that case, check these instructions.

To test, start ROOT by typing root in the container terminal prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the “Quit Root” option from Browser menu of the TBrowser window or by typing .q in the ROOT prompt.

Importantly, stop the VNC server before exiting the container. If you don’t do it, you will need to do some cleaning before being able to open the graphics window next time you use the same container. Do the following:

 stop_vnc
 exit

We will use the docker run command to create the container (downloading the appropriate image if it is the first time) and start it right away.

docker run -it --name my_od -P -p 5901:5901 -p 6080:6080 -v ${HOME}/cms_open_data_work:/code cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 /bin/bash
Setting up CMSSW_7_6_7
CMSSW should now be available.
This is a standalone image for CMSSW_7_6_7 slc6_amd64_gcc493.
(/code/CMSSW_7_6_7/src)

This is now a bash shell in the CMS open data environment in which you have access to a complete CMS software release that is appropriate for interfacing with the 2015 13 TeV datasets.

Problems have been reported running amd-based containers such as this on MacOS with M1 chip. Increasing the memory available to Docker may help. Please check the possible solutions in this post in the CERN Open Data forum. Note, however, that this may help you to open the container, but it is very likely that problems remain when you try to compile code and run jobs in it.

As there are rate limits for pulls from Docker Hub, you may get the following error message: docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.. In that case, try later (the limit is per 6 hours) or use the mirror gitlab-registry.cern.ch/cms-cloud/cmssw-docker-opendata/cmssw_7_6_7-slc6_amd64_gcc493 instead of cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493.

Now let’s understand the options that were used for the docker run command.

  • First, the -it (or -i) option means to start the container in interactive mode. Essentially, it means that you will end up inside the running container.
  • We assign a name to the container using the --name switch, so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than my_od.
  • The options -P -p 5901:5901 -p 6080:6080 open/publish ports from the container to the local host, needed for the graphical windows
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • cmsopendata/cmssw_7_6_7-slc6_amd64_gcc493 is the name of the image we will use. If no label is prepended, Doc
  • With -v ${HOME}/cms_open_data_work:/code, the working directory cms_open_data_work that you created in your home directory is mounted with the -v option into the container's /code directory. This makes it possible to edit files in the CMSSW area of your container with your normal editor on your local computer.
  • ker assumes that it resides in Docker Hub, the official image repository of Docker.
  • Finally, the /bin/bash option will throw the container into a bash shell when running interactively.

For a more complete listing of options, see the official Docker documentation on the docker run command.

This container has a VNC application installed to allow opening graphical windows on a remote machine (seen from the container, your own computer is a remote machine). Start the application with start_vnc from your container prompt. You will need to start it every time you use the container (if you want to open graphics windows).

start_vnc
xauth:  file /home/cmsusr/.Xauthority does not exist

New 'myvnc:1' desktop is 1df549a6f098:1

Starting applications specified in /home/cmsusr/.vnc/xstartup
Log file is /home/cmsusr/.vnc/1df549a6f098:1.log

[1] 144
VNC connection points:
        VNC viewer address: 127.0.0.1:5901
        HTTP access: http://127.0.0.1:6080/vnc.html
To kill the vncserver enter 'vncserver -kill :1'

Open the browser window in the http address given at the start message and connect with the default VNC password cms.cern. It shows an empty screen to start with and all graphics will pop up there.

To test, start ROOT by typing root in the container terminal prompt. In the ROOT prompt, type TBrowser t to open the ROOT graphical window. If the graphical window opens you are all set and you can exit from ROOT either by choosing the “Quit Root” option from Browser menu of the TBrowser window or by typing .q in the ROOT prompt.

Importantly, stop the VNC server before exiting the container. If you don’t do it, you will need to do some cleaning before being able to open the graphics window next time you use the same container. Do the following:

 stop_vnc
 exit

Coming back to the same container

You can come back to the same container you’ve used earlier with the docker start ... command.

docker start -i my_od

Note that running the docker run ... command as before would create a new container from the image you’ve downloaded. This would be a new environment. However, as we are mounting the directory from the local computer into the container, you will see the files from your earlier container even in your new container. Most often, you do not want to create a new container but you would use the existing container to go to the same working area with all our files and code saved.

CHALLENGE! Test persistence

Go into the container and in the /code/CMSSW_7_6_7/src directory, create a test file using some simple shell commands. Type the following exactly as you see it. It will dump some text into a file and then print the contents of the file to the screen

echo "I am still here" > test.tmp
cat test.tmp

After you’ve done this, check if you see the file test.tmp in your local computer in the cms_open_data_work/CMSSW_7_6_7/src directory. If you did it correctly, you should be able to list the contents of the directory with ls -l and see your file from before! If not, check that you followed all the instructions above correctly or contact the facilitators.

Now, exit from the container and remove it with

docker rm my_od

Create a new container with the docker run command that you used in the first place. Check if you see the file that you created before.

Note that with the volume mount, your files will not disappear when you remove the container because they are stored in a directory on the local computer. If you really want to get rid of them, you will have to delete them either on the container or on your local computer.

You can make use of this, for example, when you have forgotten to stop VNC with stop_vnc when you exit the container. Probably the quickest way to clean is to remove and recreate the container. When you do it, the files that block the VNC from starting will be removed and as they are not located in the mounted directory, they will not be present when you create a new container. But the files in your working area (as test.tmp above) will be there again.

Stopping and removing containers

As you are learning how to use Docker, you may find yourself with multiple containers. Or maybe you started a container with your favourite name with some set of flags and now you want use that same name but with new flags. In that case, you will want to stop the container and remove it.

A container stops when you type the exit command in the container prompt. It may happen that you accidentally close the terminal where the container is running. In that case, the container will not stop and it will remain running. You can list the running containers with docker ps. You can either return to the container using its name (here “my_od”) with the start command on your local machine and then exit normally from the container prompt:

docker start -i my_od
exit

or stop the container with

docker stop my_od

To stop all running containers:

docker stop $(docker ps -q)

To remove the container “my_od”, you would type the following. Note that this will delete the container and all files, but the files in the CMSSW_7_6_7/src directory which is shared with your local computer will remain in your local computer’s directory.

docker rm my_od

To remove all containers:

docker rm $(docker ps -aq)

Don’t worry!

Note that these commands will not remove the actual Docker image that you downloaded and may have taken quite some time to download! Whew!

Key Points

  • You have now set up a docker container as a working enviroment for CMS open data. You know how to open a graphical window in it and how to pass files between your own computer and the container.