This lesson is being superseded (link to newer version)

Docker pre-exercises

Introduction

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • What is Docker?

  • What is the point of these exercises?

Objectives
  • Learn about Docker and why we’re using it

Let’s learn about Docker and why we’re using it!

Regardless of what you encounter in this lesson, the definitive guide is any official documentation provided by Docker.

What is Docker?

From the Docker website

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

In short, Docker allows a user to work in a computing environment that has been frozen with respect to interdependent libraries and code and related tools. This means that you can use the same software that analysts were using 10 years ago (for example) without downloading all the relevant 10-year-old libraries. :)

What can I learn here?

As much as we’d like to, we can’t give you a complete overview of Docker. However, we do hope to explain why we run Docker the way we do, so that you gain some understanding. More specifically, we’ll show you how to set up Docker not just for this workshop but for interfacing with the CMS open data in general.

Key Points

  • Docker is an implementation of a tool called a container that gives us a self-consistent computing environment

  • Docker is widely used these days in both industry and academic research

  • Docker is one way that you can interface with CMS data using the same computing tools as CMS collaborators


Installing Docker

Overview

Teaching: 5 min
Exercises: 15 min
Questions
  • What equipment do I need?

  • How do I install Docker?

Objectives
  • Install Docker on your machine

Installing Docker is relatively straightforward, particularly because of the excellent documentation Docker provides. Still, you will want to set aside some time to do it properly and test it out.

Installing

Go to the official Docker site and follow the installation instructions for your operating system.

We see no need to go beyond the documentation they provide so we leave it up to you to follow their installation procedure.

Testing

As you walk through their documentation, you will eventually come to a point where you will run a very simple test, usually involving their hello-world container.

You can find their documentation for this step here.

Testing their code can be summed up by the ability to run (without generating any errors) the following commands.

docker --version
docker run hello-world
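
The two checks above can be wrapped in a small function so that a missing installation gives a clear message. This is only a sketch, assuming a POSIX shell; the name docker_ready is hypothetical and not something Docker provides.

```shell
# docker_ready: hypothetical helper, not part of Docker.
# Succeeds only if the docker command is found and hello-world runs.
docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker command not found; revisit the installation steps"
    return 1
  fi
  # The same two commands as in the lesson
  docker --version
  docker run hello-world
}

# Usage: docker_ready && echo "Docker looks ready"
```

The function stops at the first problem, so a failure message points either at the installation itself or at running the hello-world container.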

Key Points

  • For up-to-date details for installing Docker, the official documentation is the best bet.

  • Make sure you were able to download and run Docker’s hello-world example.


Using Docker with the CMS open data

Overview

Teaching: Self-guided
Exercises: 40 min
Questions
  • How do I use docker to effectively interface with the CMS open data?

Objectives
  • Download (fetch) the correct docker image

  • Fire up docker in the most useful way for CMS Open Data analysis

  • Use docker in a persistent way

  • Copy data out of the docker environment

  • Access Github repositories from within a docker environment

Overview

This exercise will walk you through setting up and familiarizing yourself with Docker, so that you can effectively use it to interface with the CMS open data. It is not meant to completely cover containers and everything you can do with Docker, but reach out to the organizers using the dedicated Mattermost channel if we are missing something.

Using the proper image for CMS software

The first time you run Docker, the following command will fetch the Docker image and put you into a bash shell in which you have access to a complete CMS software release that is appropriate for interfacing with the 2011 and 2012 datasets (7 and 8 TeV, respectively). It may take some time to download the full image, even as long as 20-30 minutes, depending on the speed of your internet connection.

This command and some extra guidance can also be found in the Open Data Portal introduction to Docker. The following command differs, however, in that it allows for X11 forwarding. That means that if you run a program from within Docker that pops up any windows or graphics, like ROOT, they will show up on your screen.

Keep in mind, on some systems, the file/directory paths might be different, so reach out to the organizers through the dedicated Mattermost channel if you have issues.

docker run -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw"  cmsopendata/cmssw_5_3_32 /bin/bash
Setting up CMSSW_5_3_32
CMSSW should now be available.
[21:53:43] cmsusr@docker-desktop ~/CMSSW_5_3_32/src $

Possible issues on Windows

If the docker command exits without giving you this output on WSL2 (Windows), see this post in the CERN Open Data forum.

It might be worth breaking down this command for the interested user. For a more complete listing of options, see the official Docker documentation on the run command.

To start a CMSSW container instance and open it in a bash shell, one would need only type

docker run -it cmsopendata/cmssw_5_3_32 /bin/bash

The -it option starts the instance in interactive mode: -i keeps the input open so that you can type commands, and -t gives you a terminal to type them in.

Adding the following assigns a name to the instance so that we can refer back to this environment and still access any files we created in there. You can, of course, choose a different name than myopendataproject! :)

... --name myopendataproject ...

Adding the following gives us X11 forwarding, though this will not work with WSL2 on Windows 10.

... --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw"  ...

When you’re done, you can just type exit to leave the Docker environment.
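
To see how the pieces described above fit together, here is a minimal sketch that assembles the full command from its parts and prints it instead of running it. The helper name build_run_command and the NAME and IMAGE variables are hypothetical conveniences; the options themselves are the ones used in this lesson.

```shell
# Assumed values, taken from the command above; change NAME as you like.
NAME=myopendataproject
IMAGE=cmsopendata/cmssw_5_3_32

# build_run_command: hypothetical helper that prints the docker command
# rather than running it, so each group of options can be inspected.
build_run_command() {
  # Base command: an interactive shell in a named container
  set -- docker run -it --name "$NAME"
  # X11 forwarding options
  set -- "$@" --net=host --env="DISPLAY" \
    --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw"
  # The image, and the program to start inside it
  set -- "$@" "$IMAGE" /bin/bash
  echo "$*"
}

build_run_command
```

The printed line omits the quotation marks around DISPLAY and the volume path, because the shell removes them before the arguments reach docker; the meaning is the same.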

Additional flags

Later on in this lesson we will show you two additional arguments to this command, both related to mounting local directories on your laptop/desktop so that they are visible in the Docker container.

One example will walk you through creating a local working directory for your analysis code. This means that you can edit your scripts or files locally and execute them in Docker. It will give you much greater flexibility in using whatever backup or version control you are comfortable with.

In a separate module we will show you how to mount the CERN-VM file system (CVMFS), giving you more access to CMS software and calibration information. CVMFS will be discussed in greater detail in that module.

As these flags are discussed, we will modify this primary docker command in those sections.

Stopping docker instances

As you are learning how to use Docker, you may find yourself with multiple instances running. Or maybe you started an instance with your favourite name with some set of flags and now you want to re-start that same instance but with new flags. In that case, you will want to stop and remove the running containers.

To stop all containers you would type the following on your local machine.

docker stop $(docker ps -aq)

To remove all containers, you would type the following on your local machine.

docker rm $(docker ps -aq)

Don’t worry!

Note that these commands will not remove the actual Docker files that you downloaded and may have taken quite some time to download! Whew!
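
If you only want to reset one container rather than all of them, docker stop and docker rm also accept a container name. The sketch below is a cautious version of that idea: the helper reset_container and the DRY_RUN variable are hypothetical, and by default the function only prints the commands it would run.

```shell
# reset_container: hypothetical helper that stops and removes one container.
# By default it only prints the commands; set DRY_RUN=0 to really run them.
reset_container() {
  name=$1
  for action in stop rm; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker $action $name"
    else
      docker "$action" "$name"
    fi
  done
}

reset_container myopendataproject
```

Once the printed commands look right, running DRY_RUN=0 reset_container myopendataproject would carry them out, after which the name is free to be used in a new docker run command.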

Using Docker repeatedly

The next time you want to run Docker, it will not need to download any significant data, so it should open in seconds. You could run a docker run command again (with a new name, since Docker will not create a second container with a name that is already in use), and that would quickly put you into a Docker environment, but there are some issues with this. Most significantly, it creates a brand-new container, so any files that you made or any code that you wrote in the earlier environment will not be there! Instead, we want to run Docker in a persistent way so that we keep going into the same working area with all our files and code saved each time.

There are two ways to do this: by giving your container instance a name or by making sure you reference the container id. The former approach is probably easier and preferred, but we discuss both below.

Start docker by name

The easiest way to start a docker instance that you want to return to is to use the --name option, as shown in the first example. If you named your instance that way, you can start it again just by providing the name. With docker start you use -i for interactive rather than -it; the terminal setting is remembered from when the container was created, so it will still come up as normal.

Note also that you do not need the full cmsopendata/cmssw_5_3_32 argument anymore.

So to re-start your container, just do the following and you will still have X11-forwarding (on Linux and Mac) and the mounted disk volumes, assuming you ran the full command earlier.

docker start -i myopendataproject

Start/Attach to a particular process

If you did not name your container instance but still want to return to a very specific environment, you will need to start and then attach to the exact same Docker instance as before. First of all, you want to see which Docker containers exist on your machine, whether running or stopped. To do this, run the following command:

docker ps -a

You’ll see a list of docker processes that may look something like the following (the exact output will vary from user to user).

CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                      PORTS               NAMES
4f323c317b90        hello-world                "/hello"                 3 minutes ago       Exited (0) 3 minutes ago                        modest_jang
7719a7d74190        cmsopendata/cmssw_5_3_32   "/opt/cms/entrypoint…"   9 minutes ago       Exited (0) 2 minutes ago                        happy_greider
8939ade0bfac        cmsopendata/cmssw_5_3_32   "/opt/cms/entrypoint…"   16 hours ago        Exited (128) 16 hours ago                       hungry_bhaskara
e914cef3c45a        cmsopendata/cmssw_5_3_32   "/opt/cms/entrypoint…"   6 days ago          Exited (1) 9 minutes ago                        beautiful_tereshkova
b3a888c059f7        cmsopendata/cmssw_5_3_32   "/opt/cms/entrypoint…"   13 days ago         Exited (0) 13 days ago                          affectionate_ardinghelli

You’ll want to attach using the CONTAINER ID. In the above example, I know that I’ve been using the most recent container instance for cmsopendata, 7719a7d74190. So to reattach, I run the following line, which starts the container and attaches to it in one step: -a attaches the container’s output to your terminal and -i keeps its input open so that you can type commands. Note that you would want to change the CONTAINER ID for your particular case.

docker start -ai 7719a7d74190

Voila! You should be back in the same container.
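
Picking the right CONTAINER ID out of a long listing by eye is error-prone, so here is a minimal sketch that does it with awk. It assumes the listing is ordered newest first, as in the example above, and that the IMAGE column matches the name exactly (no tag such as :latest appended). The function name latest_container_for_image is hypothetical.

```shell
# latest_container_for_image: hypothetical helper.
# Reads a `docker ps -a` style listing on standard input and prints the
# CONTAINER ID of the first row whose IMAGE column equals $1.
latest_container_for_image() {
  awk -v image="$1" 'NR > 1 && $2 == image { print $1; exit }'
}

# Demonstration on the first rows of the listing shown above
latest_container_for_image cmsopendata/cmssw_5_3_32 <<'EOF'
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                      PORTS               NAMES
4f323c317b90        hello-world                "/hello"                 3 minutes ago       Exited (0) 3 minutes ago                        modest_jang
7719a7d74190        cmsopendata/cmssw_5_3_32   "/opt/cms/entrypoint…"   9 minutes ago       Exited (0) 2 minutes ago                        happy_greider
8939ade0bfac        cmsopendata/cmssw_5_3_32   "/opt/cms/entrypoint…"   16 hours ago        Exited (128) 16 hours ago                       hungry_bhaskara
EOF
```

On your own machine you would pipe the real listing into the function, as in docker ps -a | latest_container_for_image cmsopendata/cmssw_5_3_32, and pass the printed ID to docker start.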

CHALLENGE! Test X11 forwarding

For Windows users, open a dedicated CMS open data container that includes a VNC server:

docker run -it -P -p 5901:5901 -p 6080:6080 cmsopendata/cmssw_5_3_32_vnc:latest /bin/bash

In the container, type start_vnc and choose a password. Open a browser window with the given URL (enter the password), and start ROOT. If the web browser doesn’t work for you, an alternative is:

  • Go to https://bintray.com/tigervnc/stable/tigervnc/1.10.0 and download vncviewer64-1.10.0.exe
  • Run vncviewer64-1.10.0.exe, enter vnc server name: 127.0.0.1:5901, click connect, enter password

For Mac and Linux users, open the CMS open data container with docker start... or docker run... as instructed above and open ROOT, simply by typing root on the command line.

Do you see the ROOT splash screen pop up? If not, check that you followed all the instructions above correctly or contact the facilitators.

To exit the ROOT interpreter type .q.

If you find that X11 forwarding is not working, try typing the following before starting your Docker container.

xhost local:root

CHALLENGE! Test persistence

Go into the Docker environment and create a test file using some simple shell commands. Type the following exactly as you see it. It will dump some text into a file and then print the contents of the file to the screen.

echo "I am still here" > test.tmp
cat test.tmp

After you’ve done this, exit out of the container and try to attach to the same instance. If you did it correctly, you should be able to list the contents of the directory with ls -l and see your file from before! If not, check that you followed all the instructions above correctly or contact the facilitators.

Copy file(s) into or out of a container

Sometimes you will want to copy a file directly into or out of a container. Let’s start with copying a file out.

Suppose you have created your myopendataproject container and you did the challenge above to test persistence. In your Docker container, there should now be a file called test.tmp. Run the following on your local machine and not in a Docker environment. It should copy the file out and onto your local machine where you can inspect it.

docker cp myopendataproject:/home/cmsusr/CMSSW_5_3_32/src/test.tmp .

If you want to copy a file into a container instance, it works the way you might expect. Suppose you have a local file called localfile.tmp. You can copy it into the same instance as follows.

docker cp localfile.tmp myopendataproject:/home/cmsusr/CMSSW_5_3_32/src/

Mounting a local volume

Sometimes you may want to mount a filesystem from your local machine or some other remote system so that your Docker container can see it. Let’s first see how this is done in a general way.

The basic usage is

docker run -v <path on host>:<path in container> <image>

Here, the path on host is the full path to the local file system/directory you want to make visible to Docker, and the path in container is where it will be mounted in your Docker container.

There are more options and if you want to read more, please visit the official Docker documentation.
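
As a small illustration of the basic usage, the sketch below builds the -v option from a host directory and a container path, after checking that the host directory exists. The helper name mount_option and the container path /home/cmsusr/work are hypothetical. The host directory is resolved to a full path because a value that is not a path would be interpreted by Docker as a named volume rather than a host directory.

```shell
# mount_option: hypothetical helper that prints a -v option for docker run.
# $1 = directory on the host, $2 = absolute path inside the container.
mount_option() {
  host_dir=$1
  container_dir=$2
  if [ ! -d "$host_dir" ]; then
    echo "host directory does not exist: $host_dir" >&2
    return 1
  fi
  # Resolve to a full path so that Docker treats it as a host directory
  host_dir=$(cd "$host_dir" && pwd)
  echo "-v $host_dir:$container_dir"
}

# Example: mount the current directory
mount_option . /home/cmsusr/work
```

The printed option can be pasted into a docker run command before the image name. Directory names containing spaces would need extra quoting, which this sketch does not handle.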

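When working with the CMS open data, you will find yourself using this approach in at least two ways:

  • To mount a local working directory, so that you can edit your analysis code on your own machine

  • To mount CVMFS, which is covered in a separate module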

Note that all your compiling and executing still has to be done in the Docker container! But having your source code also visible on your local laptop/desktop will make things easier for you.

Let’s try this. First, before you start up your Docker container, create a local directory where you will be doing your code development. In the example below, I’m calling it cms_open_data_work and it will live in my $HOME directory. You may choose a shorter directory name if you like. :)

Local machine

cd # This is to make sure I'm in my home directory
mkdir cms_open_data_work

Then fire up your Docker container, adding the following

-v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared

Your full docker command would then look like this

Local machine

docker run -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" -v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared cmsopendata/cmssw_5_3_32 /bin/bash
Setting up CMSSW_5_3_32
CMSSW should now be available.
[21:53:43] cmsusr@docker-desktop ~/CMSSW_5_3_32/src $

When your Docker container starts up, it puts you in /home/cmsusr/CMSSW_5_3_32/src, but your new mounted directory is /home/cmsusr/cms_open_data_work. The easiest thing to do is to create a soft link to that directory from inside /home/cmsusr/CMSSW_5_3_32/src using ln -s ... as shown below, and then do your work in that directory.

Warning!

Sometimes the local volume is mounted in the Docker container as the wrong user/group. It should be cmsusr but sometimes is mounted as cmsinst. Note that in the following set of commands, we add a line to change the user/group with the chown command.

If this is an issue, you’ll also need to do this in Docker for any new directories you check out on your local machine.

Docker container

cd /home/cmsusr/CMSSW_5_3_32/src
sudo chown -R cmsusr.cmsusr ~/cms_open_data_work/
ln -s ~/cms_open_data_work/
cd /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work/

Now, open a new terminal on your local machine (or simply exit out of your container) and check out one of the repositories you’ll be working with. If you are not familiar with git/Github, check out the Git pre-exercises.

Local machine

cd ~/cms_open_data_work
git clone https://github.com/katilp/AOD2NanoAODOutreachTool.git AOD2NanoAOD
Cloning into 'AOD2NanoAOD'...
remote: Enumerating objects: 60, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 343 (delta 29), reused 13 (delta 5), pack-reused 283
Receiving objects: 100% (343/343), 743.11 KiB | 461.00 KiB/s, done.
Resolving deltas: 100% (162/162), done.

Next, go back into your Docker container (either in your other window or by restarting that same container), and see if you can see this new directory that you checked out on your local machine.

Docker container

cd /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work
ls -l
total 4
drwxr-xr-x 8 cmsinst cmsinst 4096 Sep 26 20:48 AOD2NanoAOD

Voila! You now have a workflow where you can edit files locally, using whatever tools are on your local machine, and then execute them in the Docker environment.

Let’s try compiling and running this new code! Note that to actually compile the code, we want to be in the /home/cmsusr/CMSSW_5_3_32/src directory.

Docker container

cd /home/cmsusr/CMSSW_5_3_32/src
sudo chown -R cmsusr.cmsusr cms_open_data_work/AOD2NanoAOD/
scram b
Reading cached build data
>> Local Products Rules ..... started
>> Local Products Rules ..... done
>> Building CMSSW version CMSSW_5_3_32 ----
>> Entering Package cms_open_data_work/AOD2NanoAOD
>> Creating project symlinks
Entering library rule at cms_open_data_work/AOD2NanoAOD
>> Compiling edm plugin /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work/AOD2NanoAOD/src/AOD2NanoAOD.cc 
>> Building edm plugin tmp/slc6_amd64_gcc472/src/cms_open_data_work/AOD2NanoAOD/src/cms_open_data_workAOD2NanoAOD/libcms_open_data_workAOD2NanoAOD.so
Leaving library rule at cms_open_data_work/AOD2NanoAOD
@@@@ Running edmWriteConfigs for cms_open_data_workAOD2NanoAOD
--- Registered EDM Plugin: cms_open_data_workAOD2NanoAOD
>> Leaving Package cms_open_data_work/AOD2NanoAOD
>> Package cms_open_data_work/AOD2NanoAOD built
>> Subsystem cms_open_data_work built
>> Local Products Rules ..... started
>> Local Products Rules ..... done
gmake[1]: Entering directory `/home/cmsusr/CMSSW_5_3_32'
>> Creating project symlinks
>> Done python_symlink
>> Compiling python modules cfipython/slc6_amd64_gcc472
>> Compiling python modules python
>> Compiling python modules src/cms_open_data_work/AOD2NanoAOD/python
>> All python modules compiled
@@@@ Refreshing Plugins:edmPluginRefresh
>> Pluging of all type refreshed.
>> Done generating edm plugin poisoned information
gmake[1]: Leaving directory `/home/cmsusr/CMSSW_5_3_32'

And now we can run it! The following command may take anywhere from 10-20 minutes to run.

Docker container

cd /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work/AOD2NanoAOD/
cmsRun configs/data_cfg.py
200926 22:12:20 802 secgsi_InitProxy: cannot access private key file: /home/cmsusr/.globus/userkey.pem
26-Sep-2020 22:46:14 CEST  Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/TauPlusX/AOD/22Jan2013-v1/20000/0040CF04-8E74-E211-AD0C-00266CFFA344.root
26-Sep-2020 22:46:17 CEST  Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/TauPlusX/AOD/22Jan2013-v1/20000/0040CF04-8E74-E211-AD0C-00266CFFA344.root
26-Sep-2020 22:51:14 CEST  Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/TauPlusX/AOD/22Jan2013-v1/20000/0040CF04-8E74-E211-AD0C-00266CFFA344.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostEndRun
    2 fileAction           pre-events       pre-events

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3

Key Points

  • Docker is easy to use but there are a number of options you have to be careful with in order to use it effectively with the CMS open data


Setting up CVMFS

Overview

Teaching: 15 min
Exercises: 30 min
Questions
  • How do I access some CMS-specific software?

Objectives
  • Install CVMFS on either your local machine or inside the Docker container

  • Use CVMFS to access and run tools used to calculate the luminosity for specific run periods

At some point you may want to calculate the luminosity or study trigger effects for the data you are analyzing. CMS uses a tool called brilcalc but it is not included in the Docker image.

To get around this, we mount drives at CERN in the Docker container when you fire it up so that you can call this and perhaps other tools.

To learn more about brilcalc you can read the CERN Open Data Portal documentation. This lesson however is just to help you test that you can access this tool.

Installing CVMFS

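There are two ways to install CVMFS:

  • Locally, on your host machine, and then mounted into the Docker container

  • Directly inside the Docker container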

Installing CVMFS locally

From the CVMFS documentation

The CernVM-File System (CernVM-FS) provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications. CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). Files and directories are hosted on standard web servers and mounted in the universal namespace /cvmfs.

In the following sections, we’ll direct you to the appropriate pages to download and install CVMFS locally. This should be done on your local machine and not in the container.

Let’s walk through these steps.

Get CVMFS

Follow the installation instructions to download the software for your particular OS.

Setup CVMFS

Next, you’ll want to set things up on your local machine. Follow these instructions carefully for your system.

One part of the setup which can be confusing is the content of the file /etc/cvmfs/default.local. The following lines should work, if they are the sole content of the file.

CVMFS_REPOSITORIES=cms.cern.ch,cms-opendata-conddb.cern.ch,cms-bril.cern.ch
CVMFS_HTTP_PROXY=DIRECT
CVMFS_CLIENT_PROFILE=single
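
One way to double-check the repository line described above is a small script that reads the file and reports any repository from this lesson that is missing. This is only a sketch: the function name check_cvmfs_config is hypothetical, and it assumes the CVMFS_REPOSITORIES value is written without quotation marks, exactly as shown.

```shell
# check_cvmfs_config: hypothetical helper, not part of CVMFS.
# $1 = path to the config file (defaults to /etc/cvmfs/default.local).
# Prints each repository from this lesson that is missing from the
# CVMFS_REPOSITORIES line and returns non-zero if any is missing.
check_cvmfs_config() {
  config=${1:-/etc/cvmfs/default.local}
  repos_line=$(grep '^CVMFS_REPOSITORIES=' "$config" 2>/dev/null || true)
  status=0
  for repo in cms.cern.ch cms-opendata-conddb.cern.ch cms-bril.cern.ch; do
    # Wrap the list in commas so that every entry can be matched as ",name,"
    case ",${repos_line#CVMFS_REPOSITORIES=}," in
      *",$repo,"*) ;;
      *) echo "missing repository: $repo"; status=1 ;;
    esac
  done
  return $status
}

# Usage: check_cvmfs_config && echo "all repositories listed"
```

The check only looks at the text of the file; whether the repositories can really be mounted is what the verification commands in the next section are for.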

Check the installation

Make sure you verify the installation. Check that link for the latest commands, but usually this involves running

sudo cvmfs_config setup

and then

cvmfs_config probe

or

sudo systemctl restart autofs

Possible issues

For some systems, you may run into some issues.

On WSL2 Ubuntu, after the installation, on each session, one has to run the following

sudo /usr/sbin/automount --pid-file /var/run/autofs.pid
cvmfs_config probe

You may find that, even in the middle of a session, you need to re-run

cvmfs_config probe

on your host machine, even on Linux or Mac.

Set up your Docker container

To start your container with CVMFS mounted, your docker run command needs to include the following option.

... --volume "/cvmfs:/cvmfs:shared" ...

This makes sure the container can see this CVMFS file system.

If you already created your myopendataproject container with this option included, you can now start the container by name

docker start -i myopendataproject

Or, if you wanted to start a brand new, fresh container with the CVMFS file system mounted, you can run

docker run -it --name mycvmfs --volume "/cvmfs:/cvmfs:shared" cmsopendata/cmssw_5_3_32 /bin/bash

If you want to simply build upon everything you have done already, your full Docker command would now look like

docker run -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" -v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared --volume "/cvmfs:/cvmfs:shared" cmsopendata/cmssw_5_3_32 /bin/bash

Install CVMFS directly in the Docker container

We’ll be following the official CVMFS documentation here and here but with specific instructions for the CMSSW Docker image.

You’ll want to launch Docker with a new --privileged flag that will make it easier to install new packages. If we are using the full command from the previous module, it would now look like this

docker run --privileged -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" -v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared cmsopendata/cmssw_5_3_32 /bin/bash

The following commands are all run in the Docker container.

Install the necessary packages using yum. Installing these packages could take up to 30 minutes.

At some point, the installation process will prompt you for your approval, Is this ok [y/N]:. You can enter y.

sudo yum install https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs
Loaded plugins: changelog, kernel-module, ovl, protectbase, tsflags, versionlock
Setting up Install Process
cvmfs-release-latest.noarch.rpm                                                                                                          | 5.5 kB     00:00
Examining /var/tmp/yum-root-NokpI5/cvmfs-release-latest.noarch.rpm: cvmfs-release-2-6.noarch
Marking /var/tmp/yum-root-NokpI5/cvmfs-release-latest.noarch.rpm to be installed
EGI-trustanchors                                                                                                                         | 2.5 kB     00:00
EGI-trustanchors/primary_db                                                                                                              |  56 kB     00:00
.
<more output follows>
.

Next you’ll need to configure autofs, which handles mounting of filesystems.

sudo cvmfs_config setup

Edit the /etc/cvmfs/default.local file

sudo vi /etc/cvmfs/default.local

and add these lines:

CVMFS_REPOSITORIES=cms.cern.ch,cms-opendata-conddb.cern.ch,cms-bril.cern.ch
CVMFS_HTTP_PROXY=DIRECT
CVMFS_CLIENT_PROFILE=single

Restart autofs

sudo service autofs restart

Verify the file system

cvmfs_config probe

Heads-up!

You will need to repeat the last two commands every time you restart the container.

Test it out and run brilcalc

Once you are in the container and are in the CMSSW 5.3.32 environment, and have CVMFS working through either of the above methods, you can then run the following commands to set some local environment variables and then install brilcalc using the python pip command.

export PATH=$HOME/.local/bin:/cvmfs/cms-bril.cern.ch/brilconda/bin:$PATH

pip install --user brilws

Each time you log in, you will have to re-run that export command, even if you have already installed brilws in the container.
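
If retyping the export command becomes tedious, one option is to store it in a startup file. The sketch below adds a line to a file only if it is not already there, so running it twice does no harm. The helper name append_once is hypothetical, and whether your shell in the container reads ~/.bashrc at startup is an assumption you should check for your setup.

```shell
# append_once: hypothetical helper that adds a line to a file only if that
# exact line is not already present.
# $1 = file to modify, $2 = line to add.
append_once() {
  file=$1
  line=$2
  touch "$file"
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (inside the container); the single quotes keep $HOME and $PATH
# unexpanded until the startup file is read:
# append_once "$HOME/.bashrc" 'export PATH=$HOME/.local/bin:/cvmfs/cms-bril.cern.ch/brilconda/bin:$PATH'
```

The usage line is left as a comment so that nothing is modified until you decide to run it yourself.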

If everything worked, you should be able to run brilcalc to check its version and to get the luminosity for a sample run.

Note that the first time you run the brilcalc commands, it can take up to 7 minutes to run!

brilcalc --version

brilcalc lumi -c web -r 160431

It should be noted that during the workshop, we will have an entire exercise dedicated to using this tool to calculate the luminosity for your datasets.

Key Points

  • Installing CVMFS can make some parts of the analysis much easier

  • Care must be taken, though, to set up your environment properly.


Test and validate

Overview

Teaching: 10 min
Exercises: 30 min
Questions
  • What is in the CMS Docker image?

  • How do I test and validate my Docker container?

Objectives
  • Learn about the details of the CMS Docker container

  • Test and validate the CMS Docker image by running a CMSSW job.

Helpline

Remember that we are always available to help. Our Mattermost channel is open.

Know your Docker image

The Docker container we just installed provides the CMS computing environment to be used with the 2011 and 2012 CMS open data. The container runs Scientific Linux CERN. As mentioned before, it comes equipped with the ROOT framework and CMSSW.

An important feature of the image is the availability of the CernVM File System.
Thanks to the cvmfs client installed, the Docker instance gets the CMS software (CMSSW) from the shared /cvmfs/cms.cern.ch area (physically at CERN but mounted locally) and the jobs, running on the CMS open data Docker image, read the conditions data from /cvmfs/cms-opendata-conddb.cern.ch. Access to the data is through XRootD.

Run a simple demo for testing and validating

The validation procedure tests that the CMS environment is installed and operational on your Docker container, and that you have access to the CMS Open Data files. It also accesses the conditions data from the shared cvmfs area and caches them. This last action will save us time during the workshop. These steps also give you a quick introduction to the CMS environment.

Run the following command to create the CMS runtime variables:

cmsenv

Create a working directory for the demo analyzer, change to that directory and create a skeleton for the analyzer:

mkdir Demo
cd Demo
mkedanlzr DemoAnalyzer

Come back to the main src area:

cd ../

Compile the code:

scram b

You can safely ignore the warning.

IMPORTANT NOTE: Depending on your system, there could be some issues with the shared clipboard between the host machine and the Docker container.
This means that it is possible that you cannot copy the instructions in this episode directly into your Docker session.

One thing you can try is Shift+Ctrl+V when pasting into your Docker terminal, rather than Ctrl+V. That will sometimes work.

The quickest workaround might be using ssh and/or scp commands to copy the required files to some other machine that you have access to, from the Docker container as well as from the host machine. For instance, if you had access to an lxplus computer at CERN, you could copy a certain file from the Docker container to the lxplus computer. On the Docker container you could do:

scp myfile.txt myusername@lxplus.cern.ch:.

to copy a hypothetical file myfile.txt to lxplus, and then on the host

scp myusername@lxplus.cern.ch:myfile.txt .

to copy the same file back to your host machine. Then you can edit the file locally and reverse the process to get it back to your Docker container.

It could also be possible to have direct access from the host to the Docker container. This youtube tutorial might be of help for that option.

Before launching the job, let’s modify the configuration file (do not worry, you will learn about all this stuff in a different lesson) so it is able to access a CMS open data file and cache the conditions data. As mentioned, this will save us time later.

Open the demoanalyzer_cfg.py file using the vi editor (here you can find a good cheatsheet for that editor).

vi Demo/DemoAnalyzer/demoanalyzer_cfg.py

Replace file:myfile.root with root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root to point to an example file.

Also change the maximum number of events to 10, i.e., change -1 to 10 in process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1)).

In addition, insert, below the PoolSource module, the following lines:

#needed to cache the conditions data
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db')
process.GlobalTag.globaltag = 'FT_53_LV5_AN1::All'

Take a look at the final validation config file

At the end, the config file should look like

import FWCore.ParameterSet.Config as cms
process = cms.Process("Demo")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.source = cms.Source("PoolSource",
# replace 'myfile.root' with the source file you want to use
   fileNames = cms.untracked.vstring(
       'root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root'
   )
)
#needed to cache the conditions data
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db')
process.GlobalTag.globaltag = 'FT_53_LV5_AN1::All'

process.demo = cms.EDAnalyzer('DemoAnalyzer'
)

process.p = cms.Path(process.demo)
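
The two substitutions described above (input file and event count) could also be scripted rather than typed in vi. The following is only a minimal Python sketch under stated assumptions: patch_config and its n_events parameter are hypothetical names, not part of CMSSW, and the sketch assumes the configuration still contains the default file:myfile.root and int32(-1) strings. It does not add the GlobalTag lines, which still need to be inserted by hand.

```python
import re

# Example open data file quoted in this lesson.
EXAMPLE_FILE = (
    "root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/"
    "12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root"
)


def patch_config(text, n_events=10):
    """Return the config text with the input file and event count replaced.

    patch_config is a hypothetical helper, not a CMSSW tool.
    """
    # Point the PoolSource at the example file instead of the placeholder.
    text = text.replace("file:myfile.root", EXAMPLE_FILE)
    # Replace the "process all events" value (-1) with a small number.
    text = re.sub(
        r"cms\.untracked\.int32\(\s*-1\s*\)",
        "cms.untracked.int32(%d)" % n_events,
        text,
    )
    return text


# Try it on an in-memory snippet before touching the real file.
sample = "fileNames = cms.untracked.vstring('file:myfile.root')"
print(patch_config(sample))
```

To apply it to the real file, one would read Demo/DemoAnalyzer/demoanalyzer_cfg.py, pass its contents to patch_config, and write the result back. The sketch avoids f-strings so that it should also run with the older Python versions found in legacy environments.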

Make symbolic links to the conditions database files from cvmfs:

ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA FT_53_LV5_AN1
ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db FT_53_LV5_AN1_RUNA.db

and make sure the cms-opendata-conddb.cern.ch directory has actually expanded in your Docker instance. One way of doing this is executing:

ls -l /cvmfs/
total 18
drwxr-xr-x  8 root root 4096 Jan 13  2014 cernvm-prod.cern.ch
drwxr-xr-x 69  989  984 4096 Aug 29  2014 cms.cern.ch
drwxr-xr-x 14  989  984 4096 Dec 16  2015 cms-opendata-conddb.cern.ch
drwxr-xr-x  4  989  984 4096 May 28  2014 cvmfs-config.cern.ch
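
The same check can be expressed as a short script. This is a minimal Python sketch, assuming the repositories appear as directories under /cvmfs as in the listing above; missing_repos and REQUIRED_REPOS are hypothetical names introduced for illustration.

```python
import os

# Repository that the configuration above reads the conditions data from.
REQUIRED_REPOS = ["cms-opendata-conddb.cern.ch"]


def missing_repos(base="/cvmfs", repos=REQUIRED_REPOS):
    """Return the names in repos that are not directories under base."""
    return [name for name in repos
            if not os.path.isdir(os.path.join(base, name))]


missing = missing_repos()
if missing:
    print("Not available under /cvmfs: " + ", ".join(missing))
else:
    print("All required CVMFS repositories are visible.")
```

If the script reports a missing repository, listing /cvmfs/ again as shown above is a reasonable first thing to try.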

Finally, run the CMSSW executable cmsRun with our configuration (the first run may take a while because the conditions data is being cached, but subsequent runs will be faster):

cmsRun Demo/DemoAnalyzer/demoanalyzer_cfg.py
14-Sep-2020 02:28:06 GMT  Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root
14-Sep-2020 02:28:13 GMT  Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root
Begin processing the 1st record. Run 166782, Event 340184599, LumiSection 309 at 14-Sep-2020 02:28:26.283 GMT
Begin processing the 2nd record. Run 166782, Event 340185007, LumiSection 309 at 14-Sep-2020 02:28:26.284 GMT
Begin processing the 3rd record. Run 166782, Event 340187903, LumiSection 309 at 14-Sep-2020 02:28:26.285 GMT
Begin processing the 4th record. Run 166782, Event 340227487, LumiSection 309 at 14-Sep-2020 02:28:26.285 GMT
Begin processing the 5th record. Run 166782, Event 340210607, LumiSection 309 at 14-Sep-2020 02:28:26.285 GMT
Begin processing the 6th record. Run 166782, Event 340256207, LumiSection 309 at 14-Sep-2020 02:28:26.286 GMT
Begin processing the 7th record. Run 166782, Event 340165759, LumiSection 309 at 14-Sep-2020 02:28:26.286 GMT
Begin processing the 8th record. Run 166782, Event 340396487, LumiSection 309 at 14-Sep-2020 02:28:26.287 GMT
Begin processing the 9th record. Run 166782, Event 340390767, LumiSection 309 at 14-Sep-2020 02:28:26.287 GMT
Begin processing the 10th record. Run 166782, Event 340435263, LumiSection 309 at 14-Sep-2020 02:28:26.288 GMT
14-Sep-2020 02:28:26 GMT  Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostEndRun                        
    2 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3
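
As a quick sanity check, the number of processed events can be counted from the job output and compared with the value set in process.maxEvents. The following is a minimal Python sketch, assuming the job output has been captured as text and that each event produces one "Begin processing the ... record" line as in the log above; count_records is a hypothetical helper name.

```python
import re

# Matches lines such as "Begin processing the 1st record. Run 166782, ..."
RECORD_LINE = re.compile(r"^Begin processing the \d+(?:st|nd|rd|th) record\.")


def count_records(log_text):
    """Count the 'Begin processing' lines in captured cmsRun output."""
    return sum(1 for line in log_text.splitlines()
               if RECORD_LINE.match(line.strip()))


sample = """\
Begin processing the 1st record. Run 166782, Event 340184599, LumiSection 309 at 14-Sep-2020 02:28:26.283 GMT
Begin processing the 2nd record. Run 166782, Event 340185007, LumiSection 309 at 14-Sep-2020 02:28:26.284 GMT
"""
print(count_records(sample))  # prints 2
```

For the job above, the count should come out as 10 if all requested events were processed.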

Key Points

  • The CMS Docker image contains all the required ingredients to start analyzing CMS open data.

  • In order to test and validate the Docker container you can run a simple CMSSW job.