Introduction
Overview
Teaching: 5 min
Exercises: 0 min
Questions
What is Docker?
What is the point of these exercises?
Objectives
Learn about Docker and why we’re using it
Let’s learn about Docker and why we’re using it!
Regardless of what you encounter in this lesson, the definitive guide is any official documentation provided by Docker.
What is Docker?
From the Docker website
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
In short, Docker allows a user to work in a computing environment that has been frozen with respect to interdependent libraries and code and related tools. This means that you can use the same software that analysts were using 10 years ago (for example) without downloading all the relevant 10-year-old libraries. :)
What can I learn here?
As much as we’d like, we can’t give you a complete overview of Docker. However, we do hope to explain why we run Docker in the way we do, so that you gain some understanding. More specifically, we’ll show you how to set up Docker not just for this workshop, but for interfacing with the CMS open data in general.
Key Points
Docker is an implementation of a tool called a container that gives us a self-consistent computing environment
Docker is widely used these days in both industry and academic research
Docker is one way that you can interface with CMS data using the same computing tools as CMS collaborators
Installing Docker
Overview
Teaching: 5 min
Exercises: 15 min
Questions
What equipment do I need?
How do I install Docker?
Objectives
Install Docker on your machine
Installing Docker is relatively straightforward, particularly because of the excellent documentation the Docker team provides. Still, you will want to set aside some time to do it properly and test it out.
Installing
Go to the official Docker site and their installation instructions to install Docker for your operating system.
We see no need to go beyond the documentation they provide so we leave it up to you to follow their installation procedure.
Testing
As you walk through their documentation, you will eventually come to a point where you will
run a very simple test, usually involving their hello-world
container.
You can find their documentation for this step here.
Testing your installation can be summed up as being able to run the following commands without generating any errors.
docker --version
docker run hello-world
Key Points
For up-to-date details for installing Docker, the official documentation is the best bet.
Make sure you were able to download and run Docker’s
hello-world
example.
Using Docker with the CMS open data
Overview
Teaching: Self-guided
Exercises: 40 min
Questions
How do I use docker to effectively interface with the CMS open data?
Objectives
Download (fetch) the correct docker image
Fire up docker in the most useful way for CMS Open Data analysis
Use docker in a persistent way
Copy data out of the docker environment
Access GitHub repositories from within a docker environment
Overview
This exercise will walk you through setting up and familiarizing yourself with Docker, so that you can effectively use it to interface with the CMS open data. It is not meant to completely cover containers and everything you can do with Docker, but reach out to the organizers using the dedicated Mattermost channel if we are missing something.
Using the proper image for CMS software
The first time you go to run Docker, the following command will fetch the docker image and
put you into a bash
shell in which you have access to a complete CMS software release that
is appropriate for interfacing with the 2011 and 2012 7 and 8 TeV datasets. It may take some time to
download the full image, even as long as 20-30 minutes, depending on the speed of your internet
connection.
This command and some extra guidance can also be found on the Open Data Portal introduction to Docker; however, the following command differs in that it allows for X11 forwarding. That means that if you run a program from within Docker that pops up any windows or graphics, like ROOT, they will show up.
Keep in mind, on some systems, the file/directory paths might be different, so reach out to the organizers through the dedicated Mattermost channel if you have issues.
docker run -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" cmsopendata/cmssw_5_3_32 /bin/bash
Setting up CMSSW_5_3_32
CMSSW should now be available.
[21:53:43] cmsusr@docker-desktop ~/CMSSW_5_3_32/src $
Possible issues on Windows
If the docker command exits without giving you this output on WSL2 (Windows), see this post in the CERN Open Data forum.
It might be worth breaking down this command for the interested user. For a more complete
listing of options, see the official Docker documentation on the run
command.
To start a CMSSW container instance and open it in a bash shell, one would need only type
docker run -it cmsopendata/cmssw_5_3_32 /bin/bash
The -it
option means to start the instance in interactive mode.
Adding the following assigns a name
to the instance so that we can refer back
to this environment and still access any files we created in there. You can, of course,
choose a different name than myopendataproject
! :)
... --name myopendataproject ...
Adding the following gives us X11-forwarding, though this will not work with WSL2 Linux on Windows 10.
... --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" ...
When you’re done, you can just type exit
to leave the Docker environment.
Additional flags
Later on in this lesson we will show you two additional arguments to this command, both related to mounting local directories on your laptop/desktop such that it will be visible in the Docker container.
One example we will show you will walk you through creating a local working directory for your analysis code. This means that you can edit your scripts or files locally and execute them in Docker. It will give you much greater flexibility in using whatever backup or version control you are comfortable with.
In a separate module we will show you how to mount the CERN-VM file system (CVMFS), giving you more access to CMS software and calibration information. CVMFS will be discussed in greater detail in that module.
As these flags are discussed, we will modify this primary docker
command in those sections.
Stopping docker instances
As you are learning how to use Docker, you may find yourself with multiple instances running. Or maybe you started an instance with your favourite name with some set of flags and now you want to re-start that same instance but with new flags. In that case, you will want to stop and remove the running containers.
To stop all containers you would type the following on your local machine.
docker stop $(docker ps -aq)
To remove all containers, you would type the following on your local machine.
docker rm $(docker ps -aq)
Don’t worry!
Note that these commands will not remove the actual Docker files that you downloaded and may have taken quite some time to download! Whew!
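If you do want to see which images you have downloaded, or remove one to reclaim disk space, the standard Docker commands sketched below should work (removing an image means the next docker run will have to re-download it).
docker images                          # list the images you have downloaded
docker rmi cmsopendata/cmssw_5_3_32    # remove the CMS open data image (only if you really want to re-download later!)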
Using Docker repeatedly
The next time you want to run Docker, it will not need to download any significant data so it should open in seconds. You could choose to run the same command as before and while that would work and quickly put you into a Docker environment, there are some issues with this. Most significantly, any files that you make or any code that you write in that environment will not be there! Instead of the above command, we want to run Docker in a persistent way so that we keep going into the same working area with all our files and code saved each time.
There are two ways to do this: by giving your container instance a name or by making sure you reference the container id. The former approach is probably easier and preferred, but we discuss both below.
Start docker by name
The easiest way to start a docker instance that you want to return to is using the --name
option, as shown in the first example. If you’ve named your instance in this way, you can
start
the instance, just by providing the name.
You will also use -i
for interactive rather than -it
. It will still come up as normal.
Note also that you do not need the full cmsopendata/cmssw_5_3_32
argument anymore.
So to re-start
your container, just do the following
and you will still have X11-forwarding (on Linux and Mac) and the mounted disk volumes, assuming you
ran the full command earlier.
docker start -i myopendataproject
Start/Attach to a particular process
If you did not name your container instance but still want to return to a very specific
environment, you will need to start
and then attach
to the exact same Docker instance as before.
First of all, you will want to see what Docker processes you have. To do this, run the following
command
docker ps -a
You’ll see a list of docker processes that may look something like the following (the exact output will vary from user to user).
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4f323c317b90 hello-world "/hello" 3 minutes ago Exited (0) 3 minutes ago modest_jang
7719a7d74190 cmsopendata/cmssw_5_3_32 "/opt/cms/entrypoint…" 9 minutes ago Exited (0) 2 minutes ago happy_greider
8939ade0bfac cmsopendata/cmssw_5_3_32 "/opt/cms/entrypoint…" 16 hours ago Exited (128) 16 hours ago hungry_bhaskara
e914cef3c45a cmsopendata/cmssw_5_3_32 "/opt/cms/entrypoint…" 6 days ago Exited (1) 9 minutes ago beautiful_tereshkova
b3a888c059f7 cmsopendata/cmssw_5_3_32 "/opt/cms/entrypoint…" 13 days ago Exited (0) 13 days ago affectionate_ardinghelli
You’ll want to attach using the CONTAINER ID
. In the above example, I know that I’ve been using the most
recent container instance for cmsopendata, 7719a7d74190
. So to reattach, I run the following line
which will start
and attach
all in one line. Note that you
would want to change the CONTAINER ID
for your particular case.
docker start -a 7719a7d74190
Voila! You should be back in the same container.
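As an aside, if a container is already running (for example, in another terminal window) and you just want an additional shell inside it, docker exec can do that. A minimal sketch, assuming the container from the first example is running:
docker exec -it myopendataproject /bin/bash   # open an extra bash shell in the running container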
CHALLENGE! Test X11 forwarding
For Windows users, open a specific CMS open data container with
docker run -it -P -p 5901:5901 -p 6080:6080 cmsopendata/cmssw_5_3_32_vnc:latest /bin/bash
In the container, type start_vnc and choose a password. Open a browser window with the given URL (and enter the password), and start ROOT. If the web browser doesn’t work for you, an alternative is:
- Go to https://bintray.com/tigervnc/stable/tigervnc/1.10.0 and download vncviewer64-1.10.0.exe
- Run vncviewer64-1.10.0.exe, enter vnc server name: 127.0.0.1:5901, click connect, enter password
For Mac and Linux users, open the CMS open data container with
docker start ...
or
docker run ...
as instructed above and open ROOT, simply by typing root
on the command line. Do you see the ROOT splash screen pop up? If not, check that you followed all the instructions above correctly or contact the facilitators.
To exit the ROOT interpreter, type .q
If you find that X11 forwarding is not working, try typing the following before starting your Docker container.
xhost local:root
CHALLENGE! Test persistence
Go into the Docker environment and create a test file using some simple shell commands. Type the following exactly as you see it. It will dump some text into a file and then print the contents of the file to the screen.
echo "I am still here" > test.tmp cat test.tmp
After you’ve done this, exit out of the container and try to attach to the same instance. If you did it correctly, you should be able to list the contents of the directory with
ls -l
and see your file from before! If not, check that you followed all the instructions above correctly or contact the facilitators.
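If it helps, the whole round trip might look something like the sketch below, assuming you named your container myopendataproject as in the earlier example.
# inside the container
echo "I am still here" > test.tmp
cat test.tmp
exit

# back on your local machine
docker start -i myopendataproject

# inside the container again
ls -l
cat test.tmp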
Copy file(s) into or out of a container
Sometimes you will want to copy a file directly into or out of a container. Let’s start with copying a file out.
Suppose you have created your myopendataproject container and you did the challenge
question above to test persistence. In your Docker container, there should now be a file
called test.tmp.
Run the following on your local machine and not in a docker environment.
It should copy the file out and onto your local machine where you can inspect it.
docker cp myopendataproject:/home/cmsusr/CMSSW_5_3_32/src/test.tmp .
If you want to copy a file into a container instance, it works the way you might expect.
Suppose you have a local file called localfile.tmp
. You can copy it into the same instance
as follows.
docker cp localfile.tmp myopendataproject:/home/cmsusr/CMSSW_5_3_32/src/
Mounting a local volume
Sometimes you may want to mount a filesystem from your local machine or some other remote system so that your Docker container can see it. Let’s first see how this is done in a general way.
The basic usage is
docker run -v <path on host>:<path in container> <image>
Where the path on host
is the full path to the local file system/directory you want to
make visible to docker. The path in container
is where it will be mounted in your
Docker container.
There are more options and if you want to read more, please visit the official Docker documentation.
When working with the CMS open data, you will find yourself using this approach in at least two ways:
- Having a local working directory for all your editing/version control, etc.
- Mounting the CVMFS file system (next module).
Note that all your compiling and executing still has to be done in the Docker container! But having your source code also visible on your local laptop/desktop will make things easier for you.
Let’s try this. First, before you start up your Docker image, create a local directory
where you will be doing your code development. In the example below, I’m calling it
cms_open_data_work
and it will live in my $HOME
directory. You may choose a shorter directory name if you like. :)
Local machine
cd                        # This is to make sure I'm in my home directory
mkdir cms_open_data_work
Then fire up your Docker container, adding the following
-v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared
Your full docker
command would then look like this
Local machine
docker run -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" -v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared cmsopendata/cmssw_5_3_32 /bin/bash
Setting up CMSSW_5_3_32
CMSSW should now be available.
[21:53:43] cmsusr@docker-desktop ~/CMSSW_5_3_32/src $
When your Docker container starts up, it puts you in /home/cmsusr/CMSSW_5_3_32/src
, but your new mounted directory is /home/cmsusr/cms_open_data_work
.
The easiest thing to do is to create a soft link to that directory from inside /home/cmsusr/CMSSW_5_3_32/src
using ln -s ...
as shown below,
and then do your work in that directory.
Warning!
Sometimes the local volume is mounted in the Docker container as the wrong user/group. It should be
cmsusr
but sometimes is mounted as cmsinst.
Note that in the following set of commands, we add a line to change the user/group with the chown
command. If this is an issue, you’ll also need to do this in Docker for any new directories you check out on your local machine.
Docker container
cd /home/cmsusr/CMSSW_5_3_32/src
sudo chown -R cmsusr.cmsusr ~/cms_open_data_work/
ln -s ~/cms_open_data_work/
cd /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work/
Now, open a new terminal on your local machine (or simply exit out of your container) and check out one of the repositories you’ll be working with. If you are not familiar with git/Github, check out the Git pre-exercises.
Local machine
cd ~/cms_open_data_work
git clone https://github.com/katilp/AOD2NanoAODOutreachTool.git AOD2NanoAOD
Cloning into 'AOD2NanoAOD'...
remote: Enumerating objects: 60, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 343 (delta 29), reused 13 (delta 5), pack-reused 283
Receiving objects: 100% (343/343), 743.11 KiB | 461.00 KiB/s, done.
Resolving deltas: 100% (162/162), done.
Next, go back into your Docker container (either in your other window or by restarting that same container) and see if you can see this new directory that you checked out on your local machine.
Docker container
cd /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work
ls -l
total 4
drwxr-xr-x 8 cmsinst cmsinst 4096 Sep 26 20:48 AOD2NanoAOD
Voila! You now have a workflow where you can edit files locally, using whatever tools are on your local machine, and then execute them in the Docker environment.
Let’s try compiling and running this new code!
Note that to actually compile the code, we want to be in the
/home/cmsusr/CMSSW_5_3_32/src
directory.
Docker container
cd /home/cmsusr/CMSSW_5_3_32/src
sudo chown -R cmsusr.cmsusr cms_open_data_work/AOD2NanoAOD/
scram b
Reading cached build data
>> Local Products Rules ..... started
>> Local Products Rules ..... done
>> Building CMSSW version CMSSW_5_3_32 ----
>> Entering Package cms_open_data_work/AOD2NanoAOD
>> Creating project symlinks
Entering library rule at cms_open_data_work/AOD2NanoAOD
>> Compiling edm plugin /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work/AOD2NanoAOD/src/AOD2NanoAOD.cc
>> Building edm plugin tmp/slc6_amd64_gcc472/src/cms_open_data_work/AOD2NanoAOD/src/cms_open_data_workAOD2NanoAOD/libcms_open_data_workAOD2NanoAOD.so
Leaving library rule at cms_open_data_work/AOD2NanoAOD
@@@@ Running edmWriteConfigs for cms_open_data_workAOD2NanoAOD
--- Registered EDM Plugin: cms_open_data_workAOD2NanoAOD
>> Leaving Package cms_open_data_work/AOD2NanoAOD
>> Package cms_open_data_work/AOD2NanoAOD built
>> Subsystem cms_open_data_work built
>> Local Products Rules ..... started
>> Local Products Rules ..... done
gmake[1]: Entering directory `/home/cmsusr/CMSSW_5_3_32'
>> Creating project symlinks
>> Done python_symlink
>> Compiling python modules cfipython/slc6_amd64_gcc472
>> Compiling python modules python
>> Compiling python modules src/cms_open_data_work/AOD2NanoAOD/python
>> All python modules compiled
@@@@ Refreshing Plugins:edmPluginRefresh
>> Pluging of all type refreshed.
>> Done generating edm plugin poisoned information
gmake[1]: Leaving directory `/home/cmsusr/CMSSW_5_3_32'
And now we can run it! The following command may take anywhere from 10-20 minutes to run.
Docker container
cd /home/cmsusr/CMSSW_5_3_32/src/cms_open_data_work/AOD2NanoAOD/
cmsRun configs/data_cfg.py
200926 22:12:20 802 secgsi_InitProxy: cannot access private key file: /home/cmsusr/.globus/userkey.pem
26-Sep-2020 22:46:14 CEST Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/TauPlusX/AOD/22Jan2013-v1/20000/0040CF04-8E74-E211-AD0C-00266CFFA344.root
26-Sep-2020 22:46:17 CEST Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/TauPlusX/AOD/22Jan2013-v1/20000/0040CF04-8E74-E211-AD0C-00266CFFA344.root
26-Sep-2020 22:51:14 CEST Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/TauPlusX/AOD/22Jan2013-v1/20000/0040CF04-8E74-E211-AD0C-00266CFFA344.root
=============================================
MessageLogger Summary
type category sev module subroutine count total
---- -------------------- -- ---------------- ---------------- ----- -----
1 fileAction -s file_close 1 1
2 fileAction -s file_open 2 2
type category Examples: run/evt run/evt run/evt
---- -------------------- ---------------- ---------------- ----------------
1 fileAction PostEndRun
2 fileAction pre-events pre-events
Severity # Occurrences Total Occurrences
-------- ------------- -----------------
System 3 3
Key Points
Docker is easy to use but there are a number of options you have to be careful with in order to use it effectively with the CMS open data
Setting up CVMFS
Overview
Teaching: 15 min
Exercises: 30 min
Questions
How do I access some CMS-specific software?
Objectives
Install CVMFS on either your local machine or inside the Docker container
Use CVMFS to access and run tools used to calculate the luminosity for specific run periods
At some point you may want to calculate the luminosity or study trigger
effects for the data you are analyzing. CMS uses a tool called brilcalc
but it is not included in the Docker image.
To get around this, we mount drives at CERN in the Docker container when you fire it up so that you can call this and perhaps other tools.
To learn more about brilcalc
you can read the
CERN Open Data Portal documentation.
This lesson however is just to help you test that you can access this tool.
Installing CVMFS
There are two ways to install CVMFS:
- Install CVMFS locally and then mount it in the Docker container
- Install CVMFS directly in the Docker container
Installing CVMFS locally
From the CVMFS documentation
The CernVM-File System (CernVM-FS) provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications. CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). Files and directories are hosted on standard web servers and mounted in the universal namespace /cvmfs
.
In the following sections, we’ll direct you to the appropriate pages to download and install CVMFS locally. This should be done on your local machine and not in the container.
Let’s walk through these steps.
Get CVMFS
Follow the installation instructions to download the software for your particular OS.
Setup CVMFS
Next, you’ll want to set things up on your local machine. Follow these instructions carefully for your system.
One part of the setup which can be confusing is the content of the file /etc/cvmfs/default.local
.
The following lines should work, if they are the sole content of the file.
CVMFS_REPOSITORIES=cms.cern.ch,cms-opendata-conddb.cern.ch,cms-bril.cern.ch
CVMFS_HTTP_PROXY=DIRECT
CVMFS_CLIENT_PROFILE=single
Check the installation
Make sure you verify the installation. Check that link for the latest commands, but usually this involves running
sudo cvmfs_config setup
and then
cvmfs_config probe
or
sudo systemctl restart autofs
Possible issues
For some systems, you may run into some issues.
On WSL2 Ubuntu, after the installation, on each session, one has to run the following
sudo /usr/sbin/automount --pid-file /var/run/autofs.pid
cvmfs_config probe
You may find that, even during a session, you need to re-run
cvmfs_config probe
on your host machine, even on Linux or Mac.
Set up your Docker container
In the previous instructions on starting up your Docker container, we included the commands to start your container with CVMFS mounted.
... --volume "/cvmfs:/cvmfs:shared" ...
This makes sure the container can see this CVMFS file system.
If you ran all the correct commands in the previous exercise, you can now start the container by name
docker start -i myopendataproject
Or, if you wanted to start a brand new, fresh container with the CVMFS file system mounted, you can run
docker run -it --name mycvmfs --volume "/cvmfs:/cvmfs:shared" cmsopendata/cmssw_5_3_32 /bin/bash
If you want to simply build upon everything you have done already, your full Docker command would now look like
docker run -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" -v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared --volume "/cvmfs:/cvmfs:shared" cmsopendata/cmssw_5_3_32 /bin/bash
Install CVMFS directly in the Docker container
We’ll be following the official CVMFS documentation here and here but with specific instructions for the CMSSW Docker image.
You’ll want to launch Docker with a new --privileged
flag that will make it easier to install
new packages. If we are using the full command from the previous module, it would now look like
this
docker run --privileged -it --name myopendataproject --net=host --env="DISPLAY" --volume="$HOME/.Xauthority:/home/cmsusr/.Xauthority:rw" -v ${HOME}/cms_open_data_work:/home/cmsusr/cms_open_data_work:shared cmsopendata/cmssw_5_3_32 /bin/bash
The following commands are all run in the Docker container.
Install the necessary packages using yum
. Installing these packages could take up to 30 minutes.
At some point, the installation process will prompt you for your approval,
Is this ok [y/N]:
. You can enter y
.
sudo yum install https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs
Loaded plugins: changelog, kernel-module, ovl, protectbase, tsflags, versionlock
Setting up Install Process
cvmfs-release-latest.noarch.rpm | 5.5 kB 00:00
Examining /var/tmp/yum-root-NokpI5/cvmfs-release-latest.noarch.rpm: cvmfs-release-2-6.noarch
Marking /var/tmp/yum-root-NokpI5/cvmfs-release-latest.noarch.rpm to be installed
EGI-trustanchors | 2.5 kB 00:00
EGI-trustanchors/primary_db | 56 kB 00:00
.
<more output follows>
.
Next you’ll need to configure autofs
, which handles mounting of filesystems.
sudo cvmfs_config setup
Edit the /etc/cvmfs/default.local
file
sudo vi /etc/cvmfs/default.local
and add these lines:
CVMFS_REPOSITORIES=cms.cern.ch,cms-opendata-conddb.cern.ch,cms-bril.cern.ch
CVMFS_HTTP_PROXY=DIRECT
CVMFS_CLIENT_PROFILE=single
Restart autofs
sudo service autofs restart
Verify the file system
cvmfs_config probe
Heads-up!
You will need to repeat the last two commands every time you restart the container.
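That is, after each docker start of this container you would re-run the same two commands from above:
sudo service autofs restart
cvmfs_config probe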
Test it out and run brilcalc
Once you are in the container
and are in the CMSSW 5.3.32 environment, and have CVMFS working through either of
the above methods, you can then run the following commands to set some local environment
variables and then install brilcalc
using the python pip
command.
export PATH=$HOME/.local/bin:/cvmfs/cms-bril.cern.ch/brilconda/bin:$PATH
pip install --user brilws
Each time you login, you will have to re-run that export
command, even if you have
already installed brilws
in the container.
If everything worked, you should be able to run brilcalc
to check its version and
to get the luminosity for a sample run.
Note that the first time you run the brilcalc
commands, it can take up to 7 minutes to run!
brilcalc --version
brilcalc lumi -c web -r 160431
It should be noted that during the workshop, we will have an entire exercise dedicated to using this tool to calculate the luminosity for your datasets.
Key Points
Installing CVMFS can make some parts of the analysis much easier
Care must be taken, though, to set up your environment properly.
Test and validate
Overview
Teaching: 10 min
Exercises: 30 min
Questions
What is in the CMS Docker image?
How do I test and validate my Docker container?
Objectives
Learn about the details of the CMS Docker container
Test and validate the CMS Docker image by running a CMSSW job.
Helpline
Remember that we are always available to help. Our Mattermost channel is open.
Know your Docker image
The Docker container we just installed provides the CMS computing environment to be used with the 2011 and 2012 CMS open data. The Docker container uses Scientific Linux CERN. As mentioned before, it comes equipped with the ROOT framework and CMSSW.
An important feature of the image is the availability of the CernVM File System.
Thanks to the CVMFS client installed in the image, the Docker instance gets the CMS software (CMSSW)
from the shared /cvmfs/cms.cern.ch
area (physically at CERN but mounted locally)
and the jobs, running on the CMS open data Docker image, read the conditions data
from /cvmfs/cms-opendata-conddb.cern.ch
. Access to the data is through XRootD.
Run a simple demo for testing and validating
The validation procedure tests that the CMS environment is installed and operational on your Docker container, and that you have access to the CMS open data files. It also accesses the conditions data from the shared CVMFS area and caches it. This last action will save us time during the workshop. These steps also give you a quick introduction to the CMS environment.
Run the following command to create the CMS runtime variables:
cmsenv
Create a working directory for the demo analyzer, change to that directory and create a skeleton for the analyzer:
mkdir Demo
cd Demo
mkedanlzr DemoAnalyzer
Come back to the main src
area:
cd ../
Compile the code:
scram b
You can safely ignore the warning.
IMPORTANT NOTE: Depending on your system, there could be some issues with the shared clipboard between the host machine and the Docker container.
This means that it is possible that you cannot copy the instructions in this episode directly into your Docker session. One thing you can try is
Shift+Ctrl+V
when pasting into your Docker terminal, rather than Ctrl+V. That sometimes will work.
The quickest workaround might be using the ssh and/or scp commands to copy the required files to some other machine that you have access to, from the Docker container as well as from the host machine. For instance, if you had access to an lxplus computer at CERN, you could copy a certain file from the Docker container to the lxplus computer. On the Docker container you could do
scp myfile.txt myusername@lxplus.cern.ch:.
to copy a hypothetical file myfile.txt to lxplus, and then on the host
scp myusername@lxplus.cern.ch:myfile.txt .
to copy the same file back to your host machine. Then you can edit the file locally and reverse the process to get it back to your Docker container.
It could also be possible to have direct access from the host to the Docker container. This YouTube tutorial might be of help for that option.
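Another option, if you set up a named container as in the earlier episodes, is docker cp from your host machine. A sketch, assuming the container is named myopendataproject and the hypothetical file myfile.txt sits in the CMSSW src area:
docker cp myopendataproject:/home/cmsusr/CMSSW_5_3_32/src/myfile.txt .    # container -> host
docker cp myfile.txt myopendataproject:/home/cmsusr/CMSSW_5_3_32/src/    # host -> container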
Before launching the job, let’s modify the configuration file (do not worry, you will learn about all this stuff in a different lesson) so it is able to access a CMS open data file and cache the conditions data. As mentioned, this will save us time later.
Open the demoanalyzer_cfg.py
file using the vi
editor (here you can find a good cheatsheet for that editor).
vi Demo/DemoAnalyzer/demoanalyzer_cfg.py
Replace file:myfile.root
with root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root
to point to an example file.
Also change the maximum number of events to 10, i.e., change -1
to 10
in process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1))
.
In addition, insert, below the PoolSource module, the following lines:
#needed to cache the conditions data
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db')
process.GlobalTag.globaltag = 'FT_53_LV5_AN1::All'
Take a look at the final validation config file
At the end, the config file should look like
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")

process.load("FWCore.MessageService.MessageLogger_cfi")

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )

process.source = cms.Source("PoolSource",
    # replace 'myfile.root' with the source file you want to use
    fileNames = cms.untracked.vstring(
        'root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root'
    )
)

#needed to cache the conditions data
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db')
process.GlobalTag.globaltag = 'FT_53_LV5_AN1::All'

process.demo = cms.EDAnalyzer('DemoAnalyzer')

process.p = cms.Path(process.demo)
Make symbolic links to the conditions database files from cvmfs:
ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA FT_53_LV5_AN1
ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db FT_53_LV5_AN1_RUNA.db
and make sure the cms-opendata-conddb.cern.ch directory has actually expanded in your Docker instance. One way of doing this is executing:
ls -l /cvmfs/
total 18
drwxr-xr-x 8 root root 4096 Jan 13 2014 cernvm-prod.cern.ch
drwxr-xr-x 69 989 984 4096 Aug 29 2014 cms.cern.ch
drwxr-xr-x 14 989 984 4096 Dec 16 2015 cms-opendata-conddb.cern.ch
drwxr-xr-x 4 989 984 4096 May 28 2014 cvmfs-config.cern.ch
Finally, run the cms executable with our configuration (it may really take a while, but the next time you want to run it will be faster):
cmsRun Demo/DemoAnalyzer/demoanalyzer_cfg.py
14-Sep-2020 02:28:06 GMT Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root
14-Sep-2020 02:28:13 GMT Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root
Begin processing the 1st record. Run 166782, Event 340184599, LumiSection 309 at 14-Sep-2020 02:28:26.283 GMT
Begin processing the 2nd record. Run 166782, Event 340185007, LumiSection 309 at 14-Sep-2020 02:28:26.284 GMT
Begin processing the 3rd record. Run 166782, Event 340187903, LumiSection 309 at 14-Sep-2020 02:28:26.285 GMT
Begin processing the 4th record. Run 166782, Event 340227487, LumiSection 309 at 14-Sep-2020 02:28:26.285 GMT
Begin processing the 5th record. Run 166782, Event 340210607, LumiSection 309 at 14-Sep-2020 02:28:26.285 GMT
Begin processing the 6th record. Run 166782, Event 340256207, LumiSection 309 at 14-Sep-2020 02:28:26.286 GMT
Begin processing the 7th record. Run 166782, Event 340165759, LumiSection 309 at 14-Sep-2020 02:28:26.286 GMT
Begin processing the 8th record. Run 166782, Event 340396487, LumiSection 309 at 14-Sep-2020 02:28:26.287 GMT
Begin processing the 9th record. Run 166782, Event 340390767, LumiSection 309 at 14-Sep-2020 02:28:26.287 GMT
Begin processing the 10th record. Run 166782, Event 340435263, LumiSection 309 at 14-Sep-2020 02:28:26.288 GMT
14-Sep-2020 02:28:26 GMT Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2011A/ElectronHad/AOD/12Oct2013-v1/20001/001F9231-F141-E311-8F76-003048F00942.root
=============================================
MessageLogger Summary
type category sev module subroutine count total
---- -------------------- -- ---------------- ---------------- ----- -----
1 fileAction -s file_close 1 1
2 fileAction -s file_open 2 2
type category Examples: run/evt run/evt run/evt
---- -------------------- ---------------- ---------------- ----------------
1 fileAction PostEndRun
2 fileAction pre-events pre-events
Severity # Occurrences Total Occurrences
-------- ------------- -----------------
System 3 3
Key Points
The CMS Docker image contains all the required ingredients to start analyzing CMS open data.
In order to test and validate the Docker container you can run a simple CMSSW job.