Content from Introduction
Last updated on 2024-07-08
Estimated time: 5 minutes
Overview
Questions
- What is Docker?
- What is the point of these exercises?
Objectives
- Learn about Docker and why we’re using it
What is Docker?
Let’s learn about Docker and why we’re using it!
Regardless of what you encounter in this lesson, the definitive guide is any official documentation provided by Docker.
From the Docker website
What is a container?
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
In short, Docker allows a user to work in a computing environment that is well defined and has been frozen with respect to interdependent libraries and code and related tools. The computing environment in the container is separate and independent from your own working area. This means that you do not have to worry about software packages that you might already have installed with a different version.
What can I learn here?
As much as we’d like, we can’t give you a complete overview of Docker. However, we do hope to explain why we run Docker in the way we do so that you gain some understanding. More specifically, we’ll be showing you how to set up Docker not just for this workshop, but for interfacing with the CMS open data in general.
Are there any alternatives to Docker?
If you’re working on a remote cluster rather than a local computer, one alternative you may want to try out is an Apptainer container. More info is in Episode 4: Apptainer for CMS open data.
Key Points
- Docker is a set of products to deliver and run software in packages called containers.
- Software containers are widely used these days in both industry and academic research.
- We use software containers during the hands-on sessions to provide a well-defined software environment for exercises.
Content from Installing docker
Last updated on 2024-06-27
Estimated time: 35 minutes
Overview
Questions
- How do you install Docker?
- What are the main Docker concepts and commands I need to know?
Objectives
- Install Docker
- Test the installation
- Learn and exercise the basic commands.
Installing docker
Go to the official Docker site and follow their installation instructions to install Docker for your operating system.
For our purposes, we need Docker Engine, which is free, open-source software. Note that you can choose to install Docker Desktop (free for single, non-commercial use), but our instructions do not rely on the graphical user interface that it offers.
Windows users:
In the episodes of this lesson that follow, we assume that Windows users have WSL2 activated with a Linux bash shell (e.g. Ubuntu). All commands indicated with “bash” are expected to be typed in this Linux shell (not in Git Bash or PowerShell).
Testing
As you walk through their documentation, you will eventually come to a point where you will run a very simple test, usually involving their hello-world container.
You can find their documentation for this step here.
Testing their code can be summed up by the ability to run (without generating any errors) the following commands.
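A minimal sketch of such a test, assuming the docker client is already on your PATH. The fallback messages and the docker_ok variable are illustrative additions, not part of Docker's own documentation.

```shell
# Show the installed client version; print a hint if the command is missing.
docker --version || echo "docker not found: complete the installation first"

# Download (first time only) and run Docker's small test image.
if docker run hello-world; then
    docker_ok=yes
else
    docker_ok=no
    echo "hello-world did not run: is Docker installed and the daemon running?"
fi
```

If the test image runs, Docker prints a short greeting that explains what it just did.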
Important: Docker postinstall to avoid sudo for Linux installation
If you need sudo to run the command above, make sure to complete these steps after installation, otherwise you will run into trouble with shared file access later on. Guaranteed!
In brief: create the docker group if it does not exist yet (sudo groupadd docker) and add your user to it (sudo usermod -aG docker $USER).
Then close the shell and open a new one. Verify that you can run docker run hello-world without sudo.
Images and Containers
As mentioned above, Docker's official sites provide ample documentation. However, a couple of concepts are crucial for using container technology with CMS open data: container images and containers.
One can think of the container image as the main ingredients for preparing a dish, and the final dish as the container itself. You can prepare many dishes (containers) based on the same ingredients (container image). Images can exist without containers, whereas a container needs to run an image to exist. Therefore, containers are dependent on images and use them to construct a run-time environment and run an application.
The final dish, for us, is a container that can be thought of as an isolated machine (running on the host machine) with mostly its own operating system and the adequate software and run-time environment to process CMS open data.
Docker provides the ability to create, build and/or modify images, which can then be used to create containers. For the MC generator, ML and CMS open data lessons, we will use already-built and ready-to-use images to create the containers we need, but we will practise building images with some additional code later on during the Midsummer QCD school hands-on sessions.
Commands Cheatsheet
There are many Docker commands that can be executed for different tasks. However, the most useful for our purposes are the following. We will show some usage examples for some of these commands later. Feel free to explore other commands.
Create and start a container based on a specific image
This command (docker run) will be used later to create our CMS open data container. The option -v, which mounts a directory from the local computer into the container, will also be used so that you can edit files in your normal editor and use them in the container.
Copy files in or out of a container
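As a hedged sketch, docker cp takes a source and a destination, with container paths written as container:path. The container name my_python and the directory /code match the docker run command used later in this lesson; the file names and the fallback messages are hypothetical.

```shell
container="my_python"   # container name used later in this lesson

# Container -> host: copy a (hypothetical) plot out of the container.
docker cp "${container}:/code/pt.png" ./pt.png \
    || echo "copy failed: check that the container and the file exist"

# Host -> container: copy a (hypothetical) local file into the container.
docker cp ./example.txt "${container}:/code/example.txt" \
    || echo "copy failed: check that the container and the file exist"
```

With a directory mounted through -v, copying is rarely needed, because files in the shared directory are visible on both sides.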
Key Points
- For up-to-date details for installing Docker, the official documentation is the best bet.
- Make sure you were able to download and run Docker’s hello-world example.
- The concepts of image and container, plus the knowledge of certain Docker commands, are all that is needed for the hands-on sessions.
Content from Docker containers for CMS open data
Last updated on 2024-07-09
Estimated time: 35 minutes
Overview
Questions
- What container images are available for my work with the CMS open data?
Objectives
- Download the ROOT and python images and build your own container
- Restart an existing container
- Learn how to share a working area between your laptop and the container.
Overview
This exercise will walk you through setting up and familiarizing yourself with Docker, so that you can effectively use it to interface with the CMS open data. It is not meant to completely cover containers and everything you can do with Docker.
For CMS open data work, three types of container images are provided: one with the CMS software (CMSSW) compatible with the released data, and two others with ROOT and python libraries needed in this workshop.
You will only need the CMSSW container if you want to access the CMS data in the MiniAOD format (you will learn about it later). Access to it requires CMSSW software that you will not be able to install on your own computer.
During the workshop hands-on lessons, we will be using the CMS data in the NanoAOD format, and access to it does not require any CMS-specific software. Two containers are provided to make setting up and using ROOT and/or python libraries easier for you for this tutorial, but if you wish, you can also install them on your computer.
All container images come with VNC for the graphical user interface. It opens directly in a browser window. Optionally, you can also connect to the VNC server of the container using a VNC viewer (TigerVNC, RealVNC, TightVNC, OSX built-in VNC viewer, etc.) installed on your local machine, but these instructions describe only the browser option, for which no additional tools are needed. On native Linux, you can also use X11-forwarding.
For different container images, some guidance can be found on the Open Data Portal introduction to Docker. The use of graphical interfaces, such as the graphics window from ROOT, depends on the operating system of your computer. Therefore, in the following, separate instructions are given for Windows WSL, Linux and MacOS.
Start the container
The first time you start a container, a docker image file gets downloaded from an image registry. The open data images are large (2-3 GB for the ROOT and python tools images and substantially larger for the CMSSW image), so the download may take a long time, depending on the speed of your internet connection. After the download, a container created from that image starts. The image download needs to be done only once. Afterwards, Docker finds the downloaded image on your computer when starting a container, and startup is much faster.
The containers do not have modern editors and it is expected that you mount your working directory from the local computer into the container, and use your normal editor for editing the files. Note that all your compiling and executing still has to be done in the Docker container!
First, before you start up your container, create a local directory where you will be doing your code development. In the examples below, it is called cms_open_data_python and cms_open_data_root, respectively, and the variable workpath will record your working directory path. You may choose a different location and a shorter directory name if you like.
Python tools container
Create the shared directory in your working area:
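A minimal sketch of this step, assuming you run it in the directory that should hold your work. The variable name workpath matches the docker run command in this section; the permissive chmod is an assumption that lets the container user write to the shared directory.

```shell
# Record the current directory so that later commands can refer to it.
export workpath="$PWD"

# Create the directory that will be shared with the container.
mkdir -p "${workpath}/cms_open_data_python"

# Assumption: open up the permissions so the container user can write here.
chmod -R 777 "${workpath}/cms_open_data_python"
```

The variable is only defined in the current shell, so set it again if you open a new terminal before starting the container.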
Start the container with
BASH
docker run -it --name my_python -P -p 8888:8888 -v ${workpath}/cms_open_data_python:/code gitlab-registry.cern.ch/cms-cloud/python-vnc:python3.10.5
You will get a container prompt similar to this:
OUTPUT
cmsusr@4fa5ac484d6f:/code$
This is a bash shell in the container. If you had some files in the shared area, they would be available here.
You can now start a jupyter lab from the container prompt and open the link that is printed out in the startup message.
Link does not work?
Try replacing 127.0.0.1 in the link with localhost.
Click on the Jupyter notebook icon to open a new notebook.
Permission denied?
If a window with “Error Permission denied: Untitled.ipynb” pops up, you most likely forgot to define the path variable, to create the shared directory or to change its permissions. Exit the jupyter lab with Control-C and confirm with y, then type exit to stop the container. Then remove the container and start again from the beginning.
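Removing the container could look like the sketch below, run on the host after leaving the container. It assumes the container was created with --name my_python as in the docker run command above; the fallback messages are illustrative.

```shell
container="my_python"   # name given with --name when the container was created

# List all containers, including stopped ones, to confirm the name.
docker ps -a || echo "docker ps failed: is Docker running?"

# Remove the stopped container so that the name can be reused.
docker rm "$container" \
    || echo "could not remove ${container}: does it exist and is it stopped?"
```

Removing a container does not delete the downloaded image, so creating the container again does not require a new download.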
Close the jupyter lab with Control-C and confirm with y. Type exit to leave the container.
ROOT tools container
Create the shared directory in your working area:
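A minimal sketch for the ROOT working area, under the same assumptions as for the python container: workpath is the variable used in the docker run commands, and the permissive chmod is an assumption that lets the container user write to the directory.

```shell
# Reuse workpath if it is already set; otherwise record the current directory.
export workpath="${workpath:-$PWD}"

# Create the directory that will be shared with the ROOT container.
mkdir -p "${workpath}/cms_open_data_root"

# Assumption: open up the permissions so the container user can write here.
chmod -R 777 "${workpath}/cms_open_data_root"
```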
Then start the container, depending on your host system:
BASH
docker run -it --name my_root -P -p 5901:5901 -p 6080:6080 -v ${workpath}/cms_open_data_root:/code gitlab-registry.cern.ch/cms-cloud/root-vnc:latest
You will get a container prompt similar to this:
OUTPUT
cmsusr@9b182de87ffc:/code$
For graphics, use VNC, which is installed in the container, and start the graphics windows with start_vnc. Open a browser window at the address given in the start message (http://127.0.0.1:6080/vnc.html); the default VNC password is cms.cern. It shows an empty screen to start with, and all graphics will pop up there.
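You can test it with ROOT: start it by typing root at the container prompt.
Open a ROOT Object Browser by typing TBrowser t in the ROOT prompt. You should see it opening in the VNC tab of your browser.
Exit ROOT with .q in the ROOT prompt.
If you have started VNC, stop it first with stop_vnc, then type exit to leave the container.
On native Linux, you can use X11-forwarding instead of VNC and start the container with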
BASH
docker run -it --name my_root --net=host --env="DISPLAY" -v $HOME/.Xauthority:/home/cmsusr/.Xauthority:rw -v ${workpath}/cms_open_data_root:/code gitlab-registry.cern.ch/cms-cloud/root-vnc:latest
You will get a container prompt similar to this:
OUTPUT
cmsusr@9b182de87ffc:/code$
For graphics, X11-forwarding to your host is used.
You can test it with ROOT: start it by typing root at the container prompt.
Open a ROOT Object Browser by typing TBrowser t in the ROOT prompt. You should see the ROOT Object Browser opening.
Exit ROOT with .q in the ROOT prompt.
Type exit to leave the container.
If the X11 forwarding does not work
Exercises
Homework: confirm your containers work
Please visit the assignment form and answer a few questions about your container. You need to sign in and click on the submit button in order to save your work. You can go back to edit the form at any time.
Challenge 1
Create a file on your local host and make sure that you can see it in the container.
Open the editor in your host system, create a file example.txt and save it to the shared working area, either in cms_open_data_python or in cms_open_data_root.
Restart the container, making sure to choose the one (my_python or my_root) that is connected to the shared working area you have chosen.
In the container prompt, list the files and show the contents of the newly created file:
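The challenge can be sketched from the host side as follows, assuming the python working area and a workpath variable set as in the earlier steps. The file content is arbitrary, and the commands to type at the container prompt are shown as comments.

```shell
# Assumption: workpath was set when the shared directory was created;
# fall back to the current directory otherwise.
workpath="${workpath:-$PWD}"
shared="${workpath}/cms_open_data_python"
mkdir -p "$shared"

# Create the file on the host without opening an editor.
echo "hello from the host" > "${shared}/example.txt"

# At the container prompt, after restarting the container, you would type:
#   ls
#   cat example.txt
# The equivalent check from the host side:
ls "$shared"
cat "${shared}/example.txt"
```

Because the host directory is mounted at /code in the container, both sides see the same file.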
Challenge 2
Make a plot with the jupyter notebook, save it to a file and make sure that you get it to your host.
Restart the python container, then start the jupyter lab from the container prompt.
Open a new jupyter notebook.
Make a plot of your choice. For a quick CMS open data plot, you can use the following:
PYTHON
import uproot
import matplotlib.pyplot as plt
import awkward as ak
import numpy as np
# open a CMS open data file, we'll see later how to find them
file = uproot.open("root://eospublic.cern.ch//eos/opendata/cms/Run2016H/SingleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/120000/61FC1E38-F75C-6B44-AD19-A9894155874E.root")
file.classnames() # this is just to see the top-level content
events = file['Events']
# take all muon pt values in an array, we'll see later how to find the variable names
pt = events['Muon_pt'].array()
plt.figure()
plt.hist(ak.flatten(pt),bins=200,range=(0,200))
# save before showing: after plt.show() the figure may be cleared, leaving an empty file
plt.savefig("pt.png")
plt.show()
Run the code cell, and you should see the muon pt plot.
You can rename your notebook by right-clicking on the name in the left bar and choosing “Rename”.
In a shell on your host system, move to the working directory, list the files and you should see the notebook and the plot file.
OUTPUT
myplot.ipynb pt.png
Challenge 3
Make a plot with ROOT, save it to a file and make sure that you get it to your host.
Restart the ROOT container.
If you are using VNC, start it in the container prompt with start_vnc. Open the URL in your browser and connect with the password cms.cern.
Start ROOT and make a plot of your choice.
For a quick CMS open data plot, you can open a CMS open data file directly with ROOT:
BASH
root root://eospublic.cern.ch//eos/opendata/cms/Run2016H/SingleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/120000/61FC1E38-F75C-6B44-AD19-A9894155874E.root
In the ROOT prompt, type TBrowser t to open the ROOT Object Browser, which opens in the VNC tab of your browser.
Double click on the file name, then on Events and then on a variable of your choice, e.g. nMuon.
You should see the plot. To save it, right click in the plot margins and you will see a menu named TCanvas::Canvas_1. Choose “Save as” and give the name, e.g. nmuon.png.
Quit ROOT by typing .q on the ROOT prompt or choosing “Quit Root” from the ROOT Object Browser menu options.
In a terminal on your host system, move to the working directory, list the files and you should see the plot file.
OUTPUT
nmuon.png
Key Points
- You have now set up a docker container as a working environment for CMS open data.
- You know how to pass files between your own computer and the container.
- You know how to open a graphical window of ROOT or a jupyterlab in your browser using software installed in the container.
Content from Apptainer for CMS open data
Last updated on 2024-07-09
Estimated time: 35 minutes
Overview
Questions
- What apptainer images are available for my work with the CMS open data?
Objectives
- Download the ROOT and python images and launch the containers on a remote cluster
Prerequisites
- Apptainer installed on your remote cluster
- A copy of the ROOT and Python container images: download the .sif files from here
Introduction
This is an optional section for you to try out as an alternative to docker. This section will provide an overview of how to use apptainer images on a remote cluster. Apptainer is a package that many clusters install to manage containerized software, particularly if a batch job system is connected to the cluster. If your cluster has Apptainer available, you can use our Open Data toolkits without requesting additional software installation on the cluster itself.
Although the images have been verified to work, you may need to tune your specific configurations/permissions to allow remote windows (e.g., a jupyter-lab browser or a ROOT TBrowser) to open. If you intend to use these images during the workshop, please make sure you download the images before the workshop. The images are between about 0.5 and 1 GB, and may take 30 minutes or more to download.
Python tools container
After you have the prerequisite .sif images downloaded to your system, copy them onto the remote cluster that you will use to analyze Open Data, perhaps using scp.
Start the Python container with:
You will get a container prompt similar to this:
OUTPUT
Singularity>
This is a bash shell in the container. You can now open jupyter lab from the container prompt.
The result should be a web link that you can enter into your browser to see your jupyter notebook. Click on the Jupyter notebook icon to open a new notebook.
Link does not work?
Try replacing 127.0.0.1 in the link with localhost.
If that change doesn’t work, you may need to modify your ssh file and remote cluster login command. Check the ssh config file on your local computer (not the remote cluster):
Add the following lines to your config file (you can create the file if it does not exist):
Host *
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
You can replace * with the name of your cluster. For example, your cluster address might be something like @Computing.Univ.Edu. If you do not know what to do, you can put * and it will apply to any remote connection. When you log into your cluster, prepend the option -L and provide ports for displays:
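As a hedged illustration, the -L option of ssh takes the form local_port:host:remote_port. The sketch below only assembles and prints the login command; the account name, the cluster address and port 8888 are hypothetical placeholders to replace with your own values.

```shell
user="yourname"                 # hypothetical account name on the cluster
cluster="cluster.example.edu"   # hypothetical cluster address
port=8888                       # assumed port that jupyter lab listens on

# Forward the local port to the same port on the cluster's localhost.
ssh_cmd="ssh -L ${port}:localhost:${port} ${user}@${cluster}"
echo "$ssh_cmd"
```

With such a tunnel open, a jupyter link that points to localhost on that port is served from the cluster through your ssh session.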
Next go back to the directory where you downloaded the python image and have opened the container. Type the following in the container:
The result should be a web link that you can enter into your browser to access your jupyter notebook. Click on the Jupyter notebook icon to open a new notebook.
Still no jupyter notebook?
Check to see if you have a jupyter config file. To create one, log in to the remote cluster and enter the apptainer shell. Then execute:
It will print out the path to a jupyter_lab_config.py file, which may be in your home directory outside the apptainer shell in .jupyter/jupyter_lab_config.py. Change the line:
c.ServerApp.open_browser = False
to
c.ServerApp.open_browser = True
Then try again to open the jupyter notebook from the jupyter lab webpage.
ROOT tools container
Download the ROOT image from the folder listed here and copy it onto your remote cluster.
Start the container with:
You will get a container prompt similar to this:
OUTPUT
Singularity>
If you type root, you will get a welcome message and a ROOT prompt.
Unlike the docker container, start_vnc should not be necessary to view plots from ROOT. Test this by opening a ROOT browser with TBrowser t in the ROOT prompt.
If you do not see the browser GUI appear, reach out on the apptainer help mattermost channel. You can exit ROOT with .q and the container with exit.
Exercises
Homework: confirm your containers work
Please visit the assignment form and answer a few questions about your container. You need to sign in and click on the submit button in order to save your work. You can go back to edit the form at any time.
Challenges from the previous episode
If you skipped the previous episode because you are working on a remote cluster, go back one page and try the exercises using apptainer. The main goal of these exercises is to make sure you are able to execute important commands inside the containers and access stored files. Recall that for the ROOT exercise, start_vnc and stop_vnc should not be necessary. If you need help, contact us in Mattermost.
Key Points
- You have now set up apptainer containers to work with CMS Open Data.
- You know how to open a graphical window of ROOT or a jupyterlab in your browser using software installed in the container.