Virtual Machine

Introduction

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What are virtual machines?

  • What are virtual machine images?

  • Why are they needed for studying CMS open data?

Objectives
  • Acquire a working knowledge of what a virtual machine is.

  • Differentiate between a hypervisor and virtual machine images.

  • Understand the advantages and disadvantages of using virtual machines.

Helpline

Remember that we are always available to help. Our Mattermost channel is open.

The need for VMs

CMS open data and legacy data, although still exciting and full of research potential, are already several years old. Because computing technology evolves rapidly, the environments originally used to analyze these data are outdated compared with current ones.

Therefore, in order to maintain our ability to study these data, we have to rely on technologies that help us preserve adequate computer environments. One way of doing this is by using virtual machines.

In simple words, a virtual machine is an emulation of a computer system that can run within another system. The latter is usually known as the host.

Open data releases, CMSSW versions and operating systems

CMS open data from our 2010 release can be studied using CMSSW_4_2_8, a version of the CMSSW software that ran under the Scientific Linux CERN 5 (slc5) operating system. Likewise, open data from our 2011/2012 release use CMSSW_5_3_32 under Scientific Linux CERN 6 (slc6).

The virtual machines that are used to analyze these data, therefore, need to consider all these compatibility subtleties. They have to offer the correct operating system so the right version of CMSSW can be used to manipulate the dataset of interest.
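The pairing described above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary layout and the function name `environment_for` are assumptions made for this example, while the release, CMSSW and operating system names come from the text.

```python
# Release-to-environment table; the values are taken from the text above.
RELEASES = {
    "2010": {"cmssw": "CMSSW_4_2_8", "os": "slc5"},
    "2011/2012": {"cmssw": "CMSSW_5_3_32", "os": "slc6"},
}


def environment_for(release):
    """Return (CMSSW version, operating system) for an open data release."""
    entry = RELEASES[release]
    return entry["cmssw"], entry["os"]


print(environment_for("2011/2012"))
```

Asking for a release that is not in the table raises a KeyError, which is a reasonable signal that no matching environment is known.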

Virtual machine images

In practical terms, a virtual machine image is a computer file that has all the right ingredients to create a virtual computer inside a given host through the process of virtualization. This file, however, needs to be decoded and run by a virtual machine interpreter, usually known as a hypervisor, which runs on the host machine. The hypervisor we are going to use in this lesson is Oracle’s VirtualBox.

CMS VM images

The most current images for CMS open data usage are described separately on the CERN Open Data Portal site for 2010 and 2011/2012. In later episodes we will be using the latter image file.

Advantages and disadvantages of using VMs

For the end user, VMs really look and feel like a physical machine, so working with them is easy and very convenient. The CMS VMs, in particular, come equipped with the ROOT framework (although with an older version), CMSSW and CVMFS access, all the goodies you need to start analyzing data right away.

However, VMs are usually heavy and take up a lot of the host resources, which can affect performance. Also, it is more difficult to run batch jobs (large scale processing) with them.

Key Points

  • A virtual machine is an emulation of a computer system that can run within another host system.

  • A hypervisor is a program that creates virtual machines from virtual machine image files and runs them virtually.

  • Virtual machines can be used to run over CMS open data in a very intuitive way. However, they take up a lot of resources, are less efficient than real machines, and are harder to use for batch processing.


Hiring a hypervisor

Overview

Teaching: 5 min
Exercises: 10 min
Questions
  • How do I install VirtualBox?

Objectives
  • Download and install VirtualBox, the hypervisor for CMS VMs.

Downloading VirtualBox

VirtualBox is a free, open-source, multiplatform hypervisor for running virtual machines. At the time of writing, the latest version is 6.1.22, which is available for most platforms at the VirtualBox download site. Note that you will need administrative permissions on your machine in order to install it.

As described here, the latest version of VirtualBox tested with this CMS-specific CernVM image is 6.1.10. Pick that one if you have trouble with the latest version. The full history of VirtualBox versions is available on a separate page.
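Once VirtualBox is installed, the version check described above can be sketched in a few lines of Python run on the host. The sketch assumes that the `VBoxManage` command-line tool shipped with VirtualBox is on your PATH and that `VBoxManage --version` prints something like `6.1.22r144080`; the function names are made up for this example.

```python
import re
import subprocess

# Latest VirtualBox version tested with the CMS image, according to the text.
TESTED_VERSION = (6, 1, 10)


def parse_version(text):
    """Extract (major, minor, patch) from a string such as '6.1.22r144080'."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def installed_version():
    """Ask VBoxManage for its version; return None if that is not possible."""
    try:
        output = subprocess.check_output(["VBoxManage", "--version"])
        return parse_version(output.decode())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


version = installed_version()
if version is None:
    print("Could not query VBoxManage; is VirtualBox installed and on PATH?")
elif version > TESTED_VERSION:
    print("Installed version is newer than the tested 6.1.10.")
else:
    print("Installed version is not newer than the tested 6.1.10.")
```

A newer version is not necessarily a problem; the message only indicates which version to fall back to if the VM misbehaves.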

Installing VirtualBox

Find your way to install VirtualBox!

Now go ahead with the installation! After you download the appropriate installer file, proceed as you would with any other program on your system. It shouldn’t take long. Afterwards, open the program to check that it has been installed correctly.

Meet your hypervisor

If installed correctly, after opening the program a window similar to this one should appear:

Key Points

  • To install VirtualBox on your machine (the host), download the installer for your platform and run it as you would any other program.


Recruiting the labor force

Overview

Teaching: 0 min
Exercises: 35 min
Questions
  • How do I install the CMS-specific CernVM image?

Objectives
  • Download and import the CMS CernVM image into VirtualBox.

Downloading the CMS-specific CernVM image

Download the 2011 CMS-specific CernVM image as an OVA file from the official record on the CERN Open Data Portal site. We recommend using version 1.5.3, i.e., CMS-OpenData-1.5.3. This VM image can be used for data from 2011 and 2012, which are the data we will use in this workshop.

Make sure to download version 1.5.3 of the OVA file. Other versions are deprecated.

Importing the image into VirtualBox

When you double-click the downloaded file, VirtualBox imports the image with ready-to-run settings. The import can also be triggered directly from the VirtualBox interface through the File > Import Appliance menu option.

In either case, make sure you unselect “Import hard drives as VDI” on the initial import screen. Otherwise, it is very likely that the VM won’t boot.

Launching the CMS VM

After importing, it is likely that you will have to make at least one last change to the default settings before launching:

Change the Network settings for adapter 2 from Host-only Adapter to NAT.
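The same network change can, in principle, be made from the host's command line with the `VBoxManage modifyvm` command. The sketch below only builds and prints the command. The VM name `CMS-OpenData-1.5.3` is an assumption (use the name shown in your VirtualBox list), and the helper name `nat_command` is made up for this example.

```python
def nat_command(vm_name, adapter=2):
    """Build the VBoxManage call that switches one network adapter to NAT."""
    return ["VBoxManage", "modifyvm", vm_name, "--nic%d" % adapter, "nat"]


# Assumed VM name; replace it with the name listed in your VirtualBox window.
command = nat_command("CMS-OpenData-1.5.3")
print(" ".join(command))
```

To apply the change, the list can be passed to `subprocess.check_call(command)` while the VM is powered off. The graphical Settings dialog remains the route described in this lesson.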

Also, check for any Invalid settings detected at the bottom of the window after you click on the Settings button, and make the appropriate changes following those recommendations.

Finally, launch the CMS-specific CernVM by selecting it in the list of VMs and clicking the Start button. It will boot into the graphical user interface and set up the CMS environment. Be patient: the first boot takes a while.

Note: If you encounter any problems with booting the VM, check the issues and limitations section and/or the troubleshooting guide.

If installed correctly, after the import procedure is complete, you will see something like these screenshots.

Key Points

  • To install the 2011 CMS VM, download the appropriate VM image (version 1.5.3), import it into VirtualBox and launch it.


Test and validate

Overview

Teaching: 10 min
Exercises: 30 min
Questions
  • What is in the CMS VM?

  • How do I test and validate my virtual machine?

Objectives
  • Learn about the details of the CMS virtual machine.

  • Test and validate the CMS VM by running a CMSSW job.

Know your VM

The virtual machine we just installed provides the CMS computing environment to be used with the 2011 and 2012 CMS open data. The virtual machine is based on CernVM and uses Scientific Linux CERN. As mentioned before, it comes equipped with the ROOT framework and CMSSW.

An important feature of the VM is the availability of the CernVM File System (CVMFS). Thanks to the installed cvmfs client, the VM gets the CMS software (CMSSW) from the shared /cvmfs/cms.cern.ch area (physically at CERN but mounted locally), and jobs running on the CMS open data VM read the conditions data from /cvmfs/cms-opendata-conddb.cern.ch. Certain kinds of information, like the trigger prescales, require access to these conditions data. Access to the data itself uses the XRootD protocol.

The VM has a 40G virtual hard disk and a 20G cvmfs cache, which is large enough to hold the conditions data for the full event range of the 2012 data (see the CMS guide to the condition database for further details). It has an embedded Scientific Linux CERN 6 (slc6) shell, where all CMS software specific commands should be executed. Additionally, it has a Scientific Linux CERN 7 (slc7) shell, which can be used in the same session.

Run a simple demo for testing and validating

The validation procedure tests that the CMS environment is installed and operational on your virtual machine, and that you have access to the CMS Open Data files. It also accesses the conditions data from the shared cvmfs area and caches them. This last action will save us time during the workshop. These steps also give you a quick introduction to the CMS environment.

In the VM, open a terminal by clicking the CMS Shell icon on the desktop (note that the X terminal emulator, launched from an icon at the bottom-left of the VM screen, opens a shell with an operating system incompatible with the CMS software release to be used).

IMPORTANT NOTE: Depending on your system, there could be some issues with the shared clipboard between the host machine and the virtual machine. This means that it is possible that you cannot copy the instructions in this episode directly into your VM session. The quickest workaround is to use the text dump file of the lesson. You can download this file directly in your VM using, e.g., the wget command, and follow along to copy the necessary commands directly from the text file.

Alternative solution to the clipboard problem

Another possibility is to use the ssh and/or scp commands to copy the required files through some other machine that you can reach from the VM as well as from the host machine. For instance, if you had access to an lxplus computer at CERN (or any other machine at your institute), you could copy a file from the VM to the lxplus computer. On the VM you could do:

scp myfile.txt myusername@lxplus.cern.ch:.

to copy a hypothetical file myfile.txt to lxplus, and then on the host

scp myusername@lxplus.cern.ch:myfile.txt .

to copy the same file back to your host machine. Then you can edit the file locally and reverse the process to get it back to your VM.

It may also be possible to have direct access from the host to the VM. This YouTube tutorial might be of help for that option.

Execute the following command; this command builds the local release area (the directory structure) for CMSSW, and only needs to be run once (note that it may take a while):

cmsrel CMSSW_5_3_32

Note that if you get a warning message about the current OS being SLC7, you are using the wrong terminal. Open a “CMS Shell” terminal as explained above and execute the cmsrel command there.

Change to the CMSSW_5_3_32/src/ directory:

cd CMSSW_5_3_32/src/

Then, run the following command to create the CMS runtime variables:

cmsenv
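As a quick sanity check after `cmsenv`, you can look at the environment variables it exports. The sketch below assumes that `CMSSW_BASE` and `CMSSW_VERSION` are among them; the function name is made up for this example, and the code avoids newer Python syntax on the assumption that the Python available in the VM is an older one.

```python
import os

EXPECTED_RELEASE = "CMSSW_5_3_32"


def check_cms_environment(environ):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    if "CMSSW_BASE" not in environ:
        problems.append("CMSSW_BASE is not set; run cmsenv in CMSSW_5_3_32/src/")
    version = environ.get("CMSSW_VERSION")
    if version != EXPECTED_RELEASE:
        problems.append("CMSSW_VERSION is %r, expected %r"
                        % (version, EXPECTED_RELEASE))
    return problems


problems = check_cms_environment(os.environ)
for problem in problems:
    print("WARNING: " + problem)
if not problems:
    print("CMS runtime environment looks fine.")
```

Run outside the CMS Shell, or before `cmsenv`, the script is expected to print warnings. That is the point of the check.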

Work assignment

This is a good moment to go to our assignment form and answer some simple questions for this pre-exercise; you must sign in and click on the submit button in order to save your work. You can come back to edit the form at any time.

Create a working directory for the demo analyzer, change to that directory and create a skeleton for the analyzer:

mkdir Demo
cd Demo
mkedanlzr DemoAnalyzer

Go back to the main src area and compile the code:

cd ..
scram b

You can safely ignore the warning.

Before launching the job, let’s modify the configuration file (do not worry, you will learn about all this stuff in a different lesson) so it is able to access a CMS open data file and cache the conditions data. As it was mentioned, this will save us time later.

Open the demoanalyzer_cfg.py file using the nano editor. Note that other editors like emacs or vi are also available in the VM.

nano Demo/DemoAnalyzer/demoanalyzer_cfg.py

Replace 'file:myfile.root' with 'root://eospublic.cern.ch//eos/opendata/cms/Run2012B/DoubleMuParked/AOD/22Jan2013-v1/10000/1EC938EF-ABEC-E211-94E0-90E6BA442F24.root' to point to an example file. Note that the access to this file will be done using the XRootD protocol mentioned above.

Also change the maximum number of events to 10, i.e., change -1 to 10 in process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1)).

In addition, insert, below the PoolSource module, the following lines:

#needed to cache the conditions data
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db')
process.GlobalTag.globaltag = 'FT53_V21A_AN6::All'

The lines above give access to the conditions data, such as the jet-energy corrections, trigger information, etc. You will learn about them in a later lesson. For now it is sufficient to mention that caching these data ahead of time will speed up the processing later in the workshop.
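If you prefer not to edit by hand, the first two changes (input file and maximum number of events) are simple text substitutions. The sketch below is an illustration only: the function name `patch_config` is made up, the sample text is a stand-in for the generated file, and it assumes the skeleton contains exactly the strings quoted in this lesson. The GlobalTag lines still have to be inserted separately.

```python
OLD_SOURCE = "'file:myfile.root'"
NEW_SOURCE = ("'root://eospublic.cern.ch//eos/opendata/cms/Run2012B/"
              "DoubleMuParked/AOD/22Jan2013-v1/10000/"
              "1EC938EF-ABEC-E211-94E0-90E6BA442F24.root'")


def patch_config(text, max_events=10):
    """Point the config at the example file and cap the number of events."""
    if OLD_SOURCE not in text:
        raise ValueError("placeholder input file not found in config text")
    text = text.replace(OLD_SOURCE, NEW_SOURCE)
    return text.replace("cms.untracked.int32(-1)",
                        "cms.untracked.int32(%d)" % max_events)


# Stand-in for the relevant lines of demoanalyzer_cfg.py.
sample = (
    "process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )\n"
    "fileNames = cms.untracked.vstring('file:myfile.root')\n"
)
print(patch_config(sample))
```

To use it on the real file, read Demo/DemoAnalyzer/demoanalyzer_cfg.py into a string, pass it through `patch_config`, and write the result back.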

Take a look at the final validation config file

At the end, the config file should look like

import FWCore.ParameterSet.Config as cms
process = cms.Process("Demo")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.source = cms.Source("PoolSource",
# replace 'myfile.root' with the source file you want to use
   fileNames = cms.untracked.vstring(
       'root://eospublic.cern.ch//eos/opendata/cms/Run2012B/DoubleMuParked/AOD/22Jan2013-v1/10000/1EC938EF-ABEC-E211-94E0-90E6BA442F24.root'
   )
)
#needed to cache the conditions data
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db')
process.GlobalTag.globaltag = 'FT53_V21A_AN6::All'

process.demo = cms.EDAnalyzer('DemoAnalyzer'
)

process.p = cms.Path(process.demo)

Make symbolic links to the conditions database files from cvmfs:

ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL FT53_V21A_AN6
ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db FT53_V21A_AN6_FULL.db
ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL FT53_V21A_AN6_FULL

and make sure the cms-opendata-conddb.cern.ch directory has actually expanded in your VM. One way of doing this is executing:

ls -l /cvmfs/
total 18
drwxr-xr-x  8 root root 4096 Jan 13  2014 cernvm-prod.cern.ch
drwxr-xr-x 69  989  984 4096 Aug 29  2014 cms.cern.ch
drwxr-xr-x 14  989  984 4096 Dec 16  2015 cms-opendata-conddb.cern.ch
drwxr-xr-x  4  989  984 4096 May 28  2014 cvmfs-config.cern.ch
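The same check can be scripted. The following sketch tests whether the two repositories used in this lesson are visible as directories under /cvmfs; the function name and its defaults are assumptions made for this example, and the code avoids newer Python syntax so that it can also run under an older interpreter.

```python
import os

# Repositories mentioned in this lesson.
REPOSITORIES = ["cms.cern.ch", "cms-opendata-conddb.cern.ch"]


def missing_repositories(root="/cvmfs", names=REPOSITORIES):
    """Return the repositories that are not visible as directories under root."""
    return [name for name in names
            if not os.path.isdir(os.path.join(root, name))]


missing = missing_repositories()
if missing:
    print("Not available: " + ", ".join(missing))
else:
    print("All expected cvmfs repositories are visible.")
```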

Finally, run the CMS executable (cmsRun) with our configuration (it may really take a while the first time, but subsequent runs will be faster):

cmsRun Demo/DemoAnalyzer/demoanalyzer_cfg.py
%MSG-w LocalFileSystem::initFSList():  (NoModuleName) 01-Jul-2021 03:49:49 GMT  pre-events
Cannot read '/etc/mtab': Invalid argument (error 22)
%MSG
01-Jul-2021 07:49:37 GMT  Initiating request to open file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/DoubleMuParked/AOD/22Jan2013-v1/10000/1EC938EF-ABEC-E211-94E0-90E6BA442F24.root
01-Jul-2021 07:49:45 GMT  Successfully opened file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/DoubleMuParked/AOD/22Jan2013-v1/10000/1EC938EF-ABEC-E211-94E0-90E6BA442F24.root
Begin processing the 1st record. Run 195013, Event 24425389, LumiSection 66 at 01-Jul-2021 07:50:01.476 GMT
Begin processing the 2nd record. Run 195013, Event 24546773, LumiSection 66 at 01-Jul-2021 07:50:01.477 GMT
Begin processing the 3rd record. Run 195013, Event 24679037, LumiSection 66 at 01-Jul-2021 07:50:01.478 GMT
Begin processing the 4th record. Run 195013, Event 24839453, LumiSection 66 at 01-Jul-2021 07:50:01.478 GMT
Begin processing the 5th record. Run 195013, Event 24894477, LumiSection 66 at 01-Jul-2021 07:50:01.478 GMT
Begin processing the 6th record. Run 195013, Event 24980717, LumiSection 66 at 01-Jul-2021 07:50:01.479 GMT
Begin processing the 7th record. Run 195013, Event 25112869, LumiSection 66 at 01-Jul-2021 07:50:01.479 GMT
Begin processing the 8th record. Run 195013, Event 25484261, LumiSection 66 at 01-Jul-2021 07:50:01.480 GMT
Begin processing the 9th record. Run 195013, Event 25702821, LumiSection 66 at 01-Jul-2021 07:50:01.480 GMT
Begin processing the 10th record. Run 195013, Event 25961949, LumiSection 66 at 01-Jul-2021 07:50:01.481 GMT
01-Jul-2021 07:50:01 GMT  Closed file root://eospublic.cern.ch//eos/opendata/cms/Run2012B/DoubleMuParked/AOD/22Jan2013-v1/10000/1EC938EF-ABEC-E211-94E0-90E6BA442F24.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostEndRun                        
    2 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3
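To confirm that the job processed the expected 10 events, you can count the "Begin processing" lines in the output. The sketch below assumes the log format shown above and uses two lines copied from it as sample input; the function name is made up for this example.

```python
import re

RECORD = re.compile(
    r"^Begin processing the (\d+)(?:st|nd|rd|th) record\. "
    r"Run (\d+), Event (\d+)"
)


def processed_events(log_text):
    """Return (run, event) pairs for every record cmsRun reports processing."""
    events = []
    for line in log_text.splitlines():
        match = RECORD.match(line)
        if match:
            events.append((int(match.group(2)), int(match.group(3))))
    return events


sample_log = (
    "Begin processing the 1st record. Run 195013, Event 24425389, "
    "LumiSection 66 at 01-Jul-2021 07:50:01.476 GMT\n"
    "Begin processing the 2nd record. Run 195013, Event 24546773, "
    "LumiSection 66 at 01-Jul-2021 07:50:01.477 GMT\n"
)
events = processed_events(sample_log)
print("%d records found" % len(events))
```

For a real run, save the cmsRun output to a file (redirecting both standard output and standard error), read it into a string and check that `len(processed_events(text))` equals 10.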

Key Points

  • The CMS VM contains all the required ingredients to start analyzing CMS open data.

  • In order to test and validate the virtual machine you can run a simple CMSSW job.