Analyzing Run 1 data
Overview
Teaching: 10 min
Exercises: 50 min
Questions
How can I analyze larger ROOT files?
Objectives
Demonstrate examples of accessing ROOT files remotely
Compare and contrast standard ROOT approaches with newer ROOT objects such as RDataFrame
Analyzing the dimuon samples
This lesson primarily follows the material on the CMS public TWiki page NanoAODRun1Examples (the same location the scripts below are downloaded from), which uses the NanoAOD format for Run 1 data in an analysis of the dimuon samples.
Potential pitfalls!
We’ll be running over some larger ROOT files in this lesson, and for some of you, memory limitations may cause errors or crashes. If that happens, the problem is specific to this exercise, and you should feel free to simply follow along with the instructor.
Download the code and scripts
Launch your Docker container, as per the previous episode.
From inside your Docker container, we’re going to execute a series of curl commands. curl is a widely used utility to download files from remote locations.
Simply highlight the commands below and copy-and-paste them into your Docker terminal, or into your local terminal if you are working without Docker.
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/Dimuon2011_eospublic.C --output Dimuon2011_eospublic.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/Dimuon2011_eospublic_RDF.C --output Dimuon2011_eospublic_RDF.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/Dimuon2011_local.C --output Dimuon2011_local.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/Dimuon2011_local_RDF.C --output Dimuon2011_local_RDF.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/MuHistos_eospublic.cxx --output MuHistos_eospublic.cxx
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/MuHistos_local.cxx --output MuHistos_local.cxx
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_eospublic.C --output dimuonSpectrum2012_eospublic.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_eospublic_test.C --output dimuonSpectrum2012_eospublic_test.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_eospublic.py.txt --output dimuonSpectrum2012_eospublic.py
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_local.C --output dimuonSpectrum2012_local.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_local.py.txt --output dimuonSpectrum2012_local.py
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_outreach.C --output dimuonSpectrum2012_outreach.C
curl https://twiki.cern.ch/twiki/pub/CMSPublic/NanoAODRun1Examples/dimuonSpectrum2012_outreach.py.txt --output dimuonSpectrum2012_outreach.py
You can check that the files were downloaded properly by typing
ls -ltr
Run some of the commands
Your instructor will be executing most of these commands, some of which might take a few minutes to run.
In most cases, these scripts produce an output .pdf file with a name similar to that of the script itself.
If you have any issues with ROOT displaying the plots as they are made, you can view the .pdf files on your local system by looking in the cms_open_data_run1 directory that you created.
Zeroth example
The first thing we will do is run a very simple ROOT script that loads a single, relatively small file from your laptop and produces a plot of the dimuon spectrum. Because it is a single file, this plot will not contain as much data as the later examples, but you can use it to check your connection and X11 forwarding (e.g. does the plot pop up?).
First, let’s download the file using xrdcp, one of the XRootD utilities, which allows us to download files.
xrdcp root://eospublic.cern.ch//eos/opendata/cms/upload/NanoAODRun1/01-Jul-22/Run2012B_DoubleMuParked/01-Jul-22Run2012B_DoubleMuParked/03C5684F-8BAF-4312-8235-2B0039F2FB93.root .
The file is about 1.4 GB, but it should take less than a minute to download.
xrdcp alternative
If xrdcp is not running or working, you can also download this file with this command:
curl https://opendata.cern.ch/record/6004/files/assets/cms/upload/NanoAODRun1/01-Jul-22/Run2012B_DoubleMuParked/01-Jul-22Run2012B_DoubleMuParked/03C5684F-8BAF-4312-8235-2B0039F2FB93.root --output 03C5684F-8BAF-4312-8235-2B0039F2FB93.root
Once it is downloaded, you can process this one file by running dimuonSpectrum2012_eospublic_test.C in ROOT.
To do so, you will launch ROOT with the name of the script as an argument.
root -l dimuonSpectrum2012_eospublic_test.C
It should take less than a minute to run, and if everything works, you will see a window pop up that looks like the following image.
CMS dimuon spectrum - 2012 data sample
Invariant mass of a select sample of oppositely charged dimuon pairs. Derived from a smaller subset of 2012 data.
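The plots in this lesson show the invariant mass of oppositely charged muon pairs. As a minimal sketch, assuming each muon is described by NanoAOD-style transverse momentum, pseudorapidity, azimuthal angle, and mass, the calculation can be written in plain C++ as below. The names PtEtaPhiM and invariantMass are hypothetical and are not taken from the lesson's scripts, which may rely on ROOT's own four-vector helpers (for example ROOT::VecOps::InvariantMass) instead.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical muon kinematics in the (pt, eta, phi, mass) coordinates
// that NanoAOD stores for each muon.
struct PtEtaPhiM {
    double pt;   // transverse momentum [GeV]
    double eta;  // pseudorapidity
    double phi;  // azimuthal angle [rad]
    double m;    // mass [GeV]
};

// Invariant mass of a two-particle system:
//   M^2 = (E1 + E2)^2 - |p1 + p2|^2
double invariantMass(const PtEtaPhiM& a, const PtEtaPhiM& b) {
    // pt is a magnitude, so it must be non-negative.
    assert(a.pt >= 0.0 && b.pt >= 0.0);

    // Cartesian momentum components from (pt, eta, phi).
    auto px = [](const PtEtaPhiM& v) { return v.pt * std::cos(v.phi); };
    auto py = [](const PtEtaPhiM& v) { return v.pt * std::sin(v.phi); };
    auto pz = [](const PtEtaPhiM& v) { return v.pt * std::sinh(v.eta); };
    // E^2 = |p|^2 + m^2
    auto energy = [&](const PtEtaPhiM& v) {
        double p2 = px(v) * px(v) + py(v) * py(v) + pz(v) * pz(v);
        return std::sqrt(p2 + v.m * v.m);
    };

    double e = energy(a) + energy(b);
    double sx = px(a) + px(b);
    double sy = py(a) + py(b);
    double sz = pz(a) + pz(b);
    double m2 = e * e - (sx * sx + sy * sy + sz * sz);
    // Guard against tiny negative values caused by floating-point rounding.
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;
}
```

For example, two massless muons with pt = 45.5 GeV, eta = 0, and phi = 0 and pi are back to back, so their total momentum vanishes and the result is their summed energy, 91 GeV, close to the Z boson mass.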
If you are having issues with X11 forwarding, the script should still create a file dimuonSpectrum2012_C_eospublic.pdf in the cms_open_data_run1 directory you made, and you can view it there.
In the following sections, the scripts are written to run over larger files that you access remotely. Depending on your connection, they may take longer than the time allotted for this activity during the workshop, in which case you are encouraged to follow along with the instructor and run them on your own time, if you so choose.
First example
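Let’s run the first command, making use of ROOT’s ability to compile and execute a file in one step: the trailing ++ tells ROOT to compile the file with ACLiC (note the shared library it creates in the output) before running it. This might take 5 minutes for local participants at CERN, but longer for remote participants.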
root -l MuHistos_eospublic.cxx++
root [0]
Processing MuHistos_eospublic.cxx++...
Info in <TUnixSystem::ACLiC>: creating shared library /code/./MuHistos_eospublic_cxx.so
reading root://eospublic.cern.ch//eos/opendata/cms/upload/NanoAODRun1/01-Jul-22/Run2010B_Mu_merged.root
writing to MuHistos_Mu_eospublic.root
entries = 26718043
event nr 0
event nr 1000000
event nr 2000000
event nr 3000000
event nr 4000000
event nr 5000000
event nr 6000000
event nr 7000000
event nr 8000000
event nr 9000000
event nr 10000000
event nr 11000000
event nr 12000000
event nr 13000000
event nr 14000000
event nr 15000000
event nr 16000000
event nr 17000000
event nr 18000000
event nr 19000000
event nr 20000000
event nr 21000000
event nr 22000000
event nr 23000000
event nr 24000000
event nr 25000000
event nr 26000000
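The progress lines above are printed from inside a loop over the file's entries. The plain C++ sketch below is a hedged illustration of that pattern, assuming the script reads one entry at a time and fills histograms as it goes; we have not reproduced its actual code. SimpleHisto and eventLoop are hypothetical names, the input values stand in for some per-event quantity, and a real ROOT script would typically use TTree and TH1 classes rather than these stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical fixed-binning 1D histogram: a stand-in for ROOT's TH1 classes.
struct SimpleHisto {
    double lo;
    double hi;
    std::vector<long> counts;  // one counter per bin
    long underflow = 0;        // entries with x < lo
    long overflow = 0;         // entries with x >= hi

    SimpleHisto(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), counts(nbins, 0L) {
        assert(nbins > 0 && hi_ > lo_);
    }

    void fill(double x) {
        if (x < lo) { ++underflow; return; }
        if (x >= hi) { ++overflow; return; }
        auto bin = static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
        if (bin >= counts.size()) bin = counts.size() - 1;  // floating-point guard
        ++counts[bin];
    }
};

// Hypothetical event loop: visit every entry once, fill the histogram,
// and print a progress line every `reportEvery` entries.
void eventLoop(const std::vector<double>& values, SimpleHisto& h,
               std::size_t reportEvery = 1000000) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i % reportEvery == 0) {
            std::printf("event nr %zu\n", i);
        }
        h.fill(values[i]);
    }
}
```

With roughly 26.7 million entries and a report every million, a loop like this prints the 27 progress lines seen above.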
After the above output, the program will finish and will return the command-line prompt.
The program should have produced an output file called MuHistos_Mu_eospublic.root. You can check this by typing
ls -l MuHistos_Mu_eospublic.root
-rw-r--r-- 1 cmsusr cmsusr 31625 Jul 31 22:57 MuHistos_Mu_eospublic.root
You can open this file in ROOT and inspect it with a TBrowser. First type
root -l MuHistos_Mu_eospublic.root
This will put you into the ROOT environment, from which you can then launch the TBrowser from the prompt (you needn’t type the root [0]).
root [0] TBrowser b;
You can then click on the file name in the window and then click on the various histograms to view them.
Example 2 (RDataFrame)
A different example runs over a smaller (about 2 GB) file, primarily used for outreach, and makes use of ROOT’s relatively new RDataFrame class. You can run this example by launching ROOT from the command line.
root -l dimuonSpectrum2012_outreach.C
It should only take a few minutes to run, and if X11 forwarding is working for you, you should see a ROOT window pop up that looks like this.
CMS dimuon spectrum
Invariant mass of a select sample of oppositely charged dimuon pairs.
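RDataFrame lets you declare a chain of selections (Filter) and derived quantities (Define), and only runs the event loop, once, when a result is first requested. The toy C++ class below mimics that lazy behavior for a selection of events with exactly two muons of opposite charge, as in the caption. It is an illustration under stated assumptions, not the script's actual code: ToyFrame and Event are hypothetical, and only the method names Filter and Count echo the real RDataFrame interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified event record: only the muon charges.
struct Event {
    std::vector<int> muonCharge;  // one entry per reconstructed muon
};

// Toy "dataframe" (hypothetical): Filter() only records a predicate, and
// nothing is evaluated until Count() runs a single pass over the events.
class ToyFrame {
public:
    explicit ToyFrame(const std::vector<Event>& events) : events_(&events) {}

    // Returns a new frame with one more selection appended; *this is unchanged.
    ToyFrame Filter(std::function<bool(const Event&)> pred) const {
        assert(pred && "Filter needs a callable predicate");
        ToyFrame next = *this;
        next.filters_.push_back(std::move(pred));
        return next;
    }

    // Runs the event loop once, applying the recorded filters in order and
    // stopping at the first one that rejects the event, so later filters
    // can rely on earlier ones (e.g. index [1] only after a size check).
    std::size_t Count() const {
        std::size_t n = 0;
        for (const Event& ev : *events_) {
            bool pass = true;
            for (const auto& f : filters_) {
                if (!f(ev)) { pass = false; break; }
            }
            if (pass) ++n;
        }
        return n;
    }

private:
    const std::vector<Event>* events_;  // not owned; must outlive the frame
    std::vector<std::function<bool(const Event&)>> filters_;
};
```

Because each Filter call returns a new frame, the two-muon and opposite-charge selections can share a common starting point, and no event is examined until Count is called.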
Example 3 (2011 data)
This example runs over 2011 data that was used for a more “real” analysis. It takes about 15 minutes to run at CERN.
root -l Dimuon2011_eospublic_RDF.C
When it finishes, it should pop up a window with the following plot.
CMS dimuon spectrum - 2011 data sample
Invariant mass of a select sample of oppositely charged dimuon pairs. Derived from 2011 data.
Key Points
Making use of RDataFrame can speed up your analysis
You have different options for running ROOT, such as passing a script on the command line or compiling it with ACLiC
You can access files remotely or download them for local access