What is in the data files?
Overview
Teaching: 5 min
Exercises: 5 min
Questions
How do I inspect these files to see what is in them?
Objectives
To be able to see what objects are in the data files
To be able to see how big these files are and how much space these objects take up.
This part of the lesson will be done from within either a Docker container or VM. All commands will be typed inside that environment.
Setting up your CMSSW area
If you completed the lessons on virtual machines or Docker, you should already have a working CMSSW area.
If you are using the VM:
- Turn on your virtual machine and go to the right shell according to the validation instructions.
If you are using Docker:
- Start the container with:
docker start -i <theNameOfyourContainer>
Make sure you change directories to the CMSSW_5_3_32/src area; for instance, in Docker:
cd /home/cmsusr/CMSSW_5_3_32/src
Note that we are not really “installing” CMSSW, only setting up an environment for it; CMSSW is already installed. This is why, every time you open a new shell, you have to issue the cmsenv command, a script that sets the environment variables for your working area:
cmsenv
edmXXX tools
CMS uses a set of homegrown tools to interact with the AOD format, all of which are prefixed by edm, which stands for Event Data Model. We will not show you all of them, but introduce a few to give you an idea of what can be done.
edmDumpEventContent
The edmXXX tools take the full path to a file as an argument. Following a similar approach to the previous module, we’ve chosen one of the Monte Carlo files to test, but these commands would work equally well with a data file.
Let’s start by using edmDumpEventContent and looking at its options:
edmDumpEventContent --help
Usage: edmDumpEventContent [options] templates.root
Prints out info on edm file.
Options:
  -h, --help            show this help message and exit
  --name                print out only branch names
  --all                 Print out everything: type, module, label, process, and
                        branch name
  --lfn                 Force LFN2PFN translation (usually not necessary)
  --lumi                Look at 'lumi' tree
  --run                 Look at 'run' tree
  --regex=REGEX         Filter results based on regex
  --skipping            Print out branches being skipped
  --forceColumns        Forces printouts to be in nice columns
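These options can also be driven from a script, which helps when you want to inspect several files the same way. The Python sketch below is our own illustration, not a CMS tool. build_command and dump_to_log are hypothetical helper names, and only a few of the flags listed by --help are modeled. It assumes you have already run cmsenv, so that edmDumpEventContent is on your PATH.

```python
import subprocess

def build_command(filename, regex=None, names_only=False, show_all=False):
    """Return the argument list for one edmDumpEventContent call.

    Only --name, --all and --regex from the --help output are modeled here.
    """
    cmd = ["edmDumpEventContent"]
    if names_only:
        cmd.append("--name")
    if show_all:
        cmd.append("--all")
    if regex:
        cmd.append(f"--regex={regex}")
    cmd.append(filename)  # the file path always comes last
    return cmd

def dump_to_log(filename, log_path, **options):
    """Run the tool and write its standard output to log_path.

    Assumes cmsenv has been run so edmDumpEventContent can be found;
    raises CalledProcessError if the tool exits with an error.
    """
    with open(log_path, "w") as log:
        subprocess.run(build_command(filename, **options), stdout=log, check=True)
```

For example, dump_to_log(path, "muons.log", regex="Muon") would save the output of edmDumpEventContent --regex=Muon for the file at path. Passing an argument list rather than a single shell string avoids quoting problems with long root:// URLs.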
We will first use edmDumpEventContent with no other options to see what is in one of these files. It may take 15-60 seconds to run, and there will be a lot of output. You may find it useful to redirect the output to a file and then look at it there using less or a similar command (you can exit less by typing q).
edmDumpEventContent root://eospublic.cern.ch//eos/opendata/cms/MonteCarlo2012/Summer12_DR53X/TTJets_SemiLeptMGDecays_8TeV-madgraph/AODSIM/PU_RD1_START53_V7N-v1/10000/EA978C41-27D1-E211-9424-003048D46016.root > test_edm_output.log
less test_edm_output.log
Type                                    Module                          Label       Process
----------------------------------------------------------------------------------------------
LHEEventProduct                         "source"                        ""          "LHE"
GenEventInfoProduct                     "generator"                     ""          "SIM"
edm::TriggerResults                     "TriggerResults"                ""          "SIM"
vector<int>                             "genParticles"                  ""          "SIM"
vector<reco::GenJet>                    "ak5GenJets"                    ""          "SIM"
vector<reco::GenJet>                    "ak7GenJets"                    ""          "SIM"
vector<reco::GenJet>                    "kt4GenJets"                    ""          "SIM"
vector<reco::GenJet>                    "kt6GenJets"                    ""          "SIM"
vector<reco::GenMET>                    "genMetCalo"                    ""          "SIM"
vector<reco::GenMET>                    "genMetCaloAndNonPrompt"        ""          "SIM"
.
.
.
vector<reco::TrackExtra>                "tevMuons"                      "firstHit"  "RECO"
vector<reco::TrackExtra>                "tevMuons"                      "picky"     "RECO"
vector<reco::TrackExtrapolation>        "trackExtrapolator"             ""          "RECO"
vector<reco::TrackJet>                  "ak5TrackJets"                  ""          "RECO"
vector<reco::Vertex>                    "offlinePrimaryVertices"        ""          "RECO"
vector<reco::Vertex>                    "offlinePrimaryVerticesWithBS"  ""          "RECO"
vector<reco::VertexCompositeCandidate>  "generalV0Candidates"           "Kshort"    "RECO"
vector<reco::VertexCompositeCandidate>  "generalV0Candidates"           "Lambda"    "RECO"
From this information you can get the names of physics objects you may be interested in (e.g. ak5TrackJets), as well as the processing stage at which they were produced (SIM for simulation, RECO for reconstruction). This information will be useful when writing your analysis code, which is discussed in a later lesson.
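If you saved the output to test_edm_output.log as above, you can also summarize it programmatically. The following Python sketch is our own, and parse_event_content, modules_by_process and summarize_log are hypothetical helper names. It assumes that each product line ends with three quoted fields (module, label and process), as in the listing above. Header, separator and ellipsis lines do not match this pattern and are skipped.

```python
import re
from collections import defaultdict

# A product line is a C++ type followed by three quoted fields. The type
# itself may contain spaces (e.g. "unsigned int"), so we anchor on the
# quoted fields at the end of the line instead of splitting on whitespace.
LINE_RE = re.compile(
    r'^(?P<type>.+?)\s+"(?P<module>[^"]*)"\s+"(?P<label>[^"]*)"\s+"(?P<process>[^"]*)"\s*$'
)

def parse_event_content(text):
    """Return a list of (type, module, label, process) tuples."""
    products = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # non-product lines simply do not match
            products.append(match.group("type", "module", "label", "process"))
    return products

def modules_by_process(products):
    """Group module names by the processing step (e.g. SIM, RECO) that made them."""
    grouped = defaultdict(list)
    for _type, module, _label, process in products:
        grouped[process].append(module)
    return dict(grouped)

def summarize_log(path="test_edm_output.log"):
    """Print how many products each processing step contributed."""
    with open(path) as log:
        products = parse_event_content(log.read())
    for process, modules in sorted(modules_by_process(products).items()):
        print(f"{process}: {len(modules)} products")
```

Calling summarize_log() after the edmDumpEventContent command prints one line per process name found in the file, such as LHE, SIM and RECO for this sample.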
Some of the other command-line options are useful for filtering this information.
Challenge!
Try the following options (with the same file) and see what they give you. Can you see why this might be useful?
edmDumpEventContent --regex=Muon root://eospublic.cern.ch//eos/opendata/cms/MonteCarlo2012/Summer12_DR53X/TTJets_SemiLeptMGDecays_8TeV-madgraph/AODSIM/PU_RD1_START53_V7N-v1/10000/EA978C41-27D1-E211-9424-003048D46016.root
edmDumpEventContent --name root://eospublic.cern.ch//eos/opendata/cms/MonteCarlo2012/Summer12_DR53X/TTJets_SemiLeptMGDecays_8TeV-madgraph/AODSIM/PU_RD1_START53_V7N-v1/10000/EA978C41-27D1-E211-9424-003048D46016.root
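As a hint for the challenge: to our understanding, the branch names printed by --name are the four columns joined by underscores with a trailing period. In these names the C++ type is replaced by a shorter "friendly" name. The Python sketch below assumes that convention and adds a plain re.search filter as a rough stand-in for --regex. It is not the tool's own code, branch_name and filter_names are hypothetical helpers, and we do not claim --regex is matched against exactly these strings.

```python
import re

def branch_name(friendly_type, module, label, process):
    """Assemble an EDM-style branch name: friendlyType_module_label_process.

    Converting a C++ type to its friendly name (e.g. vector<reco::Vertex>
    -> recoVertexs) follows extra rules not reproduced here, so the
    friendly name is passed in directly. An empty label gives "__".
    """
    return f"{friendly_type}_{module}_{label}_{process}."

def filter_names(names, pattern):
    """Keep the names in which the regular expression matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]
```

For example, branch_name("recoVertexs", "offlinePrimaryVertices", "", "RECO") gives "recoVertexs_offlinePrimaryVertices__RECO.", and filtering a list of such names with the pattern "Muon" keeps only the muon-related entries. This is the kind of narrowing down that makes --regex handy on files with hundreds of products.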
Key Points
It is sometimes useful to inspect the files before diving into the full analysis
Some files may not have the information you’re looking for