NanoAOD analysis: Introduction
Overview
Teaching: 10 min
Exercises: 10 min
Questions
What am I supposed to learn from this analysis?
What is the physics behind the data?
Objectives
Learn the basics of the physics processes present in the data
Learn about the content of the (reduced) NanoAOD files
The following sections show you how an analysis with CMS NanoAOD files and RDataFrame can be performed, from the initial files to the final plots.
Signal process
The physical process of interest, often called the signal, is the production of a Higgs boson that decays into two tau leptons. The main production modes of the Higgs boson are gluon fusion and vector boson fusion, indicated in the plots with the labels gg→H and qq→H, respectively. The two Feynman diagrams below describe these processes at leading order. Note that this is the signal we will be exploring during the workshop. We will expand on this analysis.
Tau decay modes
The tau lepton has a very short lifetime of about 290 femtoseconds, after which it decays into other particles. With a probability of a little under 20% each (roughly 17 to 18%), the tau lepton decays into a muon or an electron plus two neutrinos. All other decay modes consist of a combination of hadrons, such as pions and kaons, and a single neutrino. You can find here a full overview and the exact numbers. This analysis considers only tau lepton pairs in which one tau lepton decays into a muon and two neutrinos and the other decays hadronically, whereas the official CMS analysis considered additional decay channels.
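As a rough cross-check of the numbers above, the fraction of tau pairs that end up in the muon plus hadronic tau final state can be estimated from the branching fractions. The values used here (about 17.4% for the muonic decay and about 64.8% for all hadronic decays combined) are approximate numbers assumed for illustration and are not taken from this lesson. The factor of 2 accounts for either of the two tau leptons being the one that decays into a muon. The second line converts the lifetime into a proper decay length $c\tau$.

```latex
\begin{align}
f_{\mu\tau_h} &\approx 2 \cdot \mathcal{B}(\tau \to \mu\nu\nu) \cdot \mathcal{B}(\tau \to \text{hadrons} + \nu)
               \approx 2 \cdot 0.174 \cdot 0.648 \approx 0.23 \\
c\tau &\approx \left(3.0 \times 10^{8}\,\mathrm{m/s}\right) \cdot \left(290 \times 10^{-15}\,\mathrm{s}\right)
       \approx 87\,\mu\mathrm{m}
\end{align}
```

Under these assumptions, roughly a quarter of all tau pairs fall into the channel studied here.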
Background processes
Besides the Higgs boson decaying into two tau leptons, many other processes can produce very similar signatures in the detector, and they have to be taken into account before any conclusions can be drawn from the data. In the following, the most prominent processes with a signature similar to the signal are presented. Except for the QCD multijet process, the analysis estimates the contribution of the background processes using simulated events.
Z→ττ
The most prominent background process is the Z boson decaying into two tau leptons. The leading-order production mechanism is the Drell-Yan process, in which a quark and an antiquark annihilate. Because the Z boson decays directly into two tau leptons, just like the Higgs boson, this process is very hard to distinguish from the signal.
Z→ll
Besides decaying into two tau leptons, the Z boson decays with the same probability into pairs of electrons and pairs of muons. Although this process does not contain any genuine tau leptons, a tau lepton can be reconstructed by mistake. Objects that are likely to be misidentified as a hadronically decaying tau lepton are electrons and jets.
W+jets
W bosons are frequently produced at the LHC and can decay into any lepton flavor. If a muon from a W boson decay is selected together with a jet that is misidentified as a tau lepton, the event signature can resemble that of the signal. However, this process can be strongly suppressed by a cut in the event selection on the transverse mass of the muon and the missing transverse energy, as done in the published CMS analysis.
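For reference, the transverse mass mentioned above is conventionally built from the muon transverse momentum $p_T^{\mu}$, the missing transverse energy $E_T^{\text{miss}}$, and the azimuthal angle $\Delta\phi$ between them. The symbols are notation assumed here, and the lesson does not state the threshold used for the cut.

```latex
m_T = \sqrt{2\, p_T^{\mu}\, E_T^{\text{miss}} \left(1 - \cos\Delta\phi\right)},
\qquad
\Delta\phi = \phi^{\mu} - \phi^{\text{miss}}
```

In $W\to\mu\nu$ decays the missing energy comes from a single neutrino that tends to recoil against the muon, so $m_T$ is typically large and extends up to about the W boson mass. Requiring $m_T$ to stay below some threshold therefore removes a large part of this background.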
$t\bar{t}$
Top anti-top pairs are produced at the LHC by quark anti-quark annihilation or gluon fusion. Because a top quark decays immediately and almost exclusively into a W boson and a bottom quark, additional misidentifications result in signal-like signatures in the detector, similar to the $W+\mathrm{jets}$ process explained above. However, identifying jets that originate from bottom quarks and removing the corresponding events reduces this background effectively.
QCD
The QCD multijet background describes events with a large number of jets, which occur very often at the LHC. Such events can be falsely selected for the analysis due to misidentifications. Because a proper simulation of this process is complex and computationally expensive, the contribution is not estimated from simulation but from the data itself. To do so, we select tau pairs with the same selection as for the signal, but with the modified requirement that both tau leptons have the same charge. Then, all known processes from simulation are subtracted from the resulting data histogram. Using the remaining histogram as an estimate of the QCD multijet process is possible because the production of misidentified tau lepton candidates is independent of the charge.
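The subtraction described above can be written per histogram bin as follows. This is only a sketch of the idea: the notation is assumed here, with $i$ labelling the bin, SS denoting the same-charge (same-sign) selection, and the sum running over the simulated background processes.

```latex
N_{\text{QCD}}^{(i)} \approx N_{\text{data, SS}}^{(i)} - \sum_{b \,\in\, \text{simulation}} N_{b,\,\text{SS}}^{(i)}
```

Whatever remains in the same-sign data after removing the simulated contributions is attributed to QCD multijet events.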
Files and dataset content
The files used and the content of the datasets, for example the simulated Standard Model Higgs boson produced by gluon fusion, can be found on the CERN Open Data portal. During the workshop, we will learn more about all this.
Have a look at the content of the (reduced) CMS NanoAOD files!
You can simply look at the content on the CERN Open Data portal (follow for example this link), or take one of the files you will download below and investigate its content with ROOT, as shown in the previous sections!
Why NanoAOD?
The NanoAOD format is a slimmed-down version of the MiniAOD format (which is itself a slimmed-down version of the AOD format) with a size of about 1 kB per event. For the moment, all CMS open data are in the AOD format, but the Run 2 data, once released, will be made available in the MiniAOD and NanoAOD formats. For this tutorial, we will use specially prepared files derived from Run 1 AOD that mimic the NanoAOD format.
Why reduced NanoAOD?
Note that the NanoAOD files used here are reduced versions recreated from open CMS data and simulation from 2012. The NanoAOD format for Run 2 data will be different and contain more information.
Download the required datasets
Because you will very likely run the code multiple times, we want to speed up the analysis so that you can focus on the software. To do so, download the files with xrdcp to your computer or any other system with ROOT (v6.16 or later) available. The downloaded files add up to about 6.5 GB and represent only 10% of the original files you can find on the Open Data portal, which enables you to run the full analysis in under five minutes.
Alternatively, you can download the files manually via HTTP from https://root.cern/files/HiggsTauTauReduced/.
# Reduced samples: signal, simulated backgrounds, and 2012 collision data
SAMPLES=(
    GluGluToHToTauTau
    VBF_HToTauTau
    DYJetsToLL
    TTbar
    W1JetsToLNu
    W2JetsToLNu
    W3JetsToLNu
    Run2012B_TauPlusX
    Run2012C_TauPlusX
)
for SAMPLE in "${SAMPLES[@]}"
do
    # Via XRootD:
    xrdcp "root://eospublic.cern.ch//eos/root-eos/HiggsTauTauReduced/${SAMPLE}.root" .
    # Via HTTP:
    # curl -O "https://root.cern/files/HiggsTauTauReduced/${SAMPLE}.root"
done
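Since the download is several gigabytes, it can be useful to confirm afterwards that every file actually arrived. The following is a minimal sketch, assuming the files were saved in the current directory under the names used in the loop above. The helper `missing_samples` is a hypothetical name introduced for illustration, and the script is written in portable POSIX sh, so a space-separated string stands in for the array.

```shell
#!/bin/sh
# Sketch: list the expected sample files that are missing or empty.
# Assumption: files are named <sample>.root, as in the download loop.
SAMPLES="GluGluToHToTauTau VBF_HToTauTau DYJetsToLL TTbar W1JetsToLNu W2JetsToLNu W3JetsToLNu Run2012B_TauPlusX Run2012C_TauPlusX"

# missing_samples (hypothetical helper): print one sample name per line for
# every <sample>.root that is absent or empty in directory $1 (default: .).
missing_samples() {
    dir="${1:-.}"
    for sample in $SAMPLES; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "${dir}/${sample}.root" ]; then
            echo "${sample}"
        fi
    done
}

# Check the current directory and summarize the result.
MISSING="$(missing_samples .)"
if [ -z "${MISSING}" ]; then
    echo "All sample files are present."
else
    echo "Missing or empty sample files:"
    echo "${MISSING}"
fi
```

This only checks that each file exists and is non-empty. It does not verify that a file is complete or readable by ROOT, so an interrupted download can still pass the check.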
Download the files!
Choose one of the options shown above and download the files!
Key Points
The analysis studies Higgs boson decays to two tau leptons with a muon and a hadronic tau in the final state
The input files are (reduced) CMS NanoAOD files, which keeps the exercise very close to an actual CMS analysis
The following steps show hands-on how RDataFrame is used in an actual analysis