This lesson is still being designed and assembled (Pre-Alpha version)

Jets and missing transverse momentum

CMS Jets and MET

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • How are jets and missing transverse energy accessed in CMS files?

Objectives
  • Identify jet and MET code collections in AOD files

  • Understand typical features of jet/MET objects

  • Practice accessing jet quantities

After tracks and energy deposits in the CMS tracking detectors (inner, muon) and calorimeters (electromagnetic, hadronic) are reconstructed as particle flow candidates, an event can be interpreted in various ways. Two common elements of event interpretation are clustering jets and calculating missing transverse momentum.

Jets

Jets are spatially-grouped collections of long-lived particles that are produced when a quark or gluon hadronizes. The kinetmatic properties of jets resemble that of the initial partons that produced them. In the CMS language, jets are made up of many particles, with the following predictable energy composition:

Jets are very messy! Hadronization and the subsequent decays of unstable hadrons can produce 100s of particles near each other in the CMS detector. Hence these particles are rarely analyzed individually. How can we determine which particle candidates should be included in each jet?

Clustering

Jets can be clustered using a variety of different inputs from the CMS detector. “CaloJets” use only calorimeter energy deposits. “GenJets” use generated particles from a simulation. But by far the most common are “PFJets”, from particle flow candidates.

The result of the CMS Particle Flow algorithm is a list of particle candidates that account for all inner-tracker and muon tracks and all above-threshold energy deposits in the calorimeters. These particles are formed into jets using a “clustering algorithm”. The most common algorithm used by CMS is the “anti-kt” algorithm, which is abbreviated “AK”. It iterates over particle pairs and finds the two (i and j) that are the closest in some distance measure and determines whether to combine them:

The momentum power (-2) used by the anti-kt algorithm means that higher-momentum particles are clustered first. This leads to jets with a round shape that tend to be centered on the hardest particle. In CMS software this clustering is implemented using the [[fastjet][www.fastjet.fr]] package.

Pileup

Inevitably, the list of particle flow candidates contains particles that did not originate from the primary interaction point. CMS experiences multiple simultaneous collisions, called “pileup”, during each “bunch crossing” of the LHC, so particles from multiple collisions coexist in the detector. There are various methods to remove their contributions from jets:

Accessing jets in CMS software

Jets software classes have the same basic 4-vector methods as the objects discussed in the previous lesson:

Handle<CaloJetCollection> jets;
iEvent.getByLabel(InputTag("ak5PFJets"), jets);

for (auto it = jets->begin(); it != jets->end(); it++) {

    value_jet_pt[value_jet_n] = it->pt();
    value_jet_eta[value_jet_n] = it->eta();
    value_jet_phi[value_jet_n] = it->phi();
    value_jet_mass[value_jet_n] = it->mass();

}

Particle-flow jets are not immune to noise in the detector, and jets used in analyses should be filtered to remove noise jets. CMS has defined a Jet ID with criteria for good jets:

The PFlow jets are required to have charged hadron fraction CHF > 0.0 if within tracking fiducial region of |eta| < 2.4, neutral hadron fraction NHF < 1.0, charged electromagnetic (electron) fraction CEF < 1.0, and neutral electromagnetic (photon) fraction NEF < 1.0. These requirements remove fake jets arising from spurious energy depositions in a single sub-detector.

These criteria demonstrate how particle-flow jets combine information across subdetectors. Jets will typically have energy from electrons and photons, but those fractions of the total energy should be less than one. Similarly, jets should have some energy from charged hadrons if they overlap the inner tracker, and all the energy should not come from neutral hadrons. A mixture of energy sources is expected for genuine jets. All of these energy fractions (and more) can be accessed from the jet objects.

Challenge: Jet ID

Use the cms-sw github repository to learn the methods available for PFJets (hint: the header file is included from AOD2NanoAOD.cc). Implement the jet ID and reject jets that do not pass. Rejection means that information about these jets will not be stored in any of the tree branches.

Solution

The header file we need is for particle-flow jets: interface/PFJet.h from the link given. It shows many functions like this:

float chargedHadronEnergyFraction () const {return chargedHadronEnergy () / energy ();}

These functions give the energy from a certain type of particle flow candidate as a fraction of the jet’s total energy. We can apply the conditions given to reject jets from noise at the same time we apply a momentum threshold:

for (auto it = jets->begin(); it != jets->end(); it++) {
 if (it->pt > jet_min_pt && it->chargedHadronEnergyFraction() > 0 && it->neutralHadronEnergyFraction() < 1.0 &&
     it->electronEnergyFraction() < 1.0 && it->photonEnergyFraction() < 1.0){

   // calculate things on jets
 }
}

MET

Missing transverse momentum is the negative vector sum of the transverse momenta of all particle flow candidates in an event. The magnitude of the missing transverse momentum vector is called missing transverse energy and referred to with the acronym “MET”. Since energy corrections are made to the particle flow jets, those corrections are propagated to MET by adding back the momentum vectors of the original jets and then subtracting the momentum vectors of the corrected jets. This correction is called “Type 1” and is standard for all CMS analyses. The jet energy corrections will be discussed more deeply at the end of this lesson.

In AOD2NanoAOD.cc we open the particle flow MET module and extract the magnitude and angle of the MET, the sum of all energy in the detector, and variables related to the “significance” of the MET. Note that MET quantities have a single value for the entire event, unlike the objects studied previously.

Handle<PFMETCollection> met;
iEvent.getByLabel(InputTag("pfMet"), met);

value_met_pt = met->begin()->pt();
value_met_phi = met->begin()->phi();
value_met_sumet = met->begin()->sumEt();

value_met_significance = met->begin()->significance();
auto cov = met->begin()->getSignificanceMatrix();
value_met_covxx = cov[0][0];
value_met_covxy = cov[0][1];
value_met_covyy = cov[1][1];

MET significance can be a useful tool: it describes the likelihood that the MET arose from noise or mismeasurement in the detector as opposed to a neutrino or similar non-interacting particle. The four-vectors of the other physics objects along with their uncertainties are required to compute the significance of the MET signature. MET that is directed nearly (anti)colinnear with a physics object is likely to arise from mismeasurement and should not have a large significance.

Challenge: real and fake MET

Compile all your changes to AOD2NanoAOD.cc so far and run over the simulation sample again. This test file contains top quark pair events, so some events will have leptonic decays that include neutrinos and some events will not. Review TTree::Draw from the pre-exercises – can you draw histograms of MET versus MET significance and infer which events have leptonic decays?

$ scram b
$ cmsRun configs/simulation_cfg.py
$ # edit simulation_cfg.py to use the Drell-Yan test file, and save output_DY.root
$ cmsRun confings/simulation_cfg.py
$ root -l output.root
[0] TTree *ttbar = (TTree*)_file0->Get("aod2nanoaod/Events");
[1] TFile *_file1 = TFile::Open("output_DY.root");
[2] TTree *dy = (TTree*)_file1->Get("aod2nanoaod/Events");
[3] ttbar->Draw("...things...","...any cuts...","norm)
[4] dy->Draw("...things...","...any cuts...","norm pe same")

Solution

The difference between the Drell-Yan events with primarily fake MET and the top pair events with primarily genuine MET can be seen by drawing MET_pt or by drawing MET_significance. In both distributions the Drell-Yan events have smaller values than the top pair events.

width=0.5

Key Points

  • Jets are spatially-grouped collections of particles that traversed the CMS detector

  • Particles from additional proton-proton collisions (pileup) must be removed from jets

  • Missing transverse energy is the negative vector sum of particle candidates

  • Many of the class methods discussed for other objects can be used for jets


Heavy flavor tagging

Overview

Teaching: 10 min min
Exercises: 20 min min
Questions
  • How are b hadrons identified in CMS

Objectives
  • Understand the basics of heavy flavor tagging

  • Learn to access tagging information in AOD files

Jet reconstruction and identification is an important part of the analyses at the LHC. A jet may contain the hadronization products of any quark or gluon, or possibly the decay products of more massive particles such as W or Higgs bosons. Several “b tagging” algorithms exist to identify jets from the hadronization of b quarks, which have unique properties that distinguish them from light quark or gluon jets.

B Tagging Algorithms

Tagging algorithms first connect the jets with good quality tracks that are either associated with one of the jet’s particle flow candidates or within a nearby cone. Both tracks and “secondary vertices” (track vertices from the decays of b hadrons) can be used in track-based, vertex-based, or “combined” tagging algorithms. The specific details depend upon the algorithm use. However, they all exploit properties of b hadrons such as:

Tagging algorithms are Algorithms that are used for b-tagging:

These algorithms produce a single, real number (often the output of an MVA) called a b tagging “discriminator” for each jet. The more positive the discriminator value, the more likely it is that this jet contained b hadrons.

Accessing tagging information

In AOD2NanoAOD.cc we access the information from the Combined Secondary Vertex (CSV) b tagging algorithm and associate discriminator values with the jets. The CSV values are stored in a separate collection in the AOD files called a JetTagCollection, which is effectively a vector of associations between jet references and float values (such as a b-tagging discriminator).

#include "DataFormats/JetReco/interface/PFJet.h"
#include "DataFormats/BTauReco/interface/JetTag.h"

Handle<PFJetCollection> jets;
iEvent.getByLabel(InputTag("ak5PFJets"), jets);

//define b-tag discriminators handle and get the discriminators
Handle<JetTagCollection> btags;
iEvent.getByLabel(InputTag("combinedSecondaryVertexBJetTags"), btags);

//loop over jets
const float jet_min_pt = 15;
value_jet_n = 0;
std::vector<PFJet> selectedJets;
for (auto it = jets->begin(); it != jets->end(); it++) {
  if (it->pt() > jet_min_pt) {

    // from the btag collection get the float (second) from the association to this jet.
    value_jet_btag[value_jet_n] = btags->operator[](it - jets->begin()).second;

  }
}

We can investigate at the b tag information in a dataset using ROOT.

$ root -l output.root
[0] _file0->cd("aod2nanoaod");
[1] Events->Draw("Jet_btag");

From the “Events” tree, select Jet_btag to see the distribution of discriminator values in the top quark pair file you processed earlier.

Challenge: alternate taggers

Use edmDumpEventContent to investiate other b tagging algorithms available as edm::AssociationVector types. Add 1-2 new branches for alternate taggers and compare those discriminants to CSV (compile and run cmsRun as you’ve done before).

Solution

Let’s add the MVA version of CSV and the high purity track counting tagger, which was the most common tagger in 2011. After adding new array declarations and branches in the top sections of AOD2NanoAOD.cc, we can open the collections for these alternate taggers:

Handle<JetTagCollection> btagsMVA, btagsTC;
iEvent.getByLabel(InputTag("trackCountingHighPurBJetTags"), btagsTC);
iEvent.getByLabel(InputTag("combinedSecondaryVertexMVABJetTags"), btagsMVA);

// inside the jet loop
 value_jet_btagmva[value_jet_n] = btagsMVA->operator[](it - jets->begin()).second;
 value_jet_btagtc[value_jet_n] = btagsTC->operator[](it - jets->begin()).second;

The distributions in ttbar events (excluding events with values of -9 where the tagger wasn’t evaluated) are shown below. The track counting discriminant is quite different and ranges 0 – 30 or so.

Working points

A jet is considered “b tagged” if the discriminator value exceeds some threshold. Different thresholds will have different efficiencies for identifying true b quark jets and for mis-tagging light quark jets. As we saw for muons and other objects, a “loose” working point will allow the highest mis-tagging rate, while a “tight” working point will sacrifice some correct-tag efficiency to reduce mis-tagging. The CSV algorithm has working points defined based on mis-tagging rate:

Challenge: count medium CSV b tags

Calculate the number of jets per event that are b tagged according to the medium working point of the CSV algorithm.

Solution

We count the number of “Medium CSV” b-tagged jets by summing up the number of jets with discriminant values greater than 0.679. After adding a variable declaration and branch we can sum up the counter:

value_jet_nCSVM = 0;
for (auto it = jets->begin(); it != jets->end(); it++){

 // skipping bits

 value_jet_btag[value_jet_n] = btags->operator[](it - jets->begin()).second
 if (value_jet_btag[value_jet_n] > 0.679) value_jet_nCSVM++;
 value_jet_n++;
}

The distribution of the number of CSV b jets also shows a strong difference, as expected, between Drell-Yan and top pair events:

Data and simulation differences:

When training a tagging algorithm, it’s highly probable that the efficiencies for tagging different quark flavors as b jets will vary between simulation and data. These differences must be measured and corrected for using “scale factors” constructed from ratios of the efficiencies from different sources. The figures below show examples of the b and light quark efficiencies and scale factors as a function of jet momentum (read more).

In simulation, the efficiency for tagging b quarks as b jets is defined as the number of “real b jets” (jets spatially matched to generator-level b hadrons) tagged as b jets divided by the number of real b jets. The efficiency for mis-tagging c or light quarks as b jets is similar (real c/light jets tagged as b jets / real c/light jets). These values are typically computed as functions of the momentum or pseudorapidity of the jet. The “real” flavor of the jet is accessed most simply by creating pat::Jet objects instead of reco::Jet objects, as we will see in the next episode.

Scale factors to increase or decrease the number of b-tagged jets in simulation can be applied in a number of ways, but typically involve weighting simulation events based on the efficiencies and scale factors relevant to each jet in the event. Scale factors for the CSV algorithm are available for Open Data and involve extracting functions from a comma-separated-values file. Details and usage reference can be found here:

Watch for more efficiency and scale factor implementation examples in future updates to the Open Data Guide!

Key Points

  • Tagging algorithms separate heavy flavor jets from jets produced by the hadronization of light quarks and gluons

  • Tagging algorithms produce a disriminator value for each jet that represents the likelihood that the jet came from a b hadron

  • Each tagging algorithm has recommended ‘working points’ (discriminator values) based on a misidentification probability for light-flavor jets


Jet corrections

Overview

Teaching: 15 min
Exercises: 20 min
Questions
  • How are data/simulation differences dealt with for jets and MET?

Objectives
  • Learn about typical differences in jet energy scale and resolution between data and simulation

  • Understand how these corrections are applied to jets and MET

  • Access the uncertainties in the jet energy correction

  • Practice saving jet observables with uncertainties

Unsurprisingly, the CMS detector does not measure jet energies perfectly, nor do simulation and data agree perfectly! The measured energy of jet must be corrected so that it can be related to the true energy of its parent particle. These corrections account for several effects and are factorized so that each effect can be studied independently.

Correction levels

Particles from additional interactions in nearby bunch crossings of the LHC contribute energy in the calorimeters that must somehow be distinguished from the energy deposits of the main interaction. Extra energy in a jet’s cone can make its measured momentum larger than the momentum of the parent particle. The first layer (“L1”) of jet energy corrections accounts for pileup by subtracting the average transverse momentum contribution of the pileup interactions to the jet’s cone area. This average pileup contribution varies by pseudorapidity and, of course, by the number of interactions in the event.

The second and third layers of corrections (“L2L3”) correct the measured momentum to the true momentum as functions of momentum and pseudorapidity, bringing the reconstructed jet in line with the generated jet. These corrections are derived using momentum balancing and missing energy techniques in dijet and Z boson events. One well-measured object (ex: a jet near the center of the detector, a Z boson reconstructed from leptons) is balanced against a jet for which corrections are derived.

All of these corrections are applied to both data and simulation. Data events are then given “residual” corrections to bring data into line with the corrected simulation. A final set of flavor-based corrections are used in certain analyses that are especially sensitive to flavor effects. All of the corrections are described in this paper. The figure below shows the result of the L1+L2+L3 corrections on the jet response.

JEC from text files

There are several methods available for applying jet energy corrections to reconstructed jets. We have demonstrated a method to read in the corrections from text files and extract the corrections manually for each jet. The text files can be extracted from the global tag. First, set up sym links to the conditions databases for 2012 data and simulation (reference instructions):

You might have done this in the pre-exercises! But if not, do it now. You will also need CVMFS mounted in your VM or docker container, which is covered in the pre-exercises.

$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL FT53_V21A_AN6
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db FT53_V21A_AN6_FULL.db
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL FT53_V21A_AN6_FULL
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/START53_V27 START53_V27
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/START53_V27.db START53_V27.db
$ ls -l   ## make sure you see the full links as written above

To write out text files, we will use configs/jec_cfg.py, which uses a small analyzer to open the database files we just linked:

# connect to global tag                                                                                                               
if isData:
    process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db')
    process.GlobalTag.globaltag = 'FT53_V21A_AN6::All'
else:
    process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/START53_V27.db')
    process.GlobalTag.globaltag = 'START53_V27::All'


# setup JetCorrectorDBReader                                                                                                          
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(1))
process.source = cms.Source('EmptySource')
process.ak5 = cms.EDAnalyzer('JetCorrectorDBReader',
                             payloadName=cms.untracked.string('AK5PF'),
                             printScreen=cms.untracked.bool(False),
                             createTextFile=cms.untracked.bool(True))

if isData:
    process.ak5.globalTag = cms.untracked.string('FT53_V21A_AN6')
else:
    process.ak5.globalTag = cms.untracked.string('START53_V27')

Make the text files

Run this job once with isData = True and once with isData = False (if you access the condition database for the first time, this will take a while). Then move the text files to the data/ directory:

$ cmsRun configs/jec_cfg.py
$ ## edit the file and flip isData
$ cmsRun configs/jec_cfg.py
$ mv *AK5PF.txt data/

In simulation_cfg.py the file names are passed to the AOD2NanoAOD analyzer:

process.aod2nanoaod = cms.EDAnalyzer("AOD2NanoAOD",
        jecL1Name = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/START53_V27_L1FastJet_AK5PF.txt'),
        jecL2Name = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/START53_V27_L2Relative_AK5PF.txt'),
        jecL3Name = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/START53_V27_L3Absolute_AK5PF.txt'),
        jecUncName = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/START53_V27_Uncertainty_AK5PF.txt'),
        isData = cms.bool(False)

In AOD2NanoAOD.cc the files are read to build a factorizedJetCorrector object from which the corrections can be accessed:

// Object definitions
bool isData;
std::vector<std::string> jecPayloadNames_;
std::string              jecL1_;
std::string              jecL2_;
std::string              jecL3_;
boost::shared_ptr<FactorizedJetCorrector> jec_;

// In the constructor the factorizedJetCorrected is set up
AOD2NanoAOD::AOD2NanoAOD(const edm::ParameterSet &iConfig){
  isData = iConfig.getParameter<bool>("isData");
  jecL1_ = iConfig.getParameter<edm::FileInPath>("jecL1Name").fullPath(); // JEC level payloads                     
  jecL2_ = iConfig.getParameter<edm::FileInPath>("jecL2Name").fullPath(); // JEC level payloads                     
  jecL3_ = iConfig.getParameter<edm::FileInPath>("jecL3Name").fullPath(); // JEC level payloads                     

  //Get the factorized jet corrector parameters.
  jecPayloadNames_.push_back(jecL1_);
  jecPayloadNames_.push_back(jecL2_);
  jecPayloadNames_.push_back(jecL3_);
    
  std::vector<JetCorrectorParameters> vPar;
  for ( std::vector<std::string>::const_iterator payloadBegin = jecPayloadNames_.begin(),
    payloadEnd = jecPayloadNames_.end(), ipayload = payloadBegin; ipayload != payloadEnd; ++ipayload ) {
    JetCorrectorParameters pars(*ipayload);
    vPar.push_back(pars);
    }

  // Make the FactorizedJetCorrector and Uncertainty                                                                                              
  jec_ = boost::shared_ptr<FactorizedJetCorrector> ( new FactorizedJetCorrector(vPar) );

  // ....function continues
}

In the analyze function the correction is evaluated for each jet. The correction depends on the momentum, pseudorapidity, energy, and cone area of the jet, as well as the value of “rho” (the average momentum per area) and number of interactions in the event. The correction is used to scale the momentum of the jet.

Handle<double> rhoHandle;
iEvent.getByLabel(InputTag("fixedGridRhoFastjetAll"), rhoHandle);

for (auto it = jets->begin(); it != jets->end(); it++) {
  if (it->pt() > jet_min_pt) {

    reco::Candidate::LorentzVector uncorrJet = it->p4();
    jec_->setJetEta( uncorrJet.eta() );
    jec_->setJetPt ( uncorrJet.pt() );
    jec_->setJetE  ( uncorrJet.energy() );
    jec_->setJetA  ( it->jetArea() );
    jec_->setRho   ( *(rhoHandle.product()) );
    jec_->setNPV   ( vertices->size() );

    double corr = jec_->getCorrection();

    value_jet_pt[value_jet_n] = it->pt();
    value_corr_jet_pt[value_jet_n] = corr * uncorrJet.pt();
   
  }
}

Challenge: add L2L3 residual corrections to data

In data, the L2L3 residual corrections should also be applied. Use the “isData” switch and set up AOD2NanoAOD.cc and data_cfg.py to fully correct jets in data.

Solution

When processing data, we need to open 4 text files rather than 3. This happens first in the config, which is actually missing ALL the text files right now! No switching is needed since we have a separate data config, but python if statements can be used if you want to have one configuration file for both data and simulation (left as an exercise to the reader). The text files for data start with “FT53” rather than “START53”:

process.aod2nanoaod = cms.EDAnalyzer("AOD2NanoAOD",
		      isData = cms.bool(True),
		      doPat = cms.bool(False),
		      jecL1Name = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/FT53_V21A_AN6_L1FastJet_AK5PF.txt')
		      jecL2Name = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/FT53_V21A_AN6_L2Relative_AK5PF.txt')
		      jecL3Name = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/FT53_V21A_AN6_L3Absolute_AK5PF.txt')
		      jecResName = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/FT53_V21A_AN6_L2L3Residual_AK5PF.txt')
		      jecUncName = cms.FileInPath('workspace/AOD2NanoAODOutreachTool/data/FT53_V21A_AN6_Uncertainty_AK5PF.txt')
		      )

In the source code we need to: teach the code about L2L3Residual, open that file only is isData, and apply uncertainties only if !isData. The first task is done in the class definition:

std::string jecL3_;
std::string jecRes_;

The second task is done in the constructor function of AOD2NanoAOD:

 jecL3_ = iConfig.getParameter<edm::FileInPath>("jecL3Name").fullPath();
 jecRes_ = iConfig.getParameter<edm::FileInPath>("jecResName").fullPath();

 jecPayloadNames_.push_back(jecL3_);
 if(isData) jecPayloadNames_.push_back(jecRes_);

And finally, we should escape the uncertainty calculation (more info on this below!) in the jet loop if we are working on data:

 double corrUp = 1.0;
 double corrDown = 1.0;
 if(!isData){
   jecUnc_->setJetEta( uncorrJet.eta() );
   // etc, through accessing corrUp and corrDown
 }

Jet Energy Resolution

These corrections account for differences between the true and measured energy scale of jets, but not the energy resolution. The jet momentum resolution is typically too small in simulation and is widened using a Gaussian smearing technique. Watch for implementation details on this correction in a future update to the Open Data Guide.

JEC while producing pat::Jets

Another popular object format in CMS is the “Physics Analysis Toolkit” format, called PAT. The jet energy corrections and Type-1 MET corrections can be applied to RECO jets while making PAT jets. To do this we will load the global tag and databases directly in the configuration file and use the ‘addJetCollection’ process to create a collection of pat::jets. Look at simulation_patjets_cfg.py:

# Set up the new jet collection                                                                             
process.ak5PFJets.doAreaFastjet = True
addPfMET(process, 'PF')

addJetCollection(process,cms.InputTag('ak5PFJets'),
                 'AK5', 'PFCorr',
                 doJTA        = True,
                 doBTagging   = True,
                 jetCorrLabel = ('AK5PF', cms.vstring(['L1FastJet','L2Relative','L3Absolute']))
                 doType1MET   = True,
                 doL1Cleaning = True,
                 doL1Counters = False,
                 doJetID      = True,
                 jetIdLabel   = "ak5",
                 )

In AOD2NanoAOD.cc we can look at the sections marked if(doPat) to see the difference in usage. In general, pat::jets are more complex to create in the configuration file, but simpler to use because of their additional functions. In particular, accessing the jet’s flavor directly makes calculation of b-tagging efficiencies and scale factors simpler.

  if(doPat){

    Handle<PFMETCollection> metT1;
    iEvent.getByLabel(InputTag("pfType1CorrectedMet"), metT1);
    value_met_type1_pt = metT1->begin()->pt();

    Handle<std::vector<pat::Jet> > patjets;
    iEvent.getByLabel(InputTag("selectedPatJetsAK5PFCorr"), patjets);

    value_patjet_n = 0;
    for (auto it = patjets->begin(); it != patjets->end(); it++) {
      if (it->pt() > jet_min_pt) {

        // Corrected values are now the default                                                             
        value_patjet_pt[value_patjet_n] = it->pt();
        value_patjet_eta[value_patjet_n] = it->eta();
        value_patjet_mass[value_patjet_n] = it->mass();

        // but uncorrected values can be accessed. JetID should be computed from the uncorrected jet        
        pat::Jet uncorrJet = it->correctedJet(0);
        value_uncorr_patjet_pt[value_patjet_n] = uncorrJet.pt();
        value_uncorr_patjet_eta[value_patjet_n] = uncorrJet.eta();
        value_uncorr_patjet_mass[value_patjet_n] = uncorrJet.mass();

        // b-tagging is built in. Can access the truth flavor needed for b-tag effs & scale factor application!                                                                                                        
        value_patjet_hflav[value_patjet_n] = it->hadronFlavour();
        value_patjet_btag[value_patjet_n] = it->bDiscriminator( "pfCombinedSecondaryVertexBJetTags");

        value_patjet_n++;
      }
    }
  }

Challenge: create PAT jets

Run simulation_patjets_cfg.py, open the file, and compare the two jet correction and b-tagging methods. Method 1 has Jet_ and CorrJet_ branches and Method 2 has PatJet_ and PatJet_uncorr branches.

Solution

We can clearly see the difference between corrected and uncorrected jet momentum, particularly at low momentum where the effects of pileup are largest as a fraction of the total jet energy. here are differences in the number of jets across the collections because of different applications of the pt threshold (applied to uncorrected PFJets and corrected PATJets). We can also see small differences between particle-flow MET and type-1 corrected MET. The b-tagging distribution is almost identical, with the majority of the differences between the jet collections lying in the “dummy” value columns.

Uncertainties

You will have noticed that nested among the jet energy correction code snippets give above were commands related to the uncertainty in this correction. The uncertainty is also read from a text file in this example, and is used to increase or decrease the correction to the jet momentum.

// Object definition
boost::shared_ptr<FactorizedJetCorrector> jec_;

// In the constructor the JetCorrectionUncertainty is set up
AOD2NanoAOD::AOD2NanoAOD(const edm::ParameterSet &iConfig){

  jecUncName_ = iConfig.getParameter<edm::FileInPath>("jecUncName").fullPath();      // JEC uncertainties                               
  jecUnc_ = boost::shared_ptr<JetCorrectionUncertainty>( new JetCorrectionUncertainty(jecUncName_) );

  // ....function continues
}

// In the analyze function the uncertainty is evaluated
for (auto it = jets->begin(); it != jets->end(); it++) {
  if (it->pt() > jet_min_pt) {

    double corr = jec_->getCorrection();

    jecUnc_->setJetEta( uncorrJet.eta() );
    jecUnc_->setJetPt( corr * uncorrJet.pt() );
    double corrUp = corr * (1 + fabs(jecUnc_->getUncertainty(1)));
    double corrDown = corr * ( 1 - fabs(jecUnc_->getUncertainty(-1)) );

    value_corr_jet_ptUp[value_jet_n] = corrUp * uncorrJet.pt();
    value_corr_jet_ptDown[value_jet_n] = corrDown * uncorrJet.pt();
   
  }
}

The uncertainties have several sources, shown in the figure below. The L1 (pileup) uncertainty dominates at low momentum, while the L3 (absolute scale) uncertainty takes over for higher momentum jets. All corrections are quite precise for jets located near the center of the CMS barrel region, and the precision drops as pseudorapidity increases and different subdetectors lose coverage.

Challenge: shifted histograms

Plot and investigate the range of momentum variation given by the JEC uncertainties. Is the difference between the raw and corrected momentum larger or smaller than the uncertainty? Use TTree::Draw to make histograms of the various momentum distributions. Ideally, show the up and down variations in different colors, and the raw vs corrected momenta with different line styles.

Solution

Draw a histogram, hover over one of the lines, and right click. You should see a menu appear – select “Set Line Attributes” and a GUI with pop up. This is handy for changing line colors and styles interactively. Over the bulk of the momentum distribution, the jet energy corrections are significant – the corrected and uncorrected versions are not consistent within the uncertainty. Drawing these plots is a typical sanity check in CMS analyses to test whether the uncertainties look sensible.

Key Points

  • Jet energy corrections are factorized and account for many mismeasurement effects

  • L1+L2+L3 should be applied to jets used for analyses, with residual corrections for data

  • Jet energy corrections are an source of systematic error and uncertainties should be evaluated