Advanced physics object concepts

Prep: Intro to corrections and uncertainties

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can simulation be used reliably in physics analyses?

  • How are algorithm efficiencies measured in data and simulation?

  • How can efficiency differences be corrected?

  • How do analysts handle uncertainties on corrections?

Objectives
  • Learn about common issues in matching simulation to data.

  • Learn some basic methods for computing algorithm efficiencies in CMS.

  • Learn to apply scale factors to correct simulation.

Essentially all CMS analyses rely on simulation. Whether to simulate a new physics signal or Standard Model processes, it is imperative that simulation reliably reproduces data. Each algorithm that is designed to select events or physics objects must be applied to both data and simulation, so there is always a risk of performance differing between the two. This is quantified by measuring the efficiency for certain objects to be selected by the algorithm. Differences in efficiency between data and simulation are corrected using scale factors.

CMS analysts regularly evaluate reconstruction and selection algorithms (triggering, object identification, object isolation, b tagging, and more) for differences in efficiency between data and simulation.

What is efficiency?

Efficiency typically means (number of objects of type X passing algorithm Y) / (number of objects of type X)

Measuring efficiency in data

Measuring an algorithm’s efficiency is most complicated in data, because we want to study only “objects of type X”. This might mean “electrons”, “muons”, “b quark jets”, etc., and we have learned how difficult it is to precisely identify these objects in data. But we can bring in physics knowledge to help construct collections of certain particles. The common method of measuring lepton-related efficiencies in data is called tag and probe.

In CMS, tag and probe methods typically use Z boson or J/psi meson resonances that decay into 2 same-flavor leptons. One lepton serves as the “tag” and the other is the “probe”. An event selected for tag and probe has 2 main characteristics: the tag lepton passes tight identification requirements (and typically fired the trigger), and the invariant mass of the tag-probe pair is consistent with the resonance mass.

In events with these characteristics, we can be reasonably sure based on physics that the “probe” lepton is a “real” lepton, and in fact we can do even better by fitting the J/psi or Z mass peak to determine the number of events entering the calculation. The probe lepton’s properties can then be used to determine the efficiency for essentially any lepton-related algorithm! For example, the efficiency for the tight identification working point would be (number of tight probes in the Z peak) / (total number of probes in the Z peak).
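
As an illustration (not part of the lesson code), ROOT’s TEfficiency class can compute such a ratio with proper binomial (Clopper-Pearson) uncertainties from “pass” and “all” probe histograms; the histogram names here are hypothetical:

// Hypothetical histograms binned in probe pT, filled from events in the Z peak:
// h_probes_tight = probes passing tight ID, h_probes_all = all probes
TEfficiency tightIdEff(*h_probes_tight, *h_probes_all);
int bin = tightIdEff.FindFixBin(30.0);               // bin containing pT = 30 GeV
double eff = tightIdEff.GetEfficiency(bin);          // efficiency in that bin
double errUp = tightIdEff.GetEfficiencyErrorUp(bin); // upper binomial uncertainty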

The figure below shows an example of tag and probe using the Upsilon resonance to measure the global muon identification efficiency for muons with momentum between 6 and 20 GeV. The efficiency is quite high, as expected, since most probes pass the identification criteria. Tag and probe is the standard technique in CMS for measuring electron and muon efficiencies for all algorithms, typically scanning over various momentum and pseudorapidity ranges.

Measuring efficiency in simulation

In simulation, the tag and probe process can certainly be repeated by selecting events using the same criteria as for data. In order to correct the efficiency in simulation to match the efficiency in data, it is best if both are computed using the same technique.

For some algorithms, such as b quark jet identification, the efficiency for a specific simulated sample can be measured directly using the truth information stored in the simulation. For b quark tagging, the relevant efficiencies are the probability for a true b quark jet to pass the tagging threshold, along with the “mistag” probabilities for true c quark jets and light-flavor (u/d/s quark or gluon) jets to pass the same threshold.

These values are typically computed by analysts as functions of jet momentum or pseudorapidity, using simulated samples with kinematics relevant to their analysis. In POET, a BTagging module has been provided for calculating these efficiencies.

As an example, efficiencies for the CSV Medium working point in the top quark pair sample have been computed and stored in a lookup function in PatJetAnalyzer.cc. The CSV algorithm has a b quark efficiency of ~60% and a light quark mistag rate of ~1%, as advertised.

double PatJetAnalyzer::getBtagEfficiency(double pt){
  if(pt < 25) return 0.263407;
  else if(pt < 50) return 0.548796;
  else if(pt < 75) return 0.656801;
  // ...etc...
  else if(pt < 400) return 0.625296;
  else return 0.394916;
}

double PatJetAnalyzer::getLFtagEfficiency(double pt){
  if(pt < 25) return 0.002394;
  else if(pt < 50) return 0.012683;
  else if(pt < 75) return 0.011459;
  // ...etc...
  else if(pt < 400) return 0.014760;
  else return 0.011628;
}

Scale factors

Any differences between efficiencies in data and simulation should be corrected using scale factors.

What is a scale factor?

In a given bin of object momentum and pseudorapidity (or other criteria), the scale factor = (efficiency in data) / (efficiency in simulation).

Scale factors inherit the binning of the underlying efficiency measurements, however those measurements were performed. For example, the figure below shows the efficiencies in data and simulation for tight muon identification in 2010 data. The corresponding scale factors would be the ratio of the black point to the red point in each bin.

Scale factors are typically applied to simulation using event weights. If a certain scale factor is 1.05, the number of simulated events in that bin should be increased by 5% by giving each individual event a weight of 1.05 instead of 1.
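
A minimal sketch of this weighting, assuming a hypothetical per-event scale factor sf and analysis histogram h_mupt (neither name comes from the lesson code):

double sf = 1.05;        // data/simulation scale factor for this event's bin
h_mupt->Fill(mu_pt, sf); // this simulated event now counts with weight 1.05 instead of 1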

Uncertainties

All scale factors carry uncertainties. Statistical uncertainties arise from the limited number of events available to make the efficiency measurements. Systematic uncertainties can come from a variety of sources, such as the signal and background models used in tag-and-probe fits or the kinematics of the event sample in which the efficiencies were measured.

CMS analysts apply uncertainties in two ways: as rate uncertainties or as shape uncertainties. Rate uncertainties are best for scale factors with very small uncertainties, or uncertainties that are essentially stable for the majority of events in an analysis. For lepton scale factors it is common to assign an uncertainty of 1-2% in all bins. For b tagging and many other scale factors it is more effective to use a shape uncertainty: the scale factor values are shifted up or down according to their uncertainties and 3 versions of the analysis histograms are constructed, as sketched below. An example is shown below for b quark tagging: the uncertainty on the scale factor can be mapped to an uncertainty on the b quark jet momentum using the shape uncertainty concept.
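
A sketch of the shape-uncertainty bookkeeping, using the btagWeight event weights that appear later in the heavy flavor tagging demo (the histogram names are hypothetical):

// Fill nominal, up-shifted, and down-shifted versions of the same observable
h_jetpt->Fill(jet_pt, btagWeight);
h_jetpt_up->Fill(jet_pt, btagWeightUp);
h_jetpt_dn->Fill(jet_pt, btagWeightDn);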

Later in the hands-on exercise and challenge we will practice applying scale factors and working with shape uncertainties.

Key Points

  • All CMS simulations must be corrected so that algorithm performance matches between data and simulation.

  • Efficiencies are measured in data using physics knowledge to isolate groups of common objects, such as Z boson tag-and-probe.

  • Efficiencies in simulation can also be measured using particle truth information.

  • Scale factors are ratios of efficiency in data to efficiency in simulation, and are applied using event weights.

  • All scale factor corrections carry either rate or shape uncertainties.


Demo: Muon corrections

Overview

Teaching: 12 min
Exercises: 0 min
Questions
  • How do we correct the biased muon momentum?

Objectives
  • Learn how to apply muon momentum scale corrections to data and MC.

There are misalignments in the CMS detector that bias the reconstruction of muon momentum. The CMS reconstruction software does not fully correct for these misalignments, so additional corrections are needed to remove the bias. Correcting for the misalignments is important for precision measurements that use the muon momentum, because the bias propagates directly into the results.

The Muon Momentum Scale Corrections

The Muon Momentum Scale Corrections, also known as the Rochester Corrections, are available in the MuonCorrectionsTool. The correction parameters have been extracted in a two-step method. In the first step, initial corrections are obtained in bins of the charge of the muon and the η and ϕ coordinates of the muon track, since the reconstruction bias in muon momentum depends on these variables. In the second step, the corrections are fine-tuned using the mass of the Z boson.

The corrections for data and Monte Carlo (MC) are different: MC events start with no bias, but one can be induced during reconstruction. Corrections have been extracted for both data and MC events.

In this example, the Run1 Rochester Corrections are used with a 2012 dataset and a MC dataset. The official code for the Rochester Corrections can be found in the directory RochesterCorrections. The example code for applying the corrections is in the Test directory.

Applying the corrections to data and MC

Let’s start by opening ROOT in a terminal and compiling the official corrections code by running the following lines.

cd MuonCorrectionsTool
root
.L RochesterCorrections/muresolution.cc++
.L RochesterCorrections/rochcor2012wasym.cc++

Then let’s take a look at Analysis.C which is where we apply the corrections.

The main function of Analysis.C simply calls the applyCorrections function, which takes as parameters the name of the ROOT file (without the .root part), the path to the ROOT file, and a boolean value indicating whether the file contains data (true) or MC (false). The data file is commented out here because it would take too much time to run during the workshop.

void Analysis::main()
{
  // Data
  // applyCorrections("Run2012BC_DoubleMuParked_Muons", "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root", true);

  // MC
  applyCorrections("ZZTo2e2mu", "root://eospublic.cern.ch//eos/opendata/cms/upload/stefan/HiggsToFourLeptonsNanoAODOutreachAnalysis/ZZTo2e2mu.root", false);
}

The first thing applyCorrections does is create a TTree from the ROOT-file. Then variables for holding the values read from the tree are created and branch addresses are set so that the variables are populated when looping over events. New branches for the corrected values, an output file and a few variables needed for the corrections are also created.

int applyCorrections(string filename, string pathToFile, bool isData) {
  // Create TTree from ROOT file
  TFile *f1 = TFile::Open((pathToFile).c_str());
  TTree *DataTree = (TTree*)f1->Get("Events");
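
The remaining setup is not shown above; a minimal sketch of what it can look like, assuming ROOT’s CloneTree approach (the exact implementation is in Analysis.C; the _Cor file name matches the output mentioned at the end of this demo):

  // Sketch: output file and a tree for the corrected values
  TFile *fout = new TFile((filename + "_Cor.root").c_str(), "RECREATE");
  TTree *DataTreeCor = DataTree->CloneTree(0); // copy the branch structure, no entries yet

  // Example of one new branch for a corrected quantity
  Float_t Dimuon_mass_cor;
  TBranch *bDimuon_mass_cor = DataTreeCor->Branch("Dimuon_mass_cor", &Dimuon_mass_cor, "Dimuon_mass_cor/F");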

Next, we loop over the events and apply the corrections. We will later do an exercise to make sure the corrections have been applied correctly. For that we will need the invariant mass of μ+μ-, which is why the events are filtered to muon pairs with opposite charges and the invariant mass is computed.

// Loop over events
  Int_t nEntries = (Int_t)DataTree->GetEntries();

  for (Int_t k=0; k<nEntries; k++) {
    DataTree->GetEntry(k);

    // Select events with exactly two muons
    if (nMuon == 2 ) {
      // Select events with two muons of opposite charge
      if (Muon_charge[0] != Muon_charge[1]) {

        // Compute invariant mass of the dimuon system
        Dimuon_mass = computeInvariantMass(Muon_pt[0], Muon_pt[1], Muon_eta[0], Muon_eta[1], Muon_phi[0], Muon_phi[1], Muon_mass[0], Muon_mass[1]);
        bDimuon_mass->Fill();
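
As an aside, the computeInvariantMass helper used above is defined in Analysis.C; a minimal sketch of such a function with TLorentzVector, assuming the argument order of the call above:

float computeInvariantMass(Float_t pt0, Float_t pt1, Float_t eta0, Float_t eta1,
                           Float_t phi0, Float_t phi1, Float_t mass0, Float_t mass1)
{
  // Build a four-vector for each muon and return the invariant mass of their sum
  TLorentzVector mu0, mu1;
  mu0.SetPtEtaPhiM(pt0, eta0, phi0, mass0);
  mu1.SetPtEtaPhiM(pt1, eta1, phi1, mass1);
  return (mu0 + mu1).M();
}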

We then loop over the muons in an event and create a TLorentzVector for each muon. The functions for applying the Rochester Corrections take as a parameter a TLorentzVector, a four-vector that describes the muon's momentum and energy. As mentioned earlier, the muon momentum scale corrections are different for data and MC, and therefore there are separate functions for each: momcor_data and momcor_mc. These functions can be found in rochcor2012wasym.cc if you want to take a closer look at them.

        // Loop over muons in event
        for (UInt_t i=0; i<nMuon; i++) {

          // Fill positive and negative muons eta branches
          if (Muon_charge[i] > 0) {
            Muon_eta_pos[i] = Muon_eta[i];
            bMuon_eta_pos->Fill();
          } else {
            Muon_eta_neg[i] = Muon_eta[i];
            bMuon_eta_neg->Fill();
          }

          // Create TLorentzVector
          TLorentzVector mu;
          mu.SetPtEtaPhiM(Muon_pt[i], Muon_eta[i], Muon_phi[i], Muon_mass[i]);

          // Apply the corrections
          if (isData) {
            rmcor.momcor_data(mu, Muon_charge[i], runopt, qter);
          } else {
            rmcor.momcor_mc(mu, Muon_charge[i], ntrk, qter);
          }

The corrected values are then saved to the new branches and the corrected invariant mass is computed. The new tree is filled and written to the output file.

          // Save corrected values
          Muon_pt_cor[i] = mu.Pt();
          bMuon_pt_cor->Fill();
          Muon_eta_cor[i] = mu.Eta();
          bMuon_eta_cor->Fill();
          Muon_phi_cor[i] = mu.Phi();
          bMuon_phi_cor->Fill();
          Muon_mass_cor[i] = mu.M();
          bMuon_mass_cor->Fill();
        }

        // Compute invariant mass of the corrected dimuon system
        Dimuon_mass_cor = computeInvariantMass(Muon_pt_cor[0], Muon_pt_cor[1], Muon_eta_cor[0], Muon_eta_cor[1], Muon_phi_cor[0], Muon_phi_cor[1], Muon_mass_cor[0], Muon_mass_cor[1]);
        bDimuon_mass_cor->Fill();

      }
    }
    //Fill the corrected values to the new tree
    DataTreeCor->Fill();
  }
  
  //Save the new tree
  DataTreeCor->Write();

Compile and run Analysis.C by running the following lines in your ROOT-terminal. You will get a bunch of warnings but you can safely ignore them.

.L RochesterCorrections/Test/Analysis.C+
Analysis pf
pf.main()

And that’s it! As a result you can find ZZTo2e2mu_Cor.root in the Test directory. The file contains both corrected and uncorrected values.

Key Points

  • Rochester corrections are used to scale the muon momentum so that simulation better matches data.


Demo: Heavy flavor tagging

Overview

Teaching: 12 min
Exercises: 0 min
Questions
  • How are b hadrons identified in CMS?

Objectives
  • Understand the basics of heavy flavor tagging

  • Learn to access tagging information in AOD files

Jet reconstruction and identification is an important part of the analyses at the LHC. A jet may contain the hadronization products of any quark or gluon, or possibly the decay products of more massive particles such as W or Higgs bosons. Several “b tagging” algorithms exist to identify jets from the hadronization of b quarks, which have unique properties that distinguish them from light quark or gluon jets.

B Tagging Algorithms

Tagging algorithms first connect the jets with good quality tracks that are either associated with one of the jet’s particle flow candidates or within a nearby cone. Both tracks and “secondary vertices” (track vertices from the decays of b hadrons) can be used in track-based, vertex-based, or “combined” tagging algorithms. The specific details depend upon the algorithm used. However, they all exploit properties of b hadrons, such as their long lifetime (which produces displaced tracks and secondary vertices), their large mass, and their frequent semileptonic decays that leave soft leptons inside the jet.

Several tagging algorithms are used for b tagging in CMS, including the track counting, jet probability, simple secondary vertex, and combined secondary vertex (CSV) algorithms.

These algorithms produce a single, real number (often the output of an MVA) called a b tagging “discriminator” for each jet. The more positive the discriminator value, the more likely it is that this jet contained b hadrons.

Accessing tagging information

In PatJetAnalyzer.cc we access the information from the Combined Secondary Vertex (CSV) b tagging algorithm and associate discriminator values with the jets. The CSV values are stored in a separate collection in the AOD files called a JetTagCollection, which is effectively a vector of associations between jet references and float values (such as a b-tagging discriminator). During the creation of pat::Jet objects, this information is merged into the jets so that the discriminators can be accessed with a dedicated member function (similar merging is done for other useful associations like generated jets and jet flavor in simulation!).

For examples of working with b tags and RECO jets, see last year’s lesson.

#include "DataFormats/PatCandidates/interface/Jet.h"

Handle<std::vector<pat::Jet>> myjets;
iEvent.getByLabel(jetInput, myjets); // jetInput opens "selectedPatJetsAK5PFCorr"

for (std::vector<pat::Jet>::const_iterator itjet=myjets->begin(); itjet!=myjets->end(); ++itjet){

  jet_btag.push_back(itjet->bDiscriminator("combinedSecondaryVertexBJetTags"));

}

We can investigate the b tag information in a dataset using ROOT.

$ root -l myoutput.root  # produced earlier in hands-on exercise #2
[0] _file0->cd("myjets");
[1] Events->Draw("jet_btag");

The discriminator values for jets with kinematics in the correct range for this algorithm lie between 0 and 1. Other jets that do not have a valid discriminator value pick up values of -1, -9, or -999.
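
To focus on jets with a valid discriminator, a selection can be added to the Draw call (a small sketch continuing the session above):

[2] Events->Draw("jet_btag", "jet_btag >= 0"); // exclude the -1/-9/-999 sentinel values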

Working points

A jet is considered “b tagged” if the discriminator value exceeds some threshold. Different thresholds will have different efficiencies for identifying true b quark jets and for mis-tagging light quark jets. As we saw for muons and other objects, a “loose” working point will allow the highest mis-tagging rate, while a “tight” working point will sacrifice some correct-tag efficiency to reduce mis-tagging. The CSV algorithm has working points defined based on the light-flavor mis-tagging rate: Loose (~10% mistag rate, discriminator > 0.244), Medium (~1%, discriminator > 0.679), and Tight (~0.1%, discriminator > 0.898).

The method used to calculate these efficiencies was described in the introductory episode. The figures below show examples of the b and light quark efficiencies and scale factors as a function of jet momentum for the CSV Medium working point (read more).

(Figures: b quark efficiency and light quark efficiency as functions of jet momentum)

Applying scale factors

Scale factors to increase or decrease the number of b-tagged jets in simulation can be applied in a number of ways, but typically involve weighting simulation events based on the efficiencies and scale factors relevant to each jet in the event. Scale factors for the CSV algorithm are available for Open Data and involve extracting functions from a comma-separated-values file. Details and usage references can be found in the CMS Open Data b tagging documentation.

In POET, the scale factor functions in the comma-separated-values file for the Medium working point have been implemented in PatJetAnalyzer.cc. The scale factor is a single numerical value calculated from a function of the corrected jet momentum:

double PatJetAnalyzer::getBorCtagSF(double pt, double eta){
  if (pt > 670.) pt = 670;
  if(fabs(eta) > 2.4 or pt<20.) return 1.0;

  return 0.92955*((1.+(0.0589629*pt))/(1.+(0.0568063*pt)));
}

The most commonly used method to apply scale factors is Method 1a, and it has been implemented in POET. This is an event weight method of applying a correction, so the final result is an event weight to be stored for later in the analysis, along with extra event weights to describe the uncertainty.

The method relies on 4 pieces of information for each jet in simulation: whether the jet passes the discriminator threshold, the jet's true flavor, the tagging efficiency for that flavor, and the data/simulation scale factor.

Each jet then contributes to the event weight based on these four features. For example, jets that pass the discriminator threshold and are b quark jets contribute the following factor to variables called MC, which appears as P(MC) in the reference TWiki, and btagWeight, which appears as P(DATA) in the reference TWiki.

hadronFlavour = itjet->partonFlavour();
corrpt = corr_jet_pt.at(value_jet_n);

if (jet_btag.at(value_jet_n) > 0.679){
  if(abs(hadronFlavour) == 5){
    eff = getBtagEfficiency(corrpt);
    SF = getBorCtagSF(corrpt, jet_eta.at(value_jet_n));
    SFu = SF + uncertaintyForBTagSF(corrpt, jet_eta.at(value_jet_n));
    SFd = SF - uncertaintyForBTagSF(corrpt, jet_eta.at(value_jet_n));
  }
  // ... repeat for c and light quark jets...
  MC *= eff;
  btagWeight *= SF * eff;
  btagWeightUp *= SFu * eff;
  btagWeightDn *= SFd * eff;
}
//...similar for non-tagged jets...
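
For jets that fail the discriminator threshold, Method 1a multiplies in the complementary probabilities. A sketch of that branch, following the P(DATA) and P(MC) definitions in the reference TWiki and the variable names above:

else {
  // eff and SF are looked up by jet flavor exactly as above
  MC *= (1.0 - eff);                 // P(MC): probability not to tag this jet
  btagWeight *= (1.0 - SF * eff);    // P(DATA): probability in data not to tag it
  btagWeightUp *= (1.0 - SFu * eff);
  btagWeightDn *= (1.0 - SFd * eff);
}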

After the jet loop, the reference TWiki defines the weight as P(DATA)/P(MC):

btagWeight = (btagWeight/MC);
btagWeightUp = (btagWeightUp/MC);
btagWeightDn = (btagWeightDn/MC);

The weight and its uncertainties are all stored in the tree for use later in filling histograms.

Key Points

  • Tagging algorithms separate heavy flavor jets from jets produced by the hadronization of light quarks and gluons

  • Tagging algorithms produce a discriminator value for each jet that represents the likelihood that the jet came from a b hadron

  • Each tagging algorithm has recommended ‘working points’ (discriminator values) based on a misidentification probability for light-flavor jets


Demo: Jet corrections

Overview

Teaching: 12 min
Exercises: 0 min
Questions
  • How are data/simulation differences dealt with for jet energy?

Objectives
  • Learn about typical differences in jet energy scale and resolution between data and simulation

  • Learn how JEC and JER corrections can be applied to OpenData jets

  • Access uncertainties in both JEC and JER

Unsurprisingly, the CMS detector does not measure jet energies perfectly, nor do simulation and data agree perfectly! The measured energy of a jet must be corrected so that it can be related to the true energy of its parent particle. These corrections account for several effects and are factorized so that each effect can be studied independently. All of the corrections in this section are described in the “Jet Energy Scale and Resolution” papers published by CMS.

Correction levels

Particles from additional interactions in nearby bunch crossings of the LHC contribute energy in the calorimeters that must somehow be distinguished from the energy deposits of the main interaction. Extra energy in a jet’s cone can make its measured momentum larger than the momentum of the parent particle. The first layer (“L1”) of jet energy corrections accounts for pileup by subtracting the average transverse momentum contribution of the pileup interactions to the jet’s cone area. This average pileup contribution varies by pseudorapidity and, of course, by the number of interactions in the event.

The second and third layers of corrections (“L2L3”) correct the measured momentum to the true momentum as functions of momentum and pseudorapidity, bringing the reconstructed jet in line with the generated jet. These corrections are derived using momentum balancing and missing energy techniques in dijet and Z boson events. One well-measured object (ex: a jet near the center of the detector, a Z boson reconstructed from leptons) is balanced against a jet for which corrections are derived.

All of these corrections are applied to both data and simulation. Data events are then given “residual” corrections to bring data into line with the corrected simulation. A final set of flavor-based corrections are used in certain analyses that are especially sensitive to flavor effects. The figure below shows the result of the L1+L2+L3 corrections on the jet response.

Jet Energy Resolution

Jet Energy Resolution (JER) corrections are applied after JEC, and only to MC simulation. Unlike JEC, which adjusts the mean of the momentum response distribution, JER adjusts its width. The ratio of reconstructed transverse momentum to true (generated) transverse momentum forms a Gaussian distribution, and the width of this Gaussian is the JER. In data, where no “true” pT is available, the JER is measured using photon/Z + jet events in which the jet recoils against the photon or Z boson, both of which can be measured quite precisely in the CMS detector. The JER is typically smaller in simulation than in data, leading to scale factors that are larger than 1. These scale factors, along with their uncertainties, are accessible in POET in the jet analyzers. They are applied using two methods: by scaling the jet momentum toward the matched generated jet's momentum when such a match exists, or by randomly smearing the momentum with a Gaussian when it does not (both appear in the code below).

Applying JEC and JER

The jet energy corrections and Type-1 MET corrections can be applied to RECO jets while making PAT jets. To do this we use the addJetCollection tool in python/poet_cfg.py. You can see that simulation and data use slightly different lists of correction levels, as discussed above.

# Choose which jet correction levels to apply
jetcorrlabels = ['L1FastJet','L2Relative','L3Absolute']
if isData:
  # For data we need to remove generator-level matching processes
  runOnData(process, ['Jets','METs'], "", None, [])
  jetcorrlabels.append('L2L3Residual')

# Configure the addJetCollection tool
# This process will make corrected jets with b-tagging included, and will make Type1-corrected MET
process.ak5PFJets.doAreaFastjet = True
addJetCollection(process,cms.InputTag('ak5PFJets'),
    'AK5', 'PFCorr',
    doJTA        = True,
    doBTagging   = True, 
    jetCorrLabel = ('AK5PF', cms.vstring(jetcorrlabels)),
    doType1MET   = True,
    doL1Cleaning = False,
    doL1Counters = False,
    doJetID      = True,
    jetIdLabel   = "ak5",
    ) 

In PatJetAnalyzer.cc the corrected momenta and energy come for free! For extra information we can access the uncorrected jet and store some of its features as well.

for (std::vector<pat::Jet>::const_iterator itjet=myjets->begin(); itjet!=myjets->end(); ++itjet){
  pat::Jet uncorrJet = itjet->correctedJet(0); 

  // itjet is the corrected jet and uncorrJet is the raw jet
}

The JER corrections must be accessed from a text file and applied on top of the corrected jets. The text file is read in using the SimpleJetCorrector class which provides a correction() member function that will evaluate the formula provided in the text file given the necessary variables (in this case: momentum, pseudorapidity, and number of pileup vertices). In the constructor function PatJetAnalyzer::PatJetAnalyzer:

jerResName_ = iConfig.getParameter<edm::FileInPath>("jerResName").fullPath(); // JER Resolutions                               

JetCorrectorParameters *ak5PFPar = new JetCorrectorParameters(jerResName_);
jer_ = boost::shared_ptr<SimpleJetCorrector>( new SimpleJetCorrector(*ak5PFPar) );

We can now use jer_->correction() to access the jet’s momentum resolution in simulation. In the code snippet below you can see “scaling” and “smearing” versions of applying the JER scale factors, depending on whether or not this pat::Jet had a matched generated jet. The random smearing application uses a reproducible seed for the random number generator based on the inherently random azimuthal angle of the jet.

Note: this code snippet is simplified by removing lines for handling uncertainties – that’s coming below!

for (std::vector<pat::Jet>::const_iterator itjet=myjets->begin(); itjet!=myjets->end(); ++itjet){
  pat::Jet uncorrJet = itjet->correctedJet(0); 

  ptscale = 1;
  res = 1;
  if(!isData) {
    std::vector<float> factors = factorLookup(fabs(itjet->eta())); // returns in order {factor, factor_down, factor_up}
    std::vector<float> feta;
    std::vector<float> PTNPU;
    feta.push_back( fabs(itjet->eta()) );
    PTNPU.push_back( itjet->pt() );
    PTNPU.push_back( vertices->size() );
 
    res = jer_->correction(feta, PTNPU);

    float reco_pt = itjet->pt();
    const reco::GenJet *genJet = itjet->genJet();
    bool smeared = false;

    // Attempt the truth-vs-reco smearing
    if(genJet){
      double deltaPt = fabs(genJet->pt() - reco_pt);
      double deltaR = reco::deltaR(genJet->p4(),itjet->p4());
      if ((deltaR < 0.25) && deltaPt <= 2*reco_pt*res){
        double ptratio = reco_pt/genJet->pt();
        ptscale = max(0.0, ptratio + factors[0]*(1 - ptratio));
        smeared = true;
      }
    }
    // If that didn't work, use Gaussian smearing with a reproducible seed
    if (!smeared && factors[0] > 1) {
      TRandom3 JERrand;
   
      JERrand.SetSeed(abs(static_cast<int>(itjet->phi()*1e4)));
      ptscale = max(0.0, 1.0 + JERrand.Gaus(0,res)*sqrt(factors[0]*factors[0] - 1.0));
    }
  }
}     

The final, definitive jet momentum is the raw momentum multiplied by both JEC and JER corrections! After computing ptscale, we store a variety of corrected and uncorrected kinematic variables for jets passing the momentum threshold:

if( ptscale*itjet->pt() <= min_pt) continue;

jet_pt.push_back(uncorrJet.pt());
// ...other variables...
corr_jet_pt.push_back(ptscale*itjet->pt());

Uncertainties

You will have noticed that nested among the JEC and JER code snippets given above were commands related to the uncertainty in these corrections. The JEC uncertainties have several sources, shown in the figure below. The L1 (pileup) uncertainty dominates at low momentum, while the L3 (absolute scale) uncertainty takes over for higher momentum jets. All corrections are quite precise for jets located near the center of the CMS barrel region, and the precision drops as pseudorapidity increases and different subdetectors lose coverage.

The JER uncertainty is evaluated by shifting the scale factors up and down according to the error bars shown in the scale factor figure above. These uncertainties arise from treatment of initial and final state radiation in the data measurement, differences in Monte Carlo tunes across generator platforms, and small non-Gaussian tail effects.

The JEC uncertainty text file is loaded as a JetCorrectionUncertainty object in the PatJetAnalyzer constructor:

jecUncName_ = iConfig.getParameter<edm::FileInPath>("jecUncName").fullPath(); // JEC uncertainties

jecUnc_ = boost::shared_ptr<JetCorrectionUncertainty>( new JetCorrectionUncertainty(jecUncName_) );

This object provides a getUncertainty() function that takes in the jet’s momentum and pseudorapidity and returns an adjustment to the JEC correction factor:

for (std::vector<pat::Jet>::const_iterator itjet=myjets->begin(); itjet!=myjets->end(); ++itjet){
  pat::Jet uncorrJet = itjet->correctedJet(0);     
      
  double corrUp = 1.0;
  double corrDown = 1.0;
  jecUnc_->setJetEta( itjet->eta() );
  jecUnc_->setJetPt( itjet->pt() );
  corrUp = (1 + fabs(jecUnc_->getUncertainty(1))); 
  jecUnc_->setJetEta( itjet->eta() );
  jecUnc_->setJetPt( itjet->pt() );
  corrDown = (1 - fabs(jecUnc_->getUncertainty(-1)));

The JER uncertainty is evaluated by calculating a ptscale_up and ptscale_down correction, exactly as shown above for the ptscale correction, but using the shifted scale factors. The uncertainties in JEC and JER are kept separate from each other: when varying JEC, the JER correction is held constant, and vice versa. This results in 5 momentum branches: a central value and two sets of uncertainties:

corr_jet_pt.push_back(ptscale*itjet->pt());
corr_jet_ptUp.push_back(ptscale*corrUp*itjet->pt());
corr_jet_ptDown.push_back(ptscale*corrDown*itjet->pt());
corr_jet_ptSmearUp.push_back(ptscale_up*itjet->pt());
corr_jet_ptSmearDown.push_back(ptscale_down*itjet->pt());

Key Points

  • Jet energy corrections are factorized and account for many mismeasurement effects

  • L1+L2+L3 should be applied to jets used for analyses, with residual corrections for data

  • Jet energy resolution in simulation is typically too narrow and is smeared using scale factors

  • Jet energy and resolution corrections are sources of systematic uncertainty and should be evaluated


15 minute break

Overview

Teaching: 0 min
Exercises: 15 min
Questions
  • Which type of coffee will you drink on your break?

Objectives
  • Acquire coffee. Drink coffee.

Take a break!

Key Points

  • Any type of coffee is refreshing after so much concentrated learning.


Advanced objects hands-on

Overview

Teaching: 0 min
Exercises: 40 min
Questions
  • How can I visualize the effect of the muon corrections?

  • How do I access other b tagging discriminators?

  • How do I apply the b tagging scale factors and visualize the uncertainty?

Objectives
  • Practice plotting histograms in ROOT to view POET histograms.

  • Implement multiple b taggers and count the number of b-tagged jets

  • Apply the central and shifted b tagging scale factors

Choose your exercise! Complete exercise #1 on the muon corrections, and then choose one of the options for exercise #2 on b-tagging.

Exercise 1: results of the muon correction

Check that the corrections have been applied correctly by plotting the corrected and uncorrected invariant mass of μ+μ- on the ROOT command line. If everything went right with the corrections, there should be a visible difference between the two distributions. Restrict the x-axis to the range 84 to 98 GeV.

Hint: You can find the branch names of the invariant masses from Analysis.C.

Solution

$ root RochesterCorrections/Test/ZZTo2e2mu_Cor.root
[0] Events->Draw("Dimuon_mass", "Dimuon_mass > 84 && Dimuon_mass < 98")
[1] Events->Draw("Dimuon_mass_cor", "Dimuon_mass > 84 && Dimuon_mass < 98", "hist same"); htemp->SetLineColor(2);

You should get something like this:

An example of what the plot looks like for data:

Exercise 2 option A: alternate b taggers

The statements printed from addJetCollection when running poet_cfg.py show the options for strings that can be used in the bDiscriminator() function:

The bdiscriminators below will be written to the jet collection in the PATtuple (default is all, see PatAlgos/PhysicsTools/python/tools/jetTools.py)
jetBProbabilityBJetTags
jetProbabilityBJetTags
trackCountingHighPurBJetTags
trackCountingHighEffBJetTags
simpleSecondaryVertexHighEffBJetTags
simpleSecondaryVertexHighPurBJetTags
combinedSecondaryVertexBJetTags

Add 1-2 new branches for alternate taggers and compare those discriminants to CSV.

After editing PatJetAnalyzer.cc,

$ scram b
$ cmsRun python/poet_cfg.py False True
$ root -l myoutput.root
[0] TBrowser b

Solution

Let’s add the high purity track counting tagger, which was the most common tagger in 2011. After adding a new array declaration and branch in the top sections of PatJetAnalyzer.cc, we can call bDiscriminator() for this alternate tagger:

// Declarations
std::vector<double> jet_btag;
std::vector<double> jet_btagTC;

// Branches
mtree->Branch("jet_btag",&jet_btag);
mtree->Branch("jet_btagTC",&jet_btagTC);

// inside the jet loop
 jet_btag.push_back(itjet->bDiscriminator("combinedSecondaryVertexBJetTags"));
 jet_btag.push_back(itjet->bDiscriminator("trackCountingHighPurBJetTags"));

The distributions in ttbar events (excluding events with null values) look remarkably different! The CSV algorithm is a typical “multivariate” discriminant that ranges from 0 to 1 while the track counting discriminant represents a physics-based impact parameter quantity that has a much wider range of values. More information about the various taggers can be found here.
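
One quick way to view the two discriminants on the ROOT command line (a sketch; branch names as declared above):

$ root -l myoutput.root
[0] _file0->cd("myjets");
[1] Events->Draw("jet_btag");   // CSV: values between 0 and 1
[2] Events->Draw("jet_btagTC"); // track counting: a much wider, physics-based range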

Exercise 2 option B: count medium CSV b tags

Calculate the number of jets per event that are b tagged according to the medium working point of the CSV algorithm. Store a branch called jet_nCSVM and draw it 3 times, applying the scale factor weights and demonstrating the uncertainty.

After editing PatJetAnalyzer.cc,

$ scram b
$ cmsRun python/poet_cfg.py False True
$ root -l myoutput.root
[0] _file0->cd("myjets")
[1] Events->Draw("jet_nCSVM","(btagWeight)")
[2] Events->Draw("jet_nCSVM","(btagWeightUp)","p hist same")
[3] Events->Draw("jet_nCSVM","(btagWeightDown)","p hist same")

Solution

We count the number of “Medium CSV” b-tagged jets by summing up the number of jets with discriminant values greater than 0.679. After adding a variable declaration and branch we can sum up the counter:

int jet_nCSVM = 0;
for (std::vector<pat::Jet>::const_iterator itjet=myjets->begin(); itjet!=myjets->end(); ++itjet){

 // ... JEC uncert, JER, storing variables...

 jet_btag.push_back(itjet->bDiscriminator("combinedSecondaryVertexBJetTags"));
 if (itjet->bDiscriminator("combinedSecondaryVertexBJetTags") > 0.679) jet_nCSVM++;
}

Applying the scale factor weight shifts the mean of the distribution down slightly. The uncertainty wraps around the central distribution with a magnitude of approximately 6% in this sample.

Key Points

  • Muon corrections are a precision adjustment to a dimuon mass histogram.

  • B tagging scale factors and uncertainties should be considered for any distribution that relies on b-tagged jets


Offline: Advanced object challenge

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • How do uncorrected and corrected jet momenta compare?

  • How large is the JEC uncertainty in different regions?

  • How large is the JER uncertainty in different regions?

Objectives
  • Explore the JEC and JER uncertainties using histograms

  • Practice basic plotting with ROOT

Let’s dig into the JEC and JER corrections and uncertainties by studying their effect on jet momentum histograms.

Run POET

Take some time to run POET using the entire top quark pair test file. In python/poet_cfg.py set the number of events to process to -1:

#---- Select the maximum number of events to process (if -1, run over all events)
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

And run POET using cmsRun python/poet_cfg.py False True. Great time to read ahead on tomorrow’s lessons!

Open myoutput.root and investigate the range of momentum variation given by the JEC uncertainties by plotting jet_pt, corr_jet_pt, corr_jet_ptUp, and corr_jet_ptDown together.

Questions:

Is the difference between the raw and corrected momentum larger or smaller than the uncertainty? Which uncertainty dominates?

Repeat these plots with the additional requirement that the jets be “forward” (abs(jet_eta) > 3.0). How do the magnitudes of the uncertainties compare in this region?

Recall: to draw plots on the ROOT command line use the following syntax:

$ root -l myoutput.root
[0] _file0->cd("myjets");
[1] Events->Draw("jet_pt"); // draw the branch jet_pt
[2] Events->Draw("jet_pt","abs(jet_eta) > 3.0","same"); // draw jet_pt with a cut on jet_eta, on top of the previous plot
[3] Events->Draw("jet_pt","abs(jet_eta) < 3.0","pe same"); // draw jet_pt again with a different cut using points & error bars

To use different line colors and styles in a plot, “right-click” on a histogram and use the “Set Line Attributes” GUI to change the color and style of the lines and markers. Clicking on a different histogram will automatically change which plot’s style the GUI is controlling.

Solutions

Come back tomorrow to go over these plots!

Key Points

  • Come back for the solutions session tomorrow!


Come back tomorrow morning!

Overview

Teaching: 0 min
Exercises: 0 min
Questions
Objectives

The solutions session for this lesson is scheduled for Wednesday at 14:30 CERN time. See you then!

Key Points


Solutions and questions

Overview

Teaching: 40 min
Exercises: 0 min
Questions
  • How do uncorrected and corrected jet momenta compare?

  • How large is the JEC uncertainty in different regions?

  • How large is the JER uncertainty in different regions?

Objectives
  • Explore the JEC and JER uncertainties using histograms

  • Practice basic plotting with ROOT

Open myoutput.root and investigate the range of momentum variation given by the JEC uncertainties by plotting jet_pt, corr_jet_pt, corr_jet_ptUp, and corr_jet_ptDown together.

Questions:

Is the difference between the raw and corrected momentum larger or smaller than the uncertainty? Which uncertainty dominates?

Solution

The following plotting commands can be used to draw the four histograms needed to answer the first question:

$ root -l myoutput.root
[1] _file0->cd("myjets");
[2] Events->Draw("jet_pt");
[3] Events->Draw("corr_jet_pt","","hist same");
[4] Events->Draw("corr_jet_ptUp","","hist same");
[5] Events->Draw("corr_jet_ptDown","","hist same");

We can see that the corrections are significant, far larger than the uncertainty itself. The first level of correction, for pileup removal, tends to reduce the momentum of the jet. The JER uncertainty can be drawn using similar commands.
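
A sketch of the analogous commands, using the JER-smeared branches stored by POET:

[6] Events->Draw("corr_jet_pt");
[7] Events->Draw("corr_jet_ptSmearUp","","hist same");
[8] Events->Draw("corr_jet_ptSmearDown","","hist same");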

This uncertainty is much smaller for the majority of jets! The JER correction is similar to the muon Rochester corrections in that it is most important for analyses requiring higher precision in jet agreement between data and simulation.

Repeat these plots with the additional requirement that the jets be “forward” (abs(jet_eta) > 3.0). How do the magnitudes of the uncertainties compare in this region?

Solution

This time we need to apply cuts to the jets as we draw:

$ root -l myoutput.root
[1] _file0->cd("myjets");
[2] Events->Draw("jet_pt","abs(jet_eta) > 3.0");
[3] Events->Draw("corr_jet_pt","abs(jet_eta) > 3.0","hist same");
[4] Events->Draw("corr_jet_ptUp","abs(jet_eta) > 3.0","hist same");
[5] Events->Draw("corr_jet_ptDown","abs(jet_eta) > 3.0","hist same");

In the endcap region the uncertainty on the JER scale factor has become nearly 20%! So this uncertainty gains almost equal footing with JEC. Many CMS analyses restrict themselves to studying jets in the “central” region of the detector, defined loosely by the tracker acceptance region of abs(eta) < 2.4 precisely to avoid these larger JEC and JER uncertainties.

Key Points

  • In general, the jet corrections are significant and lower the momenta of the jets with standard LHC pileup conditions.

  • For most jets, the JEC uncertainty dominates over the JER uncertainty.

  • In the endcap region of the detector, the JER uncertainty is larger and matches the JEC uncertainty.