This lesson is being piloted (Beta version)

Vector-like quark analysis with ADL/CutLang: Part 1: Analysis algorithm, histograms, local runs

Overview

Teaching: 10 min
Exercises: 30 min
Questions
  • What is the general flow for the analysis?

  • What are the selection requirements?

  • How do I implement additional selection requirements with ADL?

  • How do I implement additional histograms with ADL?

  • How do I run this analysis with CutLang?

  • How do I produce plots comparing distribution shapes for signal(s) and background(s)?

Objectives
  • Understand the general strategy for the analysis

  • Learn how to implement basic selection requirements in ADL

  • Learn how to implement histograms in ADL

  • Run the analysis locally with CutLang on a limited number of signal and background events.

  • Produce plots comparing distribution shapes for signal(s) and background(s) using PyROOT scripting and Jupyter.

Prerequisites

For this episode, please make sure that you have installed the CutLang docker container with the ROOT and VNC setup, and that you have successfully tested running CutLang in the container, as described in the previous episode.

Start the container and start VNC (password: cutlang-adl)

docker exec -it CutLang-root-vnc bash

start_vnc

Introducing the CMS vector-like quark analysis with 2015 data

We will perform an approximate reproduction of the following new physics search analysis performed with 2015 CMS data:

CMS-B2G-16-024: Search for pair production of vector-like T and B quarks in single-lepton final states using boosted jet substructure in proton-proton collisions at sqrt(s) = 13 TeV

arXiv link: arXiv:1706.03408, publication reference: JHEP 11 (2017) 085.

Download the paper and glance through the abstract and introduction to get an idea of the model that the analysis targets and the final state that it explores.

Here are several highlights from the analysis:

Now glance quickly through Sections 4 (Reconstruction methods), 5 (Boosted H channels) and 6 (Boosted W channels) to get a rough idea of the kinds of objects and event selections that are employed. Here are several highlights.

ADL file for the analysis

CMS-B2G-16-024 is a very complex analysis with a large number of object and event selections. The organized structure of ADL is a good medium for expressing all this detail in a systematic, unambiguous and self-documenting manner. We have already written most of the analysis in ADL, but we will ask you to fill in some blanks. Let’s start by examining the ADL file.

Go to your CutLang docker container and retrieve the ADL file with the following command:

wget https://raw.githubusercontent.com/ADL4HEP/ADLAnalysisDrafts/main/CMS-B2G-16-024/CMS-B2G-16-024_step1.adl

This file contains almost all object definitions and a selection of signal search regions in the boosted H and boosted W channels.

Open CMS-B2G-16-024_step1.adl using nano or vi and explore the contents.

Modifying and running the ADL file

Let’s first run the ADL file as it is:

CLA root://eospublic.cern.ch//eos/opendata/cms/derived-data/POET/23-Jul-22/RunIIFall15MiniAODv2_TprimeTprime_M-800_TuneCUETP8M1_13TeV-madgraph-pythia8_flat.root POET -i CMS-B2G-16-024_step1.adl -e 20000

We are running over a signal sample of T quark pair (TT) production with a T mass of 800 GeV. The -e 20000 option limits the run to 20000 events.

In the output, you will see cutflows and efficiencies for all regions in the ADL file. You will also see the output ROOT file histoOut-CMS-B2G-16-024_step1.root.
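If you plan to repeat the run for several ADL files or samples, it can help to assemble the command in a small script. The following is only a sketch: it builds the argument list and prints it without executing anything. The helper names build_cla_command and expected_output_name are hypothetical, and the output naming rule is an assumption inferred from the single example above.

```python
from pathlib import Path


def build_cla_command(input_file, input_type, adl_file, max_events):
    """Return the CLA invocation as an argument list (nothing is executed)."""
    return ["CLA", input_file, input_type, "-i", adl_file, "-e", str(max_events)]


def expected_output_name(adl_file):
    """Guess the output ROOT file name.

    Assumption: the output is named histoOut-<ADL file stem>.root,
    as suggested by the step1 example in this episode.
    """
    return f"histoOut-{Path(adl_file).stem}.root"


SIGNAL = (
    "root://eospublic.cern.ch//eos/opendata/cms/derived-data/POET/23-Jul-22/"
    "RunIIFall15MiniAODv2_TprimeTprime_M-800_TuneCUETP8M1_13TeV-madgraph-pythia8_flat.root"
)

cmd = build_cla_command(SIGNAL, "POET", "CMS-B2G-16-024_step1.adl", 20000)
print(" ".join(cmd))
print(expected_output_name("CMS-B2G-16-024_step1.adl"))
```

Inside the container, the printed command line could be pasted into the shell or passed to subprocess.run.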

Now let’s make some changes in the ADL file.

Challenge: Completing the object selections

Please resist the urge to look at the solution. Only compare with the proposed solution after you make your own attempt.

Please complete the object selection by adding the following cuts:

  • To the muonsH object, add the cuts muon pT > 47 and muon |eta| < 2.1.
  • To the muonsWtight object, add the cuts muon pT > 40 and muon |eta| < 2.4.
  • To the Hcands object, add the cut AK8 jet pT > 300.

Solution

# muonsH - for boosted H regions
object muonsH
  take Muon
  select Medium(Muon) == 1 # cut based medium ID 
  select isolationVar(Muon) < 0.2
  select pT(Muon) > 47
  select abs(eta(Muon)) < 2.1

# muonsWtight - for boosted W regions
object muonsWtight
  take Muon
  select Tight(Muon) == 1 # cut based tight ID 
  select isolationVar(Muon) < 0.2
  select pT(Muon) > 40
  select abs(eta(Muon)) < 2.4

# Boosted Higgses
object Hcands
  take AK8jets
  select msoftdrop(AK8jets) [] 60 160
  select pT(AK8jets) > 300
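To make the logic of these object blocks concrete, here is a rough Python equivalent acting on toy data. It is not how CutLang works internally. The dictionary keys are hypothetical names chosen for this sketch, and the [] operator is assumed to mean an inclusive range.

```python
def select_muons_h(muons):
    """Mimic the muonsH block: every 'select' line must pass for a muon to be kept."""
    return [
        m for m in muons
        if m["medium"]              # select Medium(Muon) == 1
        and m["iso"] < 0.2          # select isolationVar(Muon) < 0.2
        and m["pt"] > 47            # select pT(Muon) > 47
        and abs(m["eta"]) < 2.1     # select abs(eta(Muon)) < 2.1
    ]


def select_hcands(ak8jets):
    """Mimic the Hcands block (assumption: '[] 60 160' is an inclusive range)."""
    return [
        j for j in ak8jets
        if 60 <= j["msoftdrop"] <= 160  # select msoftdrop(AK8jets) [] 60 160
        and j["pt"] > 300               # select pT(AK8jets) > 300
    ]


# Toy inputs, invented for illustration only.
toy_muons = [
    {"pt": 60.0, "eta": 1.0, "iso": 0.05, "medium": True},   # passes all cuts
    {"pt": 45.0, "eta": 0.5, "iso": 0.05, "medium": True},   # fails pT > 47
    {"pt": 80.0, "eta": -2.3, "iso": 0.05, "medium": True},  # fails |eta| < 2.1
    {"pt": 90.0, "eta": 0.2, "iso": 0.30, "medium": True},   # fails isolation
]
toy_ak8jets = [
    {"pt": 350.0, "msoftdrop": 125.0},  # passes
    {"pt": 250.0, "msoftdrop": 125.0},  # fails pT > 300
    {"pt": 400.0, "msoftdrop": 40.0},   # fails the mass window
]

print(len(select_muons_h(toy_muons)), len(select_hcands(toy_ak8jets)))
```

Each select line narrows the collection further, so the order of the lines does not change which objects survive.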

Run CutLang again and check that your changes worked.

Challenge: Adding regions

The first paragraph on page 10 of the paper describes the signal search regions in the boosted W final state. Four of these search regions (with 0 W-tags) are already written. Can you write the other four, which require at least 1 W-tag?
Note that the AK4jets requirement does not apply to the remaining regions that you will write. Also note that we do not distinguish between electrons and muons, but only require the regions to have 1 lepton.

Solution

region boostedW5
  boostedW
  select size(Wjets) >= 1
  select size(bjetsW) == 0

region boostedW6
  boostedW
  select size(Wjets) >= 1
  select size(bjetsW) == 1

region boostedW7
  boostedW
  select size(Wjets) >= 1
  select size(bjetsW) == 2

region boostedW8
  boostedW
  select size(Wjets) >= 1
  select size(bjetsW) >= 3
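The four regions above split events by b-jet multiplicity, so an event can fall into at most one of them. A small Python sketch of that categorization follows. It assumes the event has already passed the boostedW preselection, and the function name boosted_w_region is hypothetical.

```python
def boosted_w_region(n_wjets, n_bjets):
    """Return the >=1 W-tag region name for an event, or None if it is in none.

    Assumption: the event already passed the boostedW preselection.
    """
    if n_wjets < 1:
        return None          # 0 W-tag events belong to the regions already written
    if n_bjets >= 3:
        return "boostedW8"   # select size(bjetsW) >= 3
    return {0: "boostedW5", 1: "boostedW6", 2: "boostedW7"}[n_bjets]


# Print the region assignment for a few (W-jet, b-jet) multiplicities.
for n_w, n_b in [(1, 0), (2, 1), (1, 2), (1, 4), (0, 1)]:
    print(n_w, n_b, boosted_w_region(n_w, n_b))
```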

You can retrieve the file with the completed object and signal region selections with:

wget https://raw.githubusercontent.com/ADL4HEP/ADLAnalysisDrafts/main/CMS-B2G-16-024/CMS-B2G-16-024_step2.adl

Now let’s make some histograms. In ADL, the histogram syntax is as follows:

Challenge: Adding histograms

  • Add a variable-bin histogram of ST, called hST, with bin edges 750 875 1000 1125 1500 2000 2500 3000 4500 6500 in the boostedH1b and boostedH2b regions.
  • Add a fixed-bin histogram of the number of bjets, called hnbjets, with binning 6, 0, 6 after the last cut in the boostedW region.
  • Add a fixed-bin histogram of the number of Wjets, called hnWjets, with binning 4, 0, 4 after the last cut in the boostedW region.
  • To each boostedWN signal region, add either a minmlj or a minmlb histogram. If the region has a b-jet, add a minmlb histogram. Otherwise, add a minmlj histogram. The binning should be 50, 0, 1000. Run the resulting ADL file and check the histograms.
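The two binning styles in this challenge differ only in how the bin edges are specified. The sketch below, in plain Python with no ROOT required, expands a fixed binning (number of bins, minimum, maximum) into edges and looks up the bin for a given value. The helper names are hypothetical. The lookup follows the ROOT convention that a bin includes its lower edge and excludes its upper edge.

```python
from bisect import bisect_right


def fixed_edges(nbins, xmin, xmax):
    """Expand a fixed binning (nbins, xmin, xmax) into a list of nbins + 1 edges."""
    width = (xmax - xmin) / nbins
    return [xmin + i * width for i in range(nbins + 1)]


def find_bin(edges, value):
    """Return the 0-based bin index of value, or None for underflow/overflow."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


# Variable bin edges for hST, taken from the challenge text: 10 edges, 9 bins.
ST_EDGES = [750, 875, 1000, 1125, 1500, 2000, 2500, 3000, 4500, 6500]

print(len(ST_EDGES) - 1)           # number of hST bins
print(find_bin(ST_EDGES, 1200))    # index of the bin covering 1125 to 1500
print(fixed_edges(6, 0, 6))        # edges for hnbjets
print(fixed_edges(50, 0, 1000)[1]) # upper edge of the first minmlj/minmlb bin
```

Variable bins let the sparsely populated high-ST tail use wider bins, while the multiplicity histograms naturally use one bin per integer.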

Solution

wget https://raw.githubusercontent.com/ADL4HEP/ADLAnalysisDrafts/main/CMS-B2G-16-024/CMS-B2G-16-024_step3.adl

OPTIONAL: If you are familiar with analysis concepts, you can also try to add the control regions used for background estimation in the analysis.

Optional Challenge: Adding control regions

Background estimation and control region definitions for the boostedH and boostedW regions are described in Sections 5.2 and 6.2 of the paper, respectively. Please read the sections and write the control regions in ADL.

  • For each final state, there are control regions for tt+jets and W+jets backgrounds, which should be defined separately.
  • Note that control regions are defined by inverting the cuts on one or more of the variables that define the signal region preselections (e.g. the boostedH and boostedW regions). Therefore, the control regions should be written as independent regions rather than derived from the preselection regions. You can copy the boostedH and boostedW regions and change the cuts.
  • For the boostedW case, both control regions have subregions with different W multiplicity criteria. Also try to add histograms similar to those in the corresponding signal regions.

Run CutLang and check your results.

Solution

wget https://raw.githubusercontent.com/ADL4HEP/ADLAnalysisDrafts/main/CMS-B2G-16-024/CMS-B2G-16-024_step4.adl

The following graph shows how the analysis looks after step4. Red ellipses are the input objects, blue ellipses are the derived objects, and green rectangles are the regions. Blue arrows show object dependencies, green arrows show region dependencies, and gray arrows show which objects have been used in which region.

Actually, this graph was generated directly from the ADL file itself, using a graphviz application!
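To give a feeling for how such a graph can be produced, here is a minimal Python sketch that writes Graphviz DOT text using the color scheme described above. It is not the tool that produced the graph for this lesson. The function to_dot is hypothetical and the tiny dependency structure is a toy input, with the region links being illustrative assumptions rather than the contents of the step4 file.

```python
def to_dot(inputs, derived, regions):
    """Build DOT text for an analysis graph.

    inputs:  list of input object names
    derived: {derived object: [objects it is taken from]}
    regions: {region: {"parents": [regions], "objects": [objects used]}}
    """
    lines = ["digraph analysis {"]
    for name in inputs:                      # red ellipses: input objects
        lines.append(f'  "{name}" [shape=ellipse, color=red];')
    for name in derived:                     # blue ellipses: derived objects
        lines.append(f'  "{name}" [shape=ellipse, color=blue];')
    for name in regions:                     # green rectangles: regions
        lines.append(f'  "{name}" [shape=box, color=green];')
    for name, parents in derived.items():    # blue arrows: object dependencies
        for parent in parents:
            lines.append(f'  "{parent}" -> "{name}" [color=blue];')
    for name, info in regions.items():
        for parent in info["parents"]:       # green arrows: region dependencies
            lines.append(f'  "{parent}" -> "{name}" [color=green];')
        for obj in info["objects"]:          # gray arrows: objects used in a region
            lines.append(f'  "{obj}" -> "{name}" [color=gray];')
    lines.append("}")
    return "\n".join(lines)


# Toy input for illustration only; the region links are assumptions.
dot_text = to_dot(
    inputs=["Muon"],
    derived={"muonsH": ["Muon"]},
    regions={
        "boostedH": {"parents": [], "objects": ["muonsH"]},
        "boostedH1b": {"parents": ["boostedH"], "objects": []},
    },
)
print(dot_text)
```

The printed text can be saved to a file and rendered with the Graphviz dot command, for example dot -Tpng graph.dot -o graph.png.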

The complete analysis selection

Now get the final version of the ADL file:

wget https://raw.githubusercontent.com/ADL4HEP/ADLAnalysisDrafts/main/CMS-B2G-16-024/CMS-B2G-16-024_step5.adl

This version adds a few more histograms from the paper, drawn using auxiliary objects and regions that are not part of the actual analysis selection. Such histograms merely show object properties. For example, the histogram hmAK8jet2b shows the mass of AK8jets with only subjet b-tagging but no explicit mass cut. Study how that histogram was made. Similarly, check the hWjetsm and hWjetstau21 histograms.

Key Points

  • The domain-specific and declarative nature of ADL makes it easy to express and communicate complex and extensive analysis algorithms.

  • Basic selection requirements are implemented very easily using ADL.