CMS Data Flow

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • What does CMS do to process raw data into AOD and NanoAOD files?

Objectives
  • Review the flow of CMS data processing

CMS data follows a complex processing path after passing the trigger selections you learned about in a previous lesson. The primary datasets begin with raw data events that have passed one or more triggers in a specific set. For this workshop our test case is Higgs -> tau tau, so we analyze events in the “TauPlusX” primary dataset. Other primary datasets are defined for muon, electron, jet, MET, b-tagging, charmonium, and other trigger sets.
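The grouping of events into primary datasets can be pictured as routing each event by which triggers it fired. The sketch below is purely illustrative (it is not CMS software), and the trigger and dataset names are simplified stand-ins:

```python
# Illustrative sketch only, not CMS software: route events into primary
# datasets based on which triggers fired. The trigger-to-dataset mapping
# here is a hypothetical simplification.
PRIMARY_DATASETS = {
    "TauPlusX": {"HLT_IsoMu17_LooseIsoPFTau20"},
    "SingleMuon": {"HLT_IsoMu24"},
}

def assign_datasets(fired_triggers):
    """Return the primary datasets whose trigger set overlaps the fired triggers."""
    fired = set(fired_triggers)
    return sorted(
        name for name, triggers in PRIMARY_DATASETS.items()
        if triggers & fired
    )

# An event can land in more than one primary dataset if it fired
# triggers belonging to several of them.
print(assign_datasets(["HLT_IsoMu24", "HLT_IsoMu17_LooseIsoPFTau20"]))
```

Note that the same event can appear in several primary datasets, which is why analyses must take care to avoid double-counting when combining datasets.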

Skimming

Performing physics analyses with the raw primary datasets would be impossible! The datasets go through a process of skimming (also called slimming or thinning) to reduce their size and increase their usability, both by reducing the number of events and by compressing the event format to remove information that is no longer needed. Data is processed through the following steps:

  • RAW: the full detector readout for events that passed the trigger
  • RECO: physics objects (tracks, vertices, clusters) reconstructed from the RAW data
  • AOD (Analysis Object Data): a slimmed subset of RECO sufficient for most analyses
  • MiniAOD / NanoAOD: further compressed formats derived from AOD for end-user analysis

The figure below shows examples of tracking, ECAL, and HCAL-based objects.
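The two size-reduction ideas at play, removing whole events and removing unneeded per-event information, can be sketched in a few lines. This is an illustrative toy, not CMS software, and the field names are hypothetical:

```python
# Illustrative toy, not CMS software. "Skimming" removes whole events that
# fail a selection; "slimming" removes per-event information that downstream
# analyses no longer need. Field names here are hypothetical.
events = [
    {"n_taus": 2, "met": 45.0, "raw_hits": [0.1, 0.7, 0.3]},
    {"n_taus": 0, "met": 12.0, "raw_hits": [0.2, 0.9]},
    {"n_taus": 1, "met": 80.0, "raw_hits": [0.5]},
]

def skim_and_slim(events, keep_fields=("n_taus", "met")):
    """Keep only events with at least one tau (skim), and only the
    listed fields (slim)."""
    return [
        {key: event[key] for key in keep_fields}
        for event in events
        if event["n_taus"] >= 1
    ]

slimmed = skim_and_slim(events)
print(len(slimmed))   # 2 events survive the skim
print(slimmed[0])     # {'n_taus': 2, 'met': 45.0}; raw_hits has been dropped
```

Both steps compound: fewer events and fewer fields per event multiply together, which is how each successive data tier ends up much smaller than the one before it.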

Storage & processing

As the data is being skimmed it is also moving around the world. All RAW data is stored on tape in two copies, and the RECO and AOD files are stored at various “Tier 1” and “Tier 2” computing centers around the world. The major Tier 1 center for the USA is at Fermilab. Several “Tier 3” computing centers exist at universities and labs to store user-derived data for physics analyses.

More information about the CMS data flow can be found in the public workbook.

Key Points

  • Compression and selections happen at most levels of CMS data processing