CMS Open Data Workshop 2023


July 11-14, 2023

8:00 am - 12:00 pm (CDT, UTC-5)

Instructors: M. Bellis, J. Hogan, K. Lassila-Perini, T. McCauley, S. Sekmen, X. Tintin, R. Trujillo

Helpers: V. Morina, J. Nelson, N. Pederson, X. Shen

General Information

Since 2014, the CMS Collaboration has pioneered the release of research-quality LHC data for public use, making a significant amount of these data accessible through the CERN Open Data portal. At the end of 2021, the CMS Collaboration released the first batch of its Run 2 data. This workshop is the fourth in a series that started in 2020. It aims to bridge the technical gap that usually exists between the scientific creativity of an external analyst and the nuts-and-bolts details of a full analysis with CMS open data. All exercises will be hands-on, and participants should be prepared to dive into the data right away. A set of pre-exercises and assignments is provided, and participants are required to complete them so that they can make the most of the workshop. Time will also be spent brainstorming with attendees about how the entire process of accessing and analyzing the data could be made more useful for the broader HEP community.

Who: This workshop is primarily aimed at students and scientists with prior knowledge of collider physics and a deep interest in learning the art and practice of experimental analysis using CMS Open Data.

Where: Fermilab, Wilson Hall.

When: July 11-14, 2023, US Central Daylight Time (CDT).

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a working virtual machine or Docker container environment as listed in the pre-exercises section.

Accessibility: We are dedicated to providing a positive and accessible learning environment for all. Please notify the instructors in advance of the workshop if you require any accommodations or if there is anything we can do to make this workshop more accessible to you.

Code of Conduct

All workshop participants are expected to follow the CERN Code of Conduct.



Completing the required pre-exercises makes full participation in the workshop possible! Submit homework responses as you complete the pre-exercises.
Mandatory                     5 min    Orientation
Optional (external lesson)             The Unix Shell
Optional (external lesson)             Version Control with Git
Optional (external lesson)             Programming with Python
Mandatory                     2 h      Docker containers
Mandatory                     4 h      ROOT with C++ and Python
Mandatory                     2 h      Intro to CMSSW
Mandatory                     2 h      Intro to cloud computing


The following concepts are important for using Open Data independently, but will not be taught live during the workshop.
30 min   Overview of the CMS detector
2 h      Dataset scouting
2 h      Physics Objects


8:00-8:20     Welcome and Introductions                     Matt Bellis
8:20-9:20     How do you take your cup of CMS Open Data?    Julie Hogan, Rikab Gambhir
9:40-10:20    Event selection introduction                  Julie Hogan
10:20-11:00   Event selection exploration                   Julie Hogan
11:20-12:00   Event selection discussion                    Julie Hogan
Offline       Advanced tools intro reading
Offline       Analysis example intro reading


8:00-8:40     Advanced tools exploration                    Julie Hogan
8:40-9:20     Advanced tools activity                       Julie Hogan
9:40-10:20    Advanced tools discussion                     Julie Hogan
10:20-11:00   Analysis example introduction                 Matt Bellis
11:20-12:00   Analysis example exploration                  Matt Bellis
Offline       Analysis example challenge
Offline       Cloud processing setup (Required!)


8:00-8:40     Analysis example solutions                    Matt Bellis
8:40-9:20     Analysis example discussion                   Matt Bellis
9:40-10:20    Cloud processing introduction                 Romina Trujillo Carrera
10:20-11:00   Cloud processing demonstration                Xavier Tintin
11:20-12:00   Cloud processing activity                     Xavier Tintin
Offline       Reinterpretation intro reading


8:00-8:40     Cloud processing challenge                    Xavier Tintin
8:40-9:20     Cloud processing discussion                   Kati Lassila-Perini
9:40-10:20    Reinterpretation introduction                 Sezen Sekmen
10:20-11:00   Reinterpretation exploration                  Sezen Sekmen
11:20-12:00   Closing discussion and feedback               Kati Lassila-Perini