Dataset Scouting

This lesson teaches you how to explore the CMS datasets available on the CERN Open Data portal. You will find the primary datasets into which collision data were directed when the data were taken, as well as the simulated Monte Carlo samples available for the run period you are interested in.

You will also learn how to make a first-pass inspection of some of these datafiles to see what is stored in them.

If you run into problems with any of these steps, please reach out to the organizers through the dedicated Mattermost channel.

Schedule

       Setup                                           Download files required for the lesson
00:00  1. Introduction                                 What is the point of these exercises?
                                                       How do I find the data I want to work with?
00:10  2. Where are the datasets?                      Where do I find datasets for data and Monte Carlo?
00:20  3. What data and Monte Carlo are available?     What data and run periods are available?
                                                       What data do the collision datasets contain?
                                                       What Monte Carlo samples are available?
00:35  4. How to access metadata on the command line?  What is cernopendata-client?
                                                       How do I use the cernopendata-client container image?
                                                       How do I get the list of files in a dataset on the command line?
00:45  5. What is in the datafiles?                    How do I inspect these files to see what is in them?
00:55  6. Hands-on activity                            How well do you understand what we covered?
01:10  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.