Analyzing Open Data at Scale: Glossary

Key Points

Check access to TIFR (Jan 5)
  • Workshop participants can log in to the TIFR cluster; check that your access works.

Logistics of an Open Data analysis (Jan 10)
  • Computation for a CMS analysis is typically run in several steps: dataset skimming / flattening, observable creation, visualization, and statistical analysis.

  • Early steps such as dataset skimming and observable creation often benefit the most from distributed computing.

  • The CMS Open Data Workshops provide tutorials for performing these analysis steps on either HTCondor or Google Cloud platforms.

HTCondor submission
  • The condor job control file can specify a Docker container.

  • Each job’s executable file can fetch code from GitHub to perform an analysis task.

  • References to condor submission and monitoring guides are included here.
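
As an illustration of the first point, the following is a minimal sketch of a job control file that names a Docker container, generated from Python. It assumes the cluster accepts HTCondor's Docker universe (the `universe = docker` and `docker_image` submit commands); the TIFR setup may differ, so treat the workshop's own submission files as the reference. The helper `make_submit_description`, the image name, and the script name are hypothetical placeholders.

```python
from textwrap import dedent


def make_submit_description(image, executable, n_jobs):
    """Return the text of an HTCondor submit description for Docker-universe jobs.

    `image` and `executable` are placeholders supplied by the caller.
    $(Process) is expanded by HTCondor to the job index (0, 1, 2, ...).
    """
    return dedent(f"""\
        universe     = docker
        docker_image = {image}
        executable   = {executable}
        arguments    = $(Process)
        output       = job_$(Process).out
        error        = job_$(Process).err
        log          = job_$(Process).log
        queue {n_jobs}
        """)


if __name__ == "__main__":
    # Hypothetical image and script names, for illustration only.
    print(make_submit_description("example/analysis-image:latest", "run_job.sh", 4))
```

Saving the printed text to a file and passing that file to `condor_submit` would queue the jobs; `condor_q` then lists their status.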

Run your analysis
  • The hadd command allows you to merge ROOT files that have the same internal structure.

  • Files can be copied from TIFR to your local machine using scp.

  • You can then analyze the POET ROOT files using other techniques from this workshop.
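
The merge and copy steps above can be sketched as follows. This is only an illustration: it builds the `hadd` and `scp` command lines without running them, and every file name, user name, and host name is a hypothetical placeholder rather than a real TIFR address. The helpers `hadd_command` and `scp_command` are likewise invented for this example.

```python
import shlex


def hadd_command(target, sources):
    """Build an hadd command: the first argument is the merged output file,
    followed by the input ROOT files (which must share the same structure)."""
    if not sources:
        raise ValueError("hadd needs at least one input file")
    return ["hadd", target, *sources]


def scp_command(user, host, remote_path, local_dir="."):
    """Build an scp command that copies a remote file to a local directory."""
    return ["scp", f"{user}@{host}:{remote_path}", local_dir]


if __name__ == "__main__":
    # Hypothetical file, user, and host names.
    merge = hadd_command("merged.root", ["job_0.root", "job_1.root"])
    copy = scp_command("username", "cluster.example.org", "merged.root")
    print(shlex.join(merge))
    print(shlex.join(copy))
```

Each list can be passed to `subprocess.run(cmd, check=True)`: run the `hadd` command on the cluster where the job outputs live, and the `scp` command on your local machine.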

Glossary
