Getting real
Overview
Teaching: 5 min
Exercises: 20 min
Questions
How can I now use this for real?
Objectives
Get an idea of a full workflow with Argo
So far we’ve run smaller examples, but now we have everything at hand to run physics analysis jobs in parallel.
Download the workflow with the following command:
curl -OL https://raw.githubusercontent.com/cms-opendata-workshop/workshop-payload-kubernetes/master/higgs-tau-tau-workflow.yaml
The file will look something like this:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallelism-nested-
spec:
  arguments:
    parameters:
    - name: files-list
      value: |
        [
          {"file": "GluGluToHToTauTau", "x-section": "19.6", "process": "ggH"},
          {"file": "VBF_HToTauTau", "x-section": "1.55", "process": "qqH"},
          {"file": "DYJetsToLL", "x-section": "3503.7", "process": "ZTT"},
          {"file": "TTbar", "x-section": "225.2", "process": "TT"},
          {"file": "W1JetsToLNu", "x-section": "6381.2", "process": "W1J"},
          {"file": "W2JetsToLNu", "x-section": "2039.8", "process": "W2J"},
          {"file": "W3JetsToLNu", "x-section": "612.5", "process": "W3J"},
          {"file": "Run2012B_TauPlusX", "x-section": "1.0", "process": "dataRunB"},
          {"file": "Run2012C_TauPlusX", "x-section": "1.0", "process": "dataRunC"}
        ]
    - name: histogram-list
      value: |
        [
          {"file": "GluGluToHToTauTau", "x-section": "19.6", "process": "ggH"},
          {"file": "VBF_HToTauTau", "x-section": "1.55", "process": "qqH"},
          {"file": "DYJetsToLL", "x-section": "3503.7", "process": "ZTT"},
          {"file": "DYJetsToLL", "x-section": "3503.7", "process": "ZLL"},
          {"file": "TTbar", "x-section": "225.2", "process": "TT"},
          {"file": "W1JetsToLNu", "x-section": "6381.2", "process": "W1J"},
          {"file": "W2JetsToLNu", "x-section": "2039.8", "process": "W2J"},
          {"file": "W3JetsToLNu", "x-section": "612.5", "process": "W3J"},
          {"file": "Run2012B_TauPlusX", "x-section": "1.0", "process": "dataRunB"},
          {"file": "Run2012C_TauPlusX", "x-section": "1.0", "process": "dataRunC"}
        ]
  entrypoint: parallel-worker
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: nfs-<NUMBER>
  templates:
  - name: parallel-worker
    inputs:
      parameters:
      - name: files-list
      - name: histogram-list
    dag:
      tasks:
      - name: skim-step
        template: skim-template
        arguments:
          parameters:
          - name: file
            value: "{{item.file}}"
          - name: x-section
            value: "{{item.x-section}}"
        withParam: "{{inputs.parameters.files-list}}"
      - name: histogram-step
        dependencies: [skim-step]
        template: histogram-template
        arguments:
          parameters:
          - name: file
            value: "{{item.file}}"
          - name: process
            value: "{{item.process}}"
        withParam: "{{inputs.parameters.histogram-list}}"
      - name: merge-step
        dependencies: [histogram-step]
        template: merge-template
      - name: plot-step
        dependencies: [merge-step]
        template: plot-template
      - name: fit-step
        dependencies: [merge-step]
        template: fit-template
  - name: skim-template
    inputs:
      parameters:
      - name: file
      - name: x-section
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        LUMI=11467.0 # Integrated luminosity of the unscaled dataset
        SCALE=0.1 # Same fraction as used to down-size the analysis
        mkdir -p $HOME/skimm
        ./skim root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/{{inputs.parameters.file}}.root $HOME/skimm/{{inputs.parameters.file}}-skimmed.root {{inputs.parameters.x-section}} $LUMI $SCALE
        ls -l $HOME/skimm
        cp $HOME/skimm/* /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1.7Gi
          cpu: 750m
  - name: histogram-template
    inputs:
      parameters:
      - name: file
      - name: process
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        mkdir -p $HOME/histogram
        python histograms.py /mnt/vol/{{inputs.parameters.file}}-skimmed.root {{inputs.parameters.process}} $HOME/histogram/{{inputs.parameters.file}}-histogram-{{inputs.parameters.process}}.root
        ls -l $HOME/histogram
        cp $HOME/histogram/* /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1.7Gi
          cpu: 750m
  - name: merge-template
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        hadd -f /mnt/vol/histogram.root /mnt/vol/*-histogram-*.root
        ls -l /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1.7Gi
          cpu: 750m
  - name: plot-template
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        SCALE=0.1
        python plot.py /mnt/vol/histogram.root /mnt/vol $SCALE
        ls -l /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
  - name: fit-template
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        python fit.py /mnt/vol/histogram.root /mnt/vol
        ls -l /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
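A quick orientation: the parallel-worker DAG fans skim-step and histogram-step out over the entries of the two JSON lists (via withParam and the {{item.*}} placeholders), and the merge, plot and fit steps then run once on the combined output in /mnt/vol. Before editing anything, it is worth checking that the file parses; the same argo CLI you use to submit workflows can lint it:

argo lint -n argo higgs-tau-tau-workflow.yaml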
Adjust the workflow as follows (a command-line sketch for both edits follows below):

- Replace gcr.io/cern-cms/root-conda-002:higgstautau everywhere it appears with the name of the image you created.
- Adjust claimName: nfs-<NUMBER> to match the name of your persistent volume claim.
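If you prefer to make the two substitutions from the command line, something like the following sed sketch should work. The image name gcr.io/<PROJECT_ID>/root-conda:higgstautau and the claim name nfs-1 are placeholders; put in your own values:

# Hypothetical example values; substitute your own image and claim name.
sed -i 's|gcr.io/cern-cms/root-conda-002:higgstautau|gcr.io/<PROJECT_ID>/root-conda:higgstautau|g' higgs-tau-tau-workflow.yaml
sed -i 's|nfs-<NUMBER>|nfs-1|g' higgs-tau-tau-workflow.yaml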
Then execute the workflow and keep your fingers crossed:
argo submit -n argo --watch higgs-tau-tau-workflow.yaml
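While it runs, you can follow the progress from a second terminal. The @latest shorthand works in recent argo CLI versions; with older versions, use the workflow name that argo submit printed:

argo list -n argo            # all workflows and their status
argo get -n argo @latest     # tree view of the steps of the newest workflow
argo logs -n argo @latest    # logs from all steps of the newest workflow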
Good luck!
Key Points
Argo is a powerful tool for running parallel workflows.