Getting real

Overview

Teaching: 5 min
Exercises: 20 min
Questions
  • How can I now use this for real?

Objectives
  • Get an idea of a full analysis workflow with Argo

So far we’ve only run small examples, but now we have everything at hand to run physics analysis jobs in parallel.

Download the workflow with the following command:

curl -OL https://raw.githubusercontent.com/cms-opendata-workshop/workshop-payload-kubernetes/master/higgs-tau-tau-workflow.yaml

The file will look something like this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallelism-nested-
spec:
  arguments:
    parameters:
    - name: files-list
      value: |
        [
          {"file": "GluGluToHToTauTau", "x-section": "19.6", "process": "ggH"},
          {"file": "VBF_HToTauTau", "x-section": "1.55", "process": "qqH"},
          {"file": "DYJetsToLL", "x-section": "3503.7", "process": "ZTT"},
          {"file": "TTbar", "x-section": "225.2", "process": "TT"},
          {"file": "W1JetsToLNu", "x-section": "6381.2", "process": "W1J"},
          {"file": "W2JetsToLNu", "x-section": "2039.8", "process": "W2J"},
          {"file": "W3JetsToLNu", "x-section": "612.5", "process": "W3J"},
          {"file": "Run2012B_TauPlusX", "x-section": "1.0", "process": "dataRunB"},
          {"file": "Run2012C_TauPlusX", "x-section": "1.0", "process": "dataRunC"}
        ]
    - name: histogram-list
      value: |
        [
          {"file": "GluGluToHToTauTau", "x-section": "19.6", "process": "ggH"},
          {"file": "VBF_HToTauTau", "x-section": "1.55", "process": "qqH"},
          {"file": "DYJetsToLL", "x-section": "3503.7", "process": "ZTT"},
          {"file": "DYJetsToLL", "x-section": "3503.7", "process": "ZLL"},
          {"file": "TTbar", "x-section": "225.2", "process": "TT"},
          {"file": "W1JetsToLNu", "x-section": "6381.2", "process": "W1J"},
          {"file": "W2JetsToLNu", "x-section": "2039.8", "process": "W2J"},
          {"file": "W3JetsToLNu", "x-section": "612.5", "process": "W3J"},
          {"file": "Run2012B_TauPlusX", "x-section": "1.0", "process": "dataRunB"},
          {"file": "Run2012C_TauPlusX", "x-section": "1.0", "process": "dataRunC"}
        ]
  entrypoint: parallel-worker
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: nfs-<NUMBER>
  templates:
  - name: parallel-worker
    inputs:
      parameters:
      - name: files-list
      - name: histogram-list
    dag:
      tasks:
      - name: skim-step
        template: skim-template
        arguments:
          parameters:
          - name: file
            value: "{{item.file}}"
          - name: x-section
            value: "{{item.x-section}}"
        withParam: "{{inputs.parameters.files-list}}"
      - name: histogram-step
        dependencies: [skim-step]
        template: histogram-template
        arguments:
          parameters:
          - name: file
            value: "{{item.file}}"
          - name: process
            value: "{{item.process}}"
        withParam: "{{inputs.parameters.histogram-list}}"

      - name: merge-step
        dependencies: [histogram-step]
        template: merge-template

      - name: plot-step
        dependencies: [merge-step]
        template: plot-template

      - name: fit-step
        dependencies: [merge-step]
        template: fit-template

  - name: skim-template
    inputs:
      parameters:
      - name: file
      - name: x-section
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        LUMI=11467.0 # Integrated luminosity of the unscaled dataset
        SCALE=0.1 # Same fraction as used to down-size the analysis
        mkdir -p $HOME/skimm
        ./skim root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/{{inputs.parameters.file}}.root $HOME/skimm/{{inputs.parameters.file}}-skimmed.root {{inputs.parameters.x-section}} $LUMI $SCALE
        ls -l $HOME/skimm
        cp $HOME/skimm/* /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1.7Gi
          cpu: 750m

  - name: histogram-template
    inputs:
      parameters:
      - name: file
      - name: process
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        mkdir -p $HOME/histogram
        python histograms.py /mnt/vol/{{inputs.parameters.file}}-skimmed.root {{inputs.parameters.process}} $HOME/histogram/{{inputs.parameters.file}}-histogram-{{inputs.parameters.process}}.root
        ls -l $HOME/histogram
        cp $HOME/histogram/* /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1.7Gi
          cpu: 750m

  - name: merge-template
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        hadd -f /mnt/vol/histogram.root /mnt/vol/*-histogram-*.root
        ls -l /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1.7Gi
          cpu: 750m

  - name: plot-template
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        SCALE=0.1
        python plot.py /mnt/vol/histogram.root /mnt/vol $SCALE
        ls -l /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol

  - name: fit-template
    script:
      image: gcr.io/cern-cms/root-conda-002:higgstautau
      command: [sh]
      source: |
        python fit.py /mnt/vol/histogram.root /mnt/vol
        ls -l /mnt/vol
      volumeMounts:
      - name: task-pv-storage
        mountPath: /mnt/vol
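
Note how the parallel-worker template is a DAG: the skim-step and histogram-step tasks use withParam to fan out over the JSON lists defined at the top of the file. Argo launches one pod per list entry and substitutes {{item.file}}, {{item.x-section}} and {{item.process}} into the corresponding template parameters, so all skim jobs (and afterwards all histogram jobs) run in parallel. The merge-step only starts once every histogram job has finished, and plot-step and fit-step then both operate on the merged histogram file.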

Adjust the workflow as follows: in the volumes section, replace <NUMBER> so that claimName matches the name of your persistent volume claim.
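
Before submitting, you can also let the Argo command-line interface check the file for syntax and templating mistakes:

argo lint higgs-tau-tau-workflow.yaml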

Then execute the workflow and keep your fingers crossed:

argo submit -n argo --watch higgs-tau-tau-workflow.yaml
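
The --watch flag keeps the live state of the DAG on your screen. If you detach or lose the connection, you can check on the workflow from another terminal (replace <workflow-name> with the name that argo submit printed, i.e. parallelism-nested- plus a random suffix):

argo list -n argo
argo get -n argo <workflow-name>
argo logs -n argo <workflow-name>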

Good luck!
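
Once the workflow has succeeded, the histograms, plots and fit results sit on the shared volume under /mnt/vol. One way to copy them to your machine is a short-lived pod that mounts the same claim. The following is only a sketch: the pod name pv-browser is arbitrary, and <NUMBER> again has to be replaced with your own:

apiVersion: v1
kind: Pod
metadata:
  name: pv-browser
spec:
  containers:
  - name: browser
    image: busybox
    command: ["sleep", "3600"]  # keep the pod around long enough to copy files
    volumeMounts:
    - name: task-pv-storage
      mountPath: /mnt/vol
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: nfs-<NUMBER>   # the same claim the workflow writes to

Save this as pv-browser.yaml, then:

kubectl apply -n argo -f pv-browser.yaml
kubectl cp -n argo pv-browser:/mnt/vol ./results
kubectl delete pod -n argo pv-browser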

Key Points

  • Argo is a powerful tool for running parallel workflows