Create an Argo Workflow

Overview

Teaching: 5 min
Exercises: 20 min
Questions
  • How can I visualize my workflows?

  • How do I deploy my Argo GUI?

Objectives
  • Prepare to deploy the fileserver that mounts the storage volume.

  • Submit your workflow and get the results.

Workflow Definition

In this section, we will explore the structure and components of an Argo Workflow. Workflows in Argo are defined using YAML syntax and consist of various tasks that can be executed sequentially, in parallel, or conditionally.

To define a workflow, create a YAML file (e.g., my-workflow.yaml) and define the following:

Here’s an example of a simple Argo Workflow definition, get it with:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/container-workflow.yaml

The container template will have the following content:

YAML File

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: my-workflow
spec:
  entrypoint: my-task
  templates:
    - name: my-task
      container:
        image: my-image
        command: [echo, "Hello, Argo!"]

Let’s run the workflow:

argo submit container-workflow.yaml -n argo

You can add the --watch flag to supervise the creation of the workflow in real time as so:

argo submit --watch container-workflow.yaml -n argo

Open the Argo Workflows UI. Then navigate to the workflow, you should see a single container running.

Exercise

Edit the workflow to make it echo “howdy world”.

Solution

apiVersion: argoproj.io/v1alpha1
kind: Workflow                 
metadata:
  generateName: container-   
spec:
  entrypoint: main         
  templates:
  - name: main             
    container:
      image: docker/whalesay
      command: [cowsay]         
      args: ["howdy world"]

Learn more about parameters in the Argo Workflows documentation:

DAG Template

DAG template is a common type of _orchestration_template. Let’s look at a complete example:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/dag-workflow.yaml

That has the content:

YAML File

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: a
            template: whalesay
          - name: b
            template: whalesay
            dependencies:
              - a
    - name: whalesay
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "hello world" ]

In this example, we have two templates:

The DAG has two tasks: “a” and “b”. Both run the “whalesay” template, but as “b” depends on “a”, it won’t start until “ a” has completed successfully.

Let’s run the workflow:

argo submit --watch dag-workflow.yaml -n argo

You should see something like:

STEP          TEMPLATE  PODNAME              DURATION  MESSAGE
 ✔ dag-shxn5  main                                       
 ├─✔ a        whalesay       dag-shxn5-289972251  6s          
 └─✔ b        whalesay       dag-shxn5-306749870  6s          

Did you see how b did not start until a had completed?

Open the Argo Server tab and navigate to the workflow, you should see two containers.

Exercise

Add a new task named “c” to the DAG. Make it depend on both “a” and “b”. Go to the UI and view your updated workflow graph.

Solution

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: a
            template: whalesay
          - name: b
            template: whalesay
            dependencies:
              - a
          - name: c
            template: whalesay
            dependencies:
              - a
              - b
    - name: whalesay
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "hello world" ]

The expected output is:

STEP          TEMPLATE  PODNAME                        DURATION  MESSAGE
✔ dag-hl6lc  main                                                 
├─✔ a        whalesay  dag-hl6lc-whalesay-1306143144  10s         
├─✔ b        whalesay  dag-hl6lc-whalesay-1356476001  10s         
└─✔ c        whalesay  dag-hl6lc-whalesay-1339698382  9s       

And the workflow you should see in Argo GUI is: DAG-diagram1

Learn more about parameters in the Argo Workflows documentation:

Input Parameters

Let’s have a look at an example:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/input-parameters-workflow.yaml

See the content:

Yaml file

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: input-parameters-
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: message
        value: hello world
  templates:
    - name: main
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "" ]

This template declares that it has one input parameter named “message”. See how the workflow itself has arguments?

Run it:

argo submit --watch input-parameters-workflow.yaml -n argo

You should see:

STEP                       TEMPLATE  PODNAME                 DURATION  MESSAGE
 ✔ input-parameters-mvtcw  main      input-parameters-mvtcw  8s          

If a workflow has parameters, you can change the parameters using -p using the CLI:

argo submit --watch input-parameters-workflow.yaml -p message='Welcome to Argo!' -n argo

You should see:

STEP                       TEMPLATE  PODNAME                 DURATION  MESSAGE
 ✔ input-parameters-lwkdx  main      input-parameters-lwkdx  5s          

Let’s check the output in the logs:

argo logs @latest -n argo

You should see:

 ______________
< Welcome to Argo! >
 --------------
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/   

Learn more about parameters in the Argo Workflows documentation:

Output Parameters

Output parameters can be from a few places, but typically the most versatile is from a file. In this example, the container creates a file with a message in it:

  - name: whalesay
    container:
      image: docker/whalesay
      command: [sh, -c]
      args: ["echo -n hello world > /tmp/hello_world.txt"]
    outputs:
      parameters:
      - name: hello-param		
        valueFrom:
          path: /tmp/hello_world.txt

In a DAG template and steps template, you can reference the output from one task, as the input to another task using a template tag:

      dag:
        tasks:
          - name: generate-parameter
            template: whalesay
          - name: consume-parameter
            template: print-message
            dependencies:
              - generate-parameter
            arguments:
              parameters:
                - name: message
                  value: ""

Get the complete workflow:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/parameters-workflow.yaml

Yaml File

# parameters-workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parameters-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: generate-parameter
            template: whalesay
          - name: consume-parameter
            template: print-message
            dependencies:
              - generate-parameter
            arguments:
              parameters:
                - name: message
                  value: ""

    - name: whalesay
      container:
        image: docker/whalesay
        command: [ sh, -c ]
        args: [ "echo -n hello world > /tmp/hello_world.txt" ]
      outputs:
        parameters:
          - name: hello-param
            valueFrom:
              path: /tmp/hello_world.txt

    - name: print-message
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "" ]

Run it:

argo submit --watch parameters-workflow.yaml -n argo

You should see:

STEP                     TEMPLATE       PODNAME                      DURATION  MESSAGE
  parameters-vjvwg      main                                                    
 ├─✔ generate-parameter  whalesay       parameters-vjvwg-4019940555  43s         
 └─✔ consume-parameter   print-message  parameters-vjvwg-1497618270  8s          

Learn more about parameters in the Argo Workflows documentation:

Conclusion

Congratulations! You have completed the Argo Workflows tutorial, where you learned how to define and execute workflows using Argo. You explored workflow definitions, dag templates, input and output parameters, and monitoring. This will be important when processing files from the CMS Open Data Portal as similarily done with the DAG and Parameters examples in this lesson.

Argo Workflows offers a wide range of features and capabilities for managing complex workflows in Kubernetes. Continue to explore its documentation and experiment with more advanced workflow scenarios.

Happy workflow orchestrating with Argo!

Key Points

  • With a simple but a tight yaml structure, a full-blown analysis can be performed using the Argo tool on a K8s cluster.