Create an Argo Workflow

Overview

Teaching: 5 min
Exercises: 20 min

Questions

How can I visualize my workflows?

How do I deploy my Argo GUI?

Objectives

Prepare to deploy the fileserver that mounts the storage volume.

Submit your workflow and get the results.

Workflow Definition

In this section, we will explore the structure and components of an Argo Workflow. Workflows in Argo are defined using YAML syntax and consist of various tasks that can be executed sequentially, in parallel, or conditionally.

To define a workflow, create a YAML file (e.g., my-workflow.yaml) and define the following:

Metadata: Provide a name and optional labels for the workflow.
Spec: Define the workflow’s specification, including the list of tasks to be executed.

Here’s an example of a simple Argo Workflow definition. Download it with:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/container-workflow.yaml

The container template will have the following content:

YAML File
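
The file's contents are not reproduced on this page. As a rough guide, a workflow of this kind could look like the sketch below, assuming it follows the common hello-world pattern with the docker/whalesay image used later in this lesson. The generateName prefix, the template name main, and the cowsay command are assumptions, so compare them with the file you downloaded:

```yaml
# Hypothetical sketch of container-workflow.yaml; the real file may differ.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: container-   # assumed prefix; Argo appends a random suffix
spec:
  entrypoint: main           # the template that runs first
  templates:
    - name: main
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
```

The metadata block carries the name (here a generated one), and the spec block lists the templates, which matches the two parts described above.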

Let’s run the workflow:

argo submit container-workflow.yaml -n argo

To follow the workflow’s progress in real time, add the --watch flag:

argo submit --watch container-workflow.yaml -n argo

Open the Argo Workflows UI and navigate to the workflow. You should see a single container running.

Exercise

Edit the workflow to make it echo “howdy world”.

Solution
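
One possible solution, assuming the workflow passes its message through the container's args field (the template name and the cowsay command below are assumptions and may be named differently in your file), is to change only that argument:

```yaml
# Sketch of the edited spec; only the args line changes.
spec:
  entrypoint: main
  templates:
    - name: main               # assumed template name
      container:
        image: docker/whalesay
        command: [cowsay]      # assumed command
        args: ["howdy world"]  # the new message to echo
```

Submit the edited file again with argo submit to check the result.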

Learn more about workflow concepts in the Argo Workflows documentation:

Workflow concepts

DAG Template

A DAG template is a common type of orchestration template. Let’s look at a complete example:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/dag-workflow.yaml

It has the following content:

YAML File

In this example, we have two templates:

The “main” template is our new DAG.
The “whalesay” template is the same template as in the container example.

The DAG has two tasks: “a” and “b”. Both run the “whalesay” template, but because “b” depends on “a”, it won’t start until “a” has completed successfully.
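
The structure just described can be sketched as follows. This is a minimal sketch built from the description in this section, not a copy of the downloaded file; the generateName prefix, the cowsay command, and the message are assumptions:

```yaml
# Hypothetical sketch of dag-workflow.yaml; the real file may differ.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-         # assumed prefix
spec:
  entrypoint: main
  templates:
    - name: main             # the DAG template
      dag:
        tasks:
          - name: a
            template: whalesay
          - name: b
            template: whalesay
            dependencies:
              - a            # "b" waits until "a" has succeeded
    - name: whalesay         # same container template as before
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
```

Tasks without a dependencies list can start immediately, so the dependencies field is what imposes the ordering between "a" and "b".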

Let’s run the workflow:

argo submit --watch dag-workflow.yaml -n argo

You should see something like:

STEP          TEMPLATE  PODNAME              DURATION  MESSAGE
 ✔ dag-shxn5  main
 ├─✔ a        whalesay  dag-shxn5-289972251  6s
 └─✔ b        whalesay  dag-shxn5-306749870  6s

Did you see how b did not start until a had completed?

Open the Argo Server tab and navigate to the workflow. You should see two containers.

Exercise

Add a new task named “c” to the DAG. Make it depend on both “a” and “b”. Go to the UI and view your updated workflow graph.

Solution

STEP          TEMPLATE  PODNAME                        DURATION  MESSAGE
✔ dag-hl6lc  main                                                 
├─✔ a        whalesay  dag-hl6lc-whalesay-1306143144  10s         
├─✔ b        whalesay  dag-hl6lc-whalesay-1356476001  10s         
└─✔ c        whalesay  dag-hl6lc-whalesay-1339698382  9s
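
Output like this could come from a DAG section such as the sketch below. It assumes the tasks "a" and "b" are defined as described earlier in this section and that "c" reuses the "whalesay" template; only the last task is new:

```yaml
# Sketch of the dag section of the "main" template with the new task "c".
dag:
  tasks:
    - name: a
      template: whalesay
    - name: b
      template: whalesay
      dependencies: [a]
    - name: c
      template: whalesay
      dependencies: [a, b]   # "c" waits for both "a" and "b"
```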

Learn more about DAG templates in the Argo Workflows documentation:

Workflow concepts - DAG

DAG walk-through

Input Parameters

Let’s have a look at an example:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/input-parameters-workflow.yaml

See the content:

Yaml file

This template declares that it has one input parameter named “message”. See how the workflow itself has arguments?
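
To make the two levels concrete, here is a minimal sketch of such a workflow. It is not a copy of the downloaded file: the generateName prefix, the default value "hello world", and the cowsay command are assumptions. The workflow-level arguments supply a value, and the template-level inputs declare the parameter that receives it:

```yaml
# Hypothetical sketch of input-parameters-workflow.yaml; the real file may differ.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: input-parameters-   # assumed prefix
spec:
  entrypoint: main
  arguments:                        # workflow-level arguments
    parameters:
      - name: message
        value: hello world          # assumed default value
  templates:
    - name: main
      inputs:                       # the template declares its input parameter
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]   # replaced with the value at run time
```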

Run it:

argo submit --watch input-parameters-workflow.yaml -n argo

You should see:

STEP                       TEMPLATE  PODNAME                 DURATION  MESSAGE
 ✔ input-parameters-mvtcw  main      input-parameters-mvtcw  8s          

If a workflow has parameters, you can override their values from the CLI with the -p flag:

argo submit --watch input-parameters-workflow.yaml -p message='Welcome to Argo!' -n argo

You should see:

STEP                       TEMPLATE  PODNAME                 DURATION  MESSAGE
 ✔ input-parameters-lwkdx  main      input-parameters-lwkdx  5s          

Let’s check the output in the logs:

argo logs @latest -n argo

You should see:

 ______________
< Welcome to Argo! >
 --------------
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/   

Learn more about parameters in the Argo Workflows documentation:

Parameters overview

Workflow input parameters.

Input walk-through

Output Parameters

Output parameters can come from several sources, but a file is typically the most versatile. In this example, the container creates a file with a message in it:

  - name: whalesay
    container:
      image: docker/whalesay
      command: [sh, -c]
      args: ["echo -n hello world > /tmp/hello_world.txt"]
    outputs:
      parameters:
      - name: hello-param
        valueFrom:
          path: /tmp/hello_world.txt

In DAG and steps templates, you can pass the output of one task as the input to another task using a template tag:

      dag:
        tasks:
          - name: generate-parameter
            template: whalesay
          - name: consume-parameter
            template: print-message
            dependencies:
              - generate-parameter
            arguments:
              parameters:
                - name: message
                  value: "{{tasks.generate-parameter.outputs.parameters.hello-param}}"

Get the complete workflow:

wget https://cms-opendata-workshop.github.io/workshop2023-lesson-introcloud/files/argo/parameters-workflow.yaml

Yaml File
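
For orientation, the two fragments shown earlier in this section could fit together roughly as in the sketch below. The generateName prefix and the whole print-message template (its cowsay command and its message input) are assumptions, since they are not shown on this page; check them against the downloaded file:

```yaml
# Hypothetical sketch of parameters-workflow.yaml; the real file may differ.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parameters-   # assumed prefix
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: generate-parameter
            template: whalesay
          - name: consume-parameter
            template: print-message
            dependencies:
              - generate-parameter
            arguments:
              parameters:
                - name: message
                  # Template tag: output of the first task feeds the second.
                  value: "{{tasks.generate-parameter.outputs.parameters.hello-param}}"
    - name: whalesay
      container:
        image: docker/whalesay
        command: [sh, -c]
        args: ["echo -n hello world > /tmp/hello_world.txt"]
      outputs:
        parameters:
          - name: hello-param
            valueFrom:
              path: /tmp/hello_world.txt   # file contents become the parameter value
    - name: print-message                  # assumed consumer template
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]
```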

Run it:

argo submit --watch parameters-workflow.yaml -n argo

You should see:

STEP                     TEMPLATE       PODNAME                      DURATION  MESSAGE
 ✔ parameters-vjvwg      main                                                    
 ├─✔ generate-parameter  whalesay       parameters-vjvwg-4019940555  43s         
 └─✔ consume-parameter   print-message  parameters-vjvwg-1497618270  8s          

Learn more about parameters in the Argo Workflows documentation:

Workflow output parameters.

Conclusion

Congratulations! You have completed the Argo Workflows tutorial, where you learned how to define and execute workflows using Argo. You explored workflow definitions, DAG templates, input and output parameters, and monitoring. These skills will be important when processing files from the CMS Open Data Portal, which is done in much the same way as the DAG and parameters examples in this lesson.

Argo Workflows offers a wide range of features and capabilities for managing complex workflows in Kubernetes. Continue to explore its documentation and experiment with more advanced workflow scenarios.

Happy workflow orchestrating with Argo!

Key Points

With a simple but tight YAML structure, a full-blown analysis can be performed using Argo on a Kubernetes (K8s) cluster.
