Infrastructure Workflow

Cloudmesh supports an infrastructure workflow where users can specify python functions and map their execution on cloud infrastructure. The workflow feature allows you to define Python functions on a workflow class, and specify when to execute them via the command line or in a YAML file. You can then visualize the result of your workflow execution.

An example is given next.

from cloudmesh.flow.WorkFlow import BaseWorkFlow

class MyFlow(BaseWorkFlow):

    def a(self):
        print("in a!")
        time.sleep(5)

    def b(self):
        print("in b!")
        time.sleep(10)

    def c(self):
        print("in c!")
        time.sleep(10)

This allows you to define functions in your workflow file. Then you can write a specification for the sequence to execute your functions:

(a; b || c)

Where

  • a ; b is executed sequentially

  • b || c is executed in parallel.

Finally, after execution the results are stored in MongoDB to be visualized or consumed by later functions in the series.

Database Objects

There are two collections related to workflow objects. The first is for the definition of a flow, and the second for the status of a flow that is in progress.

Definition Collection

When you define a workflow, a new collection is created for the definition. For a workflow named “test”, this collection exists at “test-flow”. This collection contains objects that look like the following:

{
    "name" : "pytesta",
    "dependencies" : [],
    "workflow" : "test",
    "cm" : {
        "kind" : "flow",
        "cloud" : "test",
        "name" : "pytesta",
        "collection" : "test-flow",
        "created" : "2019-05-02 00:32:11.034870",
        "modified" : "2019-05-02 00:32:11.034870"
    },
    "kind" : "flow",
    "cloud" : "test",
    "status" : "defined"
}

The salient features are name, which is the name of the node, and dependencies which is an array of other node names this node depends upon. All elements in a flow definition collection will have status : "defined".

Running Flow Collection

When a flow is started with

cms flow run

A new collection is started with the suffix “-active” added at the end. For example, if your flowname is test and your nodes are defined in test-flow, then the active collection in MongoDB will be test-flow-active. Objects in this collection are similar to the above, with two changes:

  1. First, they have a result field attached, which holds the JSON value from the result of executing the node and

  2. They have a richer status field, with the following values:

    • pending is the status when the flow starts

    • running is the status when a node is being executed

    • finished is the status when the node has executed

    • error is the status when a node finished execution with a non-zero exit code

When interacting directly with the database, it is important to use the values from the definition collection unless you are explicitly interacting with a flow in progress. The running collection may not be up-to-date and may contain incorrect information. For example the dependencies array in the definition collection reflects the overall dependencies specified in the flow definition but in the running collection the array is continually modified whenever other nodes finish their work.

Javascript Interface (proposed)

We are looking for someone that would chose as its project to include a rendering of some DAG in javascript. The javascript library must be free to use. Nodes and edges must be able to be labeled.

A promising start for a Javascript library is

This project is only recommended for someone that knows javascript already.

You will do the rest of the project in python. It is important that the functions be specified in python and not just Javascript. The focus is not on specifying the DAG with a GUI, but to visualizing it at runtime with status updates

Here is another summary that we posted earlier and is probably better as it has a dict return

So what we want to do is something i have done previously somewhere with graphviz, but instead of using graphviz we use java script. W want to define tasks that depend on each other. The tasks are defined as python functions. The dependencies are specified via a simple graph string

def a (); print("a"); sleep(1) ; return {"status": "done", "color":"green", shape:"circle", label="a"}

def b (); print("b"); sleep(2); return{"status": "done", "color":"green", shape:"circle", label="b"}

def b (); print("c"); sleep(3); return{"status": "done", "color":"green", shape:"circle", label="c"}


w = workflow("a; b | c")

; = sequential

| = parallel


w.run()

While executing the javascript would change dynamically the state and color after a calculation is completed. The workflow should also be able to be specified in yaml

Here just one idea:

tasks:
    task:
      name: a
      parameter:
         x: "int"
         y:: "int"
      calculation: f(x,y)
      entry:
         color: green
         label: a
         value: x (this is a python variable local to the function
         shape: circle
      return:
         color: green
         label: a
         value: x (this is a python variable local to the function
         shape: circle

Naturally at one point f(x,y) will be cloud related such as starting a vm and executing a command in teh vm ….

Followup:

We added a value to the return. Values can be any object.

def a():
    x = 10
    return {"status": "done",
            "color": "green",
            "shape": "circle",
            "label": "c",
            "value": x}

REST

An OpenAPI specification for this is to be defined.