Infrastructure Workflow
Cloudmesh supports an infrastructure workflow in which users specify Python functions and map their execution onto cloud infrastructure. The workflow feature allows you to define Python functions on a workflow class and to specify when to execute them via the command line or in a YAML file. You can then visualize the result of your workflow execution.
An example is given next.
import time

from cloudmesh.flow.WorkFlow import BaseWorkFlow


class MyFlow(BaseWorkFlow):

    def a(self):
        print("in a!")
        time.sleep(5)

    def b(self):
        print("in b!")
        time.sleep(10)

    def c(self):
        print("in c!")
        time.sleep(10)
This allows you to define functions in your workflow file. You can then write a specification for the sequence in which to execute your functions:
(a; b || c)
where
a ; b is executed sequentially, and
b || c is executed in parallel.
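The specification string can be read as a list of sequential stages, each holding the nodes that run in parallel. The following is a minimal sketch of such a reading, not the Cloudmesh parser itself. It assumes that `||` binds more tightly than `;` and that one pair of enclosing parentheses may be dropped; the function names `parse_spec` and `stage_dependencies` are hypothetical.

```python
def parse_spec(spec):
    """Split a flow string into sequential stages of parallel node names.

    Assumption: ';' separates stages and '||' separates nodes in a stage.
    """
    text = spec.strip()
    # Drop one pair of enclosing parentheses, as in "(a; b || c)".
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    stages = []
    for stage in text.split(";"):
        names = [name.strip() for name in stage.split("||")]
        names = [name for name in names if name]
        if names:
            stages.append(names)
    return stages


def stage_dependencies(stages):
    """Map every node to the nodes of the stage that precedes it."""
    deps = {}
    previous = []
    for stage in stages:
        for name in stage:
            deps[name] = list(previous)
        previous = stage
    return deps


stages = parse_spec("(a; b || c)")
print(stages)                      # [['a'], ['b', 'c']]
print(stage_dependencies(stages))  # {'a': [], 'b': ['a'], 'c': ['a']}
```

Under this reading, a runs first, and b and c both depend only on a, so they can start together.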
Finally, after execution the results are stored in MongoDB to be visualized or consumed by later functions in the series.
Database Objects
There are two collections related to workflow objects. The first is for the definition of a flow, and the second for the status of a flow that is in progress.
Definition Collection
When you define a workflow, a new collection is created for the definition. For a workflow named “test”, this collection exists at “test-flow”. This collection contains objects that look like the following:
{
    "name" : "pytesta",
    "dependencies" : [],
    "workflow" : "test",
    "cm" : {
        "kind" : "flow",
        "cloud" : "test",
        "name" : "pytesta",
        "collection" : "test-flow",
        "created" : "2019-05-02 00:32:11.034870",
        "modified" : "2019-05-02 00:32:11.034870"
    },
    "kind" : "flow",
    "cloud" : "test",
    "status" : "defined"
}
The salient features are name, which is the name of the node, and dependencies, which is an array of the names of other nodes this node depends upon. All elements in a flow definition collection will have status : "defined".
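Because each definition object carries its own dependencies array, an execution order can be derived from the collection alone. The sketch below works on plain dictionaries rather than on MongoDB, and it uses the standard library `graphlib` module (Python 3.9 or later). The nodes `pytestb` and `pytestc` and the function `execution_order` are hypothetical additions for illustration; only `pytesta` appears in the example above.

```python
from graphlib import TopologicalSorter

# In-memory stand-ins for documents in the "test-flow" collection.
# pytestb and pytestc are hypothetical nodes that depend on pytesta.
definitions = [
    {"name": "pytesta", "dependencies": [], "workflow": "test", "status": "defined"},
    {"name": "pytestb", "dependencies": ["pytesta"], "workflow": "test", "status": "defined"},
    {"name": "pytestc", "dependencies": ["pytesta"], "workflow": "test", "status": "defined"},
]


def execution_order(docs):
    """Return node names so that every node follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {doc["name"]: set(doc["dependencies"]) for doc in docs}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(definitions)
print(order)  # pytesta comes first; pytestb and pytestc follow in either order
```

A cyclic dependency would make `static_order` raise `graphlib.CycleError`, which is one way to reject an invalid flow definition early.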
Running Flow Collection
When a flow is started with cms flow run, a new collection is created with the suffix “-active” added at the end. For example, if your flow name is test and your nodes are defined in test-flow, then the active collection in MongoDB will be test-flow-active. Objects in this collection are similar to the above, with two changes:
First, they have a result field attached, which holds the JSON value from the result of executing the node.
Second, they have a richer status field, with the following values:
pending is the status when the flow starts
running is the status when a node is being executed
finished is the status when the node has executed
error is the status when a node finished execution with a non-zero exit code
When interacting directly with the database, it is important to use the values from the definition collection unless you are explicitly interacting with a flow in progress. The running collection may not be up-to-date and may contain incorrect information. For example, the dependencies array in the definition collection reflects the overall dependencies specified in the flow definition, but in the running collection the array is continually modified whenever other nodes finish their work.
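The difference between the two collections can be illustrated with plain dictionaries. This is only a sketch under stated assumptions: it assumes that "modified" means a finished node's name is removed from the dependencies arrays of the remaining nodes, and the helper names `start_flow`, `finish_node`, and `ready_nodes` are hypothetical, not part of Cloudmesh.

```python
import copy

# In-memory stand-in for the definition collection (hypothetical nodes).
definitions = [
    {"name": "a", "dependencies": [], "status": "defined"},
    {"name": "b", "dependencies": ["a"], "status": "defined"},
    {"name": "c", "dependencies": ["a"], "status": "defined"},
]


def start_flow(docs):
    """Copy the definitions into an 'active' list with pending status."""
    active = copy.deepcopy(docs)
    for doc in active:
        doc["status"] = "pending"
        doc["result"] = None
    return active


def finish_node(active, name, result):
    """Mark a node finished and drop it from the other nodes' dependencies."""
    for doc in active:
        if doc["name"] == name:
            doc["status"] = "finished"
            doc["result"] = result
        elif name in doc["dependencies"]:
            doc["dependencies"].remove(name)


def ready_nodes(active):
    """Pending nodes whose remaining dependencies are all finished."""
    return [d["name"] for d in active
            if d["status"] == "pending" and not d["dependencies"]]


active = start_flow(definitions)
print(ready_nodes(active))   # ['a']
finish_node(active, "a", {"status": "done"})
print(ready_nodes(active))   # ['b', 'c']
print(definitions[1]["dependencies"])  # ['a'] (the definition is unchanged)
```

After a finishes, the active copy of b no longer lists a as a dependency, while the definition still does. This is why the definition collection is the one to consult for the overall structure of the flow.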
JavaScript Interface (proposed)
We are looking for someone who would choose as their project to include a rendering of a DAG in JavaScript. The JavaScript library must be free to use. It must be possible to label nodes and edges.
A promising start for a Javascript library is
This project is only recommended for someone who already knows JavaScript.
You will do the rest of the project in Python. It is important that the functions be specified in Python and not just in JavaScript. The focus is not on specifying the DAG with a GUI, but on visualizing it at runtime with status updates.
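On the Python side, one possible way to feed such a renderer is to convert the node documents into a JSON structure of labeled nodes and edges that a JavaScript graph library could poll and redraw. The sketch below is an assumption about what that exchange format might look like. The color mapping, the field names `id`, `source`, and `target`, and the function `to_graph` are all hypothetical.

```python
import json

# Hypothetical mapping from node status to display color.
STATUS_COLORS = {
    "defined": "lightgray",
    "pending": "gray",
    "running": "yellow",
    "finished": "green",
    "error": "red",
}


def to_graph(docs):
    """Convert node documents into labeled nodes and edges for rendering."""
    nodes = []
    edges = []
    for doc in docs:
        nodes.append({
            "id": doc["name"],
            "label": doc["name"],
            "status": doc["status"],
            "color": STATUS_COLORS.get(doc["status"], "white"),
        })
        # One edge per dependency, pointing from the dependency to the node.
        for dep in doc["dependencies"]:
            edges.append({
                "source": dep,
                "target": doc["name"],
                "label": f"{dep} -> {doc['name']}",
            })
    return {"nodes": nodes, "edges": edges}


docs = [
    {"name": "a", "dependencies": [], "status": "finished"},
    {"name": "b", "dependencies": ["a"], "status": "running"},
    {"name": "c", "dependencies": ["a"], "status": "pending"},
]
graph = to_graph(docs)
print(json.dumps(graph, indent=2))
```

Keeping the structure plain JSON means the Python code stays independent of whichever JavaScript library is eventually chosen.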
Here is another summary that we posted earlier; it is probably better, as it has a dict return.
What we want to do is something I have done previously with Graphviz, but using JavaScript instead of Graphviz. We want to define tasks that depend on each other. The tasks are defined as Python functions. The dependencies are specified via a simple graph string:
from time import sleep

def a(): print("a"); sleep(1); return {"status": "done", "color": "green", "shape": "circle", "label": "a"}
def b(): print("b"); sleep(2); return {"status": "done", "color": "green", "shape": "circle", "label": "b"}
def c(): print("c"); sleep(3); return {"status": "done", "color": "green", "shape": "circle", "label": "c"}

# workflow is the proposed interface; it is not defined here
w = workflow("a; b | c")
# ; = sequential
# | = parallel
w.run()
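To make the intended behavior concrete, here is a minimal sketch of how such a graph string might be executed with threads from the standard library, collecting the dict returned by each task. It is not the proposed workflow class. The names `run_flow` and `make_task` are hypothetical, the sleep times are shortened, and it assumes that `;` separates sequential stages while `|` separates tasks that run in parallel within a stage.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_task(label, seconds):
    """Build a task that sleeps and then returns a display dict."""
    def task():
        time.sleep(seconds)
        return {"status": "done", "color": "green",
                "shape": "circle", "label": label}
    return task


def run_flow(spec, tasks):
    """Run stages one after another; run the tasks of a stage in parallel."""
    results = {}
    for stage in spec.split(";"):
        names = [name.strip() for name in stage.split("|") if name.strip()]
        if not names:
            continue
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            futures = {name: pool.submit(tasks[name]) for name in names}
            # result() blocks, so the next stage starts only after this one.
            for name, future in futures.items():
                results[name] = future.result()
    return results


tasks = {
    "a": make_task("a", 0.01),
    "b": make_task("b", 0.02),
    "c": make_task("c", 0.03),
}
results = run_flow("a; b | c", tasks)
for name, result in results.items():
    print(name, result["status"], result["color"])
```

Each returned dict could then be pushed to the browser so that the matching node changes its color and label.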
While the workflow executes, the JavaScript would dynamically change the state and color of a node after its calculation is completed. It should also be possible to specify the workflow in YAML.
Here is just one idea:
tasks:
  task:
    name: a
    parameter:
      x: "int"
      y: "int"
    calculation: f(x,y)
    entry:
      color: green
      label: a
      value: x  # this is a Python variable local to the function
      shape: circle
    return:
      color: green
      label: a
      value: x  # this is a Python variable local to the function
      shape: circle
Naturally, at some point f(x,y) will be cloud related, such as starting a VM and executing a command in the VM.
Followup:
We added a value to the return. Values can be any object.
def a():
    x = 10
    return {"status": "done",
            "color": "green",
            "shape": "circle",
            "label": "c",
            "value": x}
REST
An OpenAPI specification for this is to be defined.