
Reinventing a wheel (for a render farm)

For the last couple of years I have had to work closely with tools, scripts and submitters for render farm management software. When Houdini's TOPs (PDG) hit the market, I was excited! I already had some expectations for it, and I must say I was expecting certain things to be done differently. Though it was clear why things were done the way they were, I couldn't stop wondering in the back of my head: what if some core concepts of TOPs had been done differently?
 
Finally, a month ago I decided: what the heck, let's give it a try! So I started implementing my vision of task management software from the ground up - a prototype, a proof of concept.
The name the folder with that code got is Taskflow.
 
So Taskflow is yet another task/job management system that processes and schedules tasks for execution on a set of local/remote/cloud workers - in some sense it's generalized render farm management software, just like TOPs (PDG).
The core idea behind Taskflow is that it operates on, well, tasks, and tasks are processed by nodes. Nodes are connected into a directed graph (not necessarily acyclic), and it is not a dependency graph, as there is no evaluation/propagation happening.
 
Tasks themselves travel from node to node - one could say they flow from node to node - hence the name Taskflow... deeeep, I know...
So, unlike in TOPs (or PDG), tasks here are dynamic, constantly moving through the graph, while in TOPs tasks are bound to a certain node and may only produce new tasks for nodes connected below. The obvious downside of this is that there is no "static task generation" like in TOPs, which makes the latter much more user-friendly and allows you to estimate the amount of work before actually starting it. Also, in Taskflow there is no finished state for the graph - it's like one of those marble machines: always running, you just add some marbles to it. If you design your marble machine to eventually spit the marbles out, their "processing" will finish, but if you loop the machine, those marbles will keep going in circles until you stop paying for electricity.
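
To make the "flow" idea a bit more concrete, here is a tiny conceptual sketch in Python - very much not Taskflow's actual code, just an illustration: tasks sit on nodes, every scheduler tick lets each node process its tasks and then pushes them along an outgoing edge, and the loop itself never finishes.

```python
# Conceptual sketch only (not Taskflow's actual code): tasks "sit" on nodes,
# each scheduler tick lets a node process its tasks, then pushes them downstream.
import time

class Node:
    def process(self, task) -> None:
        """Inspect or modify the task's attributes, maybe attach a payload."""

class Graph:
    def __init__(self):
        self.edges = {}   # node -> list of downstream nodes
        self.tasks = {}   # node -> list of tasks currently sitting on that node

    def step(self):
        """One tick of the marble machine."""
        moves = []
        for node, tasks in self.tasks.items():
            for task in tasks:
                node.process(task)
                for downstream in self.edges.get(node, [])[:1]:  # follow the first output
                    moves.append((downstream, task))
            tasks.clear()            # tasks with no downstream node reach a dead end
        for downstream, task in moves:
            self.tasks.setdefault(downstream, []).append(task)

    def run_forever(self):
        while True:                  # there is no "finished" state for the graph itself
            self.step()
            time.sleep(0.1)
```
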
The term Task here is pretty loose - a task is just an entity with attributes and some metadata; it's up to the nodes to define what to do with those tasks.
Tasks can split and join back, tasks can spawn new tasks, and tasks can reach dead ends and become "finished".
 
Tasks, obviously, have attributes, and tasks can be arranged in a hierarchy.
Each node analyzes a task's attributes and may change them and/or create a payload attached to the task. That payload is then scheduled for execution by workers.
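
To give an idea of just how loose a task is, here is a hypothetical sketch of the data it carries; the field names are mine, not Taskflow's:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Task:
    """A task is just an entity with free-form attributes plus a bit of metadata."""
    attributes: dict[str, Any] = field(default_factory=dict)  # e.g. {"frames": [0, 1, 2]}
    task_id: int = 0
    parent_id: Optional[int] = None    # tasks can be arranged in a hierarchy
    node_id: Optional[int] = None      # the node the task is currently sitting on
    payload: Optional[Any] = None      # work attached by a node, scheduled to a worker
```
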
 
A worker is just a process running on the same or another machine. A worker gets work to do and sends back the result.

So generally speaking, the payload is just what to run and in what environment, and optionally what files to transfer to the worker and back and how to map them.
 
The node can then choose to analyze the payload invocation results and update the task's attributes further, or even create a new payload to be scheduled again.
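
Again purely as an illustration (names and fields are my assumptions, not Taskflow's API), the payload and the result-handling step could look roughly like this:

```python
from dataclasses import dataclass, field

@dataclass
class Payload:
    """Roughly: what to run, in what environment, and which files to move around."""
    args: list[str]                                             # command line to execute
    env: dict[str, str] = field(default_factory=dict)
    send_files: dict[str, str] = field(default_factory=dict)    # local path -> worker path
    fetch_files: dict[str, str] = field(default_factory=dict)   # worker path -> local path

@dataclass
class InvocationResult:
    """What a worker reports back after running a payload."""
    exit_code: int
    stdout: str = ""

class SomeNode:
    def on_payload_done(self, task, result: InvocationResult):
        # a node may inspect the worker's result, keep updating the task's
        # attributes, or even attach a fresh payload to be scheduled again
        if result.exit_code != 0:
            task.attributes["error"] = result.stdout
            task.payload = None          # dead end: nothing more to run
        else:
            task.attributes["done"] = True
```
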

Nodes can have their own volatile runtime state, but they are not "running" themselves. For example, a node may choose to remember the ids of tasks it has processed and skip them if those tasks somehow come back to the node later. Or, for another example, a node may not let child tasks pass through until the parent one comes through, or the other way around.
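
A stateful node of the first kind might look something like this hypothetical sketch:

```python
class OncePerTaskNode:
    """A node with volatile runtime state: it remembers which task ids it has
    already processed and does nothing if the same task comes around again."""

    def __init__(self):
        self._seen_ids = set()          # lives only in memory while the scheduler runs

    def process(self, task):
        if task.task_id in self._seen_ids:
            return                      # already handled once, just let it pass through
        self._seen_ids.add(task.task_id)
        task.attributes["visited"] = True
```
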
 
Overall, Taskflow was "designed" (more like imagined) ambitiously: supporting local machines and cloud machines on a drop-in/drop-out model, with close to zero manual configuration and one-button deployment... This will take quite some time to implement, and then to reimplement, but this time efficiently...
The original idea kinda looked like this:

Though I'm mostly doing this as a proof of concept and a study for myself, I do think it can have a real-world application for a small group of freelancers, to automate a distributed pipeline and use idle machines to render stuff, maybe even for small studios.

Let's get back to reality from this realm of imagination. Right now I'm deep in alpha-stage development, slowly implementing at least the core concepts and all the main elements, so there is no publicly available prototype yet, but this post marks the point where I've finally prototyped working versions of every component in the system, so it can actually run real tasks.
This is an example of the simplest mantra render. The UI is far from finished (and should be rewritten for browsers anyway), so I have to explain:
It starts with a single task on node#7.
 
node#7
is actually a frame range splitter - it looks for an attribute named "frames", treats it as an array of frames, and splits the current task into a number of tasks, so each of them has only a set number of frames.
The original frame range of 0-29 is split into 0-9, 10-19 and 20-29, and these 3 tasks come to node#6.
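
A hypothetical version of such a splitter node, reusing the Task sketch from above and assuming 10 frames per chunk, could look like this:

```python
class FrameSplitterNode:
    """Rough sketch of what node#7 does: split the "frames" attribute into chunks."""

    def __init__(self, frames_per_task: int = 10):
        self.frames_per_task = frames_per_task

    def process(self, task):
        frames = task.attributes.get("frames", [])
        # frames 0..29 with 10 frames per task -> [0..9], [10..19], [20..29]
        children = []
        for i in range(0, len(frames), self.frames_per_task):
            chunk = frames[i:i + self.frames_per_task]
            children.append(Task(attributes={**task.attributes, "frames": chunk},
                                 parent_id=task.task_id))
        return children      # these tasks then travel on to the next node
```
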
 

node#6
is an ifd generator. It looks on a task for the attributes "hipname", "hipdriver" and "frames" and creates a payload for workers to pick up.
That payload is an ifd-generating houdini script. After each frame, the worker communicates back to the scheduler and adds a new task to it - a mantra ifd render task.
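
Sketched in the same hypothetical style (the script name "generate_ifds.py" is made up for illustration, hython is Houdini's Python interpreter, and the Payload sketch from above is reused), node#6 boils down to something like:

```python
class IfdGeneratorNode:
    """Rough sketch of what node#6 does: build a payload that generates ifds."""

    def process(self, task):
        hipname = task.attributes["hipname"]    # path to the .hip file
        driver = task.attributes["hipdriver"]   # e.g. a mantra ROP like "/out/mantra1"
        frames = task.attributes["frames"]      # e.g. [0, 1, ..., 9]
        # "generate_ifds.py" is a placeholder name for the houdini script that
        # writes each frame out as an ifd file, reports back to the scheduler after
        # every frame and spawns a new task carrying an "ifdpath" attribute
        task.payload = Payload(
            args=["hython", "generate_ifds.py", hipname, driver]
                 + [str(f) for f in frames])
```
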
 
node#8
is an ifd renderer. It looks on a task for an "ifdpath" attribute and creates a payload to invoke a mantra process with that ifd file.
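
And node#8 is even simpler - roughly like this sketch (exact flags aside, mantra's -f option points it at an ifd file):

```python
class MantraRenderNode:
    """Rough sketch of what node#8 does: render a generated ifd file with mantra."""

    def process(self, task):
        ifdpath = task.attributes["ifdpath"]
        # the payload here is just a mantra command line reading the ifd file
        task.payload = Payload(args=["mantra", "-f", ifdpath])
```
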

In this example there are 2 workers active: tasks whose payload was picked up by a worker are marked with yellow. One worker is generating ifds, the second is starting to render an already produced ifd file.
 
 
I will be posting development updates here once in a while.
The prototype will be available once it's somewhat user-friendly and feature-stable.
If you want to support the project - please come to https://www.patreon.com/xapkohheh