Task Manager design

Concepts

Tasks

A task represents a simple action and is defined by:

  • A name.
  • A list of dependencies. These dependencies must be met before running the task.

Moreover, tasks must be able to communicate with one another for the system to be efficient, so tasks take inputs and produce outputs.
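As a rough illustration of the definition above, a task could be modelled as follows. This is only a sketch: the `Task` class, its field names and the `run` signature are assumptions made for the example, not part of the design.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A simple action with a name and a list of dependencies (hypothetical model)."""
    name: str
    dependencies: list[str] = field(default_factory=list)

    def run(self, inputs: dict) -> dict:
        # A real task would do its work here and return its outputs;
        # this placeholder produces nothing.
        return {}


extract = Task(name="extract", dependencies=["download"])
```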

Recipe

A recipe describes the configuration of an execution composed of several tasks. It can define the list of tasks allowed to run, add rules for task execution order, and so on.

Dependency modes

Dependencies can be declared in two modes: “Direct Dependencies” and “Output dependencies”.

Direct Dependencies

Each task declares a list of tasks that must be executed before the current task.

Example:

  • Task A depends on nothing.
  • Task B depends on Task A.

The execution order will be: Task A, then Task B.
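One way to compute such an order is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the `dependencies` mapping is a hypothetical representation chosen for the example.

```python
from graphlib import TopologicalSorter

# Map each task name to the names of the tasks it depends on.
dependencies = {
    "A": [],
    "B": ["A"],
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['A', 'B']
```

A cycle in the declared dependencies would make `static_order()` raise `graphlib.CycleError`, which is a convenient place to report an invalid configuration.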

The question is: where do we define these dependencies?

There are two possible answers:

  • In the task itself. The problem is that when we want to replace one task with another, we must change every task that depends on it.
  • In the recipe file. The problem is that with multiple recipes, the same dependencies will probably be declared many times.

This mode is simple to implement, but it has the drawbacks listed above.

Output dependencies

Tasks produce outputs that other tasks use as inputs, so a second dependency mode can be imagined: the “Output dependencies” mode. The principle is much the same, but tasks do not declare dependencies on other tasks; they declare dependencies on outputs.

Example:

  • Task A needs nothing and produces the “a” output.
  • Task B needs the “a” output.

This results in a more flexible system, because the “a” output can be produced by any other task.
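A minimal sketch of how output dependencies could be turned into an execution order. It assumes each task declares hypothetical `needs` and `produces` lists and that each output has exactly one producer; neither the names nor that restriction come from the design.

```python
from graphlib import TopologicalSorter


def execution_order(tasks):
    # Index every output name by the task that produces it.
    producers = {out: name for name, t in tasks.items() for out in t["produces"]}
    # Translate "needs output x" into "depends on the producer of x".
    graph = {name: {producers[need] for need in t["needs"]}
             for name, t in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


tasks = {
    "A": {"needs": [], "produces": ["a"]},
    "B": {"needs": ["a"], "produces": []},
}
order = execution_order(tasks)
print(order)  # ['A', 'B']
```

Replacing Task A with any other task that produces “a” requires no change to Task B, which is the flexibility described above.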

Ideas

We have some ideas for making a powerful task system.

Output typing

Tasks produce outputs and other tasks use them. The problem is how to define the dependencies, and therefore how to identify outputs. If we give each output a simple id, we face the same problems as in the direct mode. The idea is then to identify outputs by their type. A type here can be anything, not necessarily a Python type. This typing applies to both inputs and outputs.

Example:

  • A copy task can take “file” as input and produce a “file” output.
  • An extract task can take a “file.archive” and produce a “dir” output.

As the previous example shows, types can be extended: “file” and “file.archive” are both “file” outputs, because “file.archive” is a sub-type of “file”. The advantage is that, since “file.archive” is also typed as “file”, the copy task can copy the archive without any additional code.

These types will be defined in the task itself.
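If types are written as dotted names, as in the examples above, the sub-type check could be as simple as a prefix test. The helper below is a sketch under that assumption; `is_subtype` is a hypothetical name.

```python
def is_subtype(candidate: str, expected: str) -> bool:
    """Return True if `candidate` is `expected` or one of its sub-types."""
    # The trailing dot prevents "filename" from matching "file".
    return candidate == expected or candidate.startswith(expected + ".")


print(is_subtype("file.archive", "file"))  # True
print(is_subtype("file", "file.archive"))  # False
```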

Base-inputs

One problem with the “Output dependencies” mode is that every task will almost certainly need inputs in order to run, so no task can be executed without input. Some data must therefore be provided at the start so that at least one task can run.

These base-inputs will be defined in the recipe file.
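The role of base-inputs can be sketched with a small scheduling loop: a task becomes runnable once everything it needs is available, and the base-inputs seed the set of available data. The `needs`/`produces` keys and the function name are assumptions, and matching is done on exact type names to keep the example short.

```python
def runnable_order(tasks, base_inputs):
    available = set(base_inputs)  # seeded by the recipe's base-inputs
    pending = dict(tasks)
    order = []
    while pending:
        # A task is ready when all of its needs are already available.
        ready = [name for name, t in pending.items()
                 if set(t["needs"]) <= available]
        if not ready:
            raise RuntimeError(f"no runnable task among {sorted(pending)}")
        for name in ready:
            available.update(pending.pop(name)["produces"])
            order.append(name)
    return order


tasks = {
    "extract": {"needs": ["file.archive"], "produces": ["dir"]},
    "index": {"needs": ["dir"], "produces": ["index"]},
}
order = runnable_order(tasks, base_inputs=["file.archive"])
print(order)  # ['extract', 'index']
```

Without any base-input, the same call raises `RuntimeError`, because no task can start.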

Filter inputs

Problems can occur when selecting which outputs feed which inputs, so we must be able to filter which outputs are used as inputs. The general idea is to narrow the type of an input to something more restrictive.

Input filter

The problem with type-based matching is how to select the right output when several outputs match the input types of a task, or when a task has several inputs with the same type (or sub-types). The idea is to filter the outputs more precisely so that only one output matches. Filters are defined in the recipe.

Example:

Task A takes one input typed “file”.

If there are multiple outputs typed “file” (for example “file.archive” and “file.log”), the recipe can state that Task A must be executed with “file.archive” instead of “file”.
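A possible shape for such a filter, assuming dotted type names and a recipe that may override the input type a task declares. `select_output` and `recipe_filter` are hypothetical names, and rejecting ambiguous matches with an exception is a design choice of this sketch only.

```python
def is_subtype(candidate, expected):
    return candidate == expected or candidate.startswith(expected + ".")


def select_output(declared_type, outputs, recipe_filter=None):
    # The recipe may narrow the declared type, but never widen it.
    wanted = recipe_filter or declared_type
    if not is_subtype(wanted, declared_type):
        raise ValueError(f"{wanted!r} is not a sub-type of {declared_type!r}")
    matches = [out for out in outputs if is_subtype(out, wanted)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} outputs match {wanted!r}")
    return matches[0]


outputs = ["file.archive", "file.log"]
chosen = select_output("file", outputs, recipe_filter="file.archive")
print(chosen)  # file.archive
```

Calling `select_output("file", outputs)` without a filter raises `LookupError`, since both outputs are typed “file”.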

Output tag

The problem remains if several outputs have the same type. To solve it, we can add tags to outputs and filter inputs by tag.

Example:

Task A takes one input typed “file.archive”.

If there are multiple outputs typed “file.archive” but with different tags (“file.archive#main” and “file.archive#dependence”), we can state that Task A must be executed with “file.archive#main” instead of “file.archive”.
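The `type#tag` notation used in the example can be parsed with a simple split. The sketch below assumes that an input without a tag accepts any tag; `parse_ref` and `matches` are hypothetical helper names.

```python
def parse_ref(ref):
    """Split "type#tag" into (type, tag); the tag is None when absent."""
    type_name, _, tag = ref.partition("#")
    return type_name, tag or None


def matches(output_ref, wanted_ref):
    out_type, out_tag = parse_ref(output_ref)
    want_type, want_tag = parse_ref(wanted_ref)
    type_ok = out_type == want_type or out_type.startswith(want_type + ".")
    # An untagged request accepts any tag (assumption of this sketch).
    tag_ok = want_tag is None or out_tag == want_tag
    return type_ok and tag_ok


outputs = ["file.archive#main", "file.archive#dependence"]
selected = [out for out in outputs if matches(out, "file.archive#main")]
print(selected)  # ['file.archive#main']
```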

Sub-recipe

As we create more recipes, we will almost certainly define the same “parts” of recipes many times. The idea is to be able to create recipes and use them in other recipes as tasks.

Example:

  • We define a recipe composed of an “Extract” task and a “Copy” task, so that it represents an “Extract To” task.
  • We use the “Extract To” task in another recipe.

Defining a sub-recipe requires specifying two things:

  • Which inputs does the sub-recipe take as base-inputs?
  • Which outputs will be returned? The tasks of a sub-recipe produce outputs that may be useless to other recipes, so we need to define exactly which outputs are returned: either specific outputs (specific types) or all outputs of one or more tasks.
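The two points above can be illustrated with a small wrapper that accepts only its declared base-inputs and returns only its declared outputs. Everything here is hypothetical: the `SubRecipe` class, the use of plain callables as tasks, and the stand-in `extract` and `copy` functions, which only build strings.

```python
class SubRecipe:
    """A recipe that can be used as a single task in another recipe (sketch)."""

    def __init__(self, name, steps, base_inputs, returned):
        self.name = name
        self.steps = steps              # callables: dict of data -> dict of new outputs
        self.base_inputs = base_inputs  # inputs accepted from the outer recipe
        self.returned = returned        # outputs exposed to the outer recipe

    def run(self, inputs):
        missing = [key for key in self.base_inputs if key not in inputs]
        if missing:
            raise KeyError(f"missing base-inputs: {missing}")
        # Only the declared base-inputs are visible inside the sub-recipe.
        data = {key: inputs[key] for key in self.base_inputs}
        for step in self.steps:
            data.update(step(data))
        # Intermediate outputs stay private; only the declared ones are returned.
        return {key: data[key] for key in self.returned}


def extract(data):
    # Stand-in for unpacking: derive a temporary directory name.
    return {"dir#tmp": data["file.archive"] + ".d"}


def copy(data):
    # Stand-in for copying the extracted directory to its destination.
    return {"dir#dest": "/dest/" + data["dir#tmp"]}


extract_to = SubRecipe("Extract To", [extract, copy],
                       base_inputs=["file.archive"], returned=["dir#dest"])
result = extract_to.run({"file.archive": "pkg.tar", "file.log": "build.log"})
print(result)  # {'dir#dest': '/dest/pkg.tar.d'}
```

The intermediate “dir#tmp” output and the unrelated “file.log” input never leave or enter the sub-recipe, so the outer recipe sees “Extract To” as one ordinary task.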
