Fable

Fable is a simple automation platform. A Narrative is a sequence of Step objects, each of which has a job to perform. The Narrative keeps track of progress and facilitates inter-step data sharing via a State object that is dedicated to the running Narrative.

classDiagram
    class State {
        Log logger
        Datetime started_at
        Datetime ended_at

    }

    class StepResult {
        bool success
    }

    class Step {
        str name
        str description
        StepResult run()
    }

    class Narrative {
        Iterable[Step] steps
        void run()
    }

    Step -- State: can access a
    Narrative -- Step:consists of many
    Narrative -- State:owns a
    Step -- StepResult:results in a
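
The relationships in the diagram above could be sketched in Python roughly as follows. This is a minimal illustration, not the actual Fable code: passing the State into `run()`, the `results` list on the Narrative, and stopping at the first failed step are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
import logging
from typing import Iterable, List, Optional


@dataclass
class State:
    """Per-run context, owned by exactly one running Narrative."""
    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("fable")
    )
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None


@dataclass
class StepResult:
    success: bool


@dataclass
class Step:
    name: str
    description: str = ""

    def run(self, state: State) -> StepResult:
        # Assumption: the State is handed to each step explicitly.
        # The base step does no work; sub-classes override this method.
        return StepResult(success=True)


class Narrative:
    def __init__(self, steps: Iterable[Step]) -> None:
        self.steps = list(steps)
        self.state = State()
        # Hypothetical attribute used to track progress through the steps.
        self.results: List[StepResult] = []

    def run(self) -> None:
        self.state.started_at = datetime.now()
        for step in self.steps:
            self.state.logger.info("running step %s", step.name)
            result = step.run(self.state)
            self.results.append(result)
            if not result.success:
                break  # Assumption: a failed step halts the Narrative.
        self.state.ended_at = datetime.now()
```

A Narrative built from three steps whose second step fails would record two results and never run the third.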

Whilst there is one base Step, the intent is for there to be many easily configurable Step sub-classes:

classDiagram

    class FileSystem {
        str name
    }

    class FileDescriptor {
        FileSystem fs
        Iterable[str] paths()
    }

    class GlobFileDescriptor {
        str glob
    }

    FileDescriptor <|-- GlobFileDescriptor


    class Step {
        str name
        str description
    }

    class FileTransferStep {
        FileDescriptor source
        FileDescriptor destination
    }

    class RestoreSQLBackup {
        FileDescriptor source
        str database
    }

    class BonoboGraphStep {
        Graph graph
    }

    Step <|-- FileTransferStep
    Step <|-- RestoreSQLBackup
    Step <|-- BonoboGraphStep

    FileDescriptor --> FileSystem: references a
    FileTransferStep --> FileDescriptor: has 2
    RestoreSQLBackup --> FileDescriptor: references a
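
As a rough illustration of the descriptor half of this diagram, the sketch below resolves a glob pattern against a local directory. The `root` field on FileSystem is hypothetical (the diagram only gives it a name), the field names are illustrative, and a real implementation would presumably dispatch on the kind of file system rather than assume a local path.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class FileSystem:
    name: str
    root: str = "."  # Hypothetical: base directory for a local file system.


@dataclass
class FileDescriptor:
    fs: FileSystem  # The file system this descriptor refers to.

    def paths(self) -> Iterable[str]:
        # Sub-classes decide how the descriptor expands into concrete paths.
        raise NotImplementedError


@dataclass
class GlobFileDescriptor(FileDescriptor):
    glob: str

    def paths(self) -> Iterable[str]:
        # Expand the pattern relative to the file system root, sorted so
        # that the result is stable between runs.
        return sorted(str(p) for p in Path(self.fs.root).glob(self.glob))
```

A FileTransferStep would then hold two such descriptors, one for the source and one for the destination.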

This approach originally came from an experimental project called Ernie. Ernie was a database-backed ETL platform that allowed ETL pipelines (each called a Flow) to be built and configured declaratively. A given Flow could be associated with a schedule, allowing it to run periodically, or it could be left as manually triggered. It seems to me that the current usage of the FPS ETL is actually pretty similar: there are periodic automated jobs, as well as pipelines that are run in a more ad-hoc way and are generally modified in some way prior to the run.

Some modifications have support built in, such as specifying a WHERE clause. Others require manually changing code and then remembering to set it back again later, which is clearly not ideal.

The declarative approach is an interesting one, but it could be considered a little complex, and it would be ideal if step configuration could optionally be dynamic.

As it stood in Ernie, all steps had fields, which represented the configuration of that step. The complication here is that a lot of things are defined dynamically; for example, the file names for imports are derived from the date (plus or minus a day or two here and there).

Ernie would handle this by allowing a developer to create a new custom Step, which was fine for Ernie as the pipelines it supported were fairly static. Given the nature of the FPS use cases, though, this may not work out.

I wonder if you could mark fields as "provided at runtime" or similar, which the system could extract into a list of things that must be provided for it to run, e.g.

fable config a_narrative
   get_source_files.start_date date
   another_step.split_by string

It could then be run manually by:

fable run --conf="get_source_files.start_date=2019-01-01" --conf="another_step.split_by=created" a_narrative

or even:

fable run --conf-file=some.txt a_narrative

It would need some way to extract a list of those fields, along with some way to inject values on the fly. And how would it deal with multiple steps with the same name, e.g. two transfer steps, one for ingress and one for egress?
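
One possible shape for this, sketched with dataclass field metadata: a field is flagged as runtime-provided, a helper lists every flagged field as `step.field type`, and another helper injects `step.field=value` strings. Everything here is hypothetical (the `runtime()` marker, the helper names, the two example step classes), and rejecting duplicate step names is only one possible answer to the naming question.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List

# Display name and string parser for each supported field type.
TYPES = {date: ("date", date.fromisoformat), str: ("string", str)}


def runtime():
    """Mark a dataclass field as 'provided at runtime' (hypothetical)."""
    return field(default=None, metadata={"runtime": True})


@dataclass
class GetSourceFiles:
    name: str
    start_date: date = runtime()


@dataclass
class SplitStep:
    name: str
    split_by: str = runtime()


def runtime_fields(step):
    """Map field name to dataclass field for fields flagged as runtime."""
    return {f.name: f for f in fields(step) if f.metadata.get("runtime")}


def required_config(steps) -> List[str]:
    """List every value that must be supplied before the steps can run."""
    return [
        f"{step.name}.{name} {TYPES[f.type][0]}"
        for step in steps
        for name, f in runtime_fields(step).items()
    ]


def apply_config(steps, conf: List[str]) -> None:
    """Inject 'step.field=value' strings into the matching steps."""
    by_name = {}
    for step in steps:
        if step.name in by_name:
            # Assumption: step names must be unique within a Narrative.
            raise ValueError(f"duplicate step name: {step.name}")
        by_name[step.name] = step
    for item in conf:
        key, _, raw = item.partition("=")
        step_name, _, field_name = key.partition(".")
        step = by_name[step_name]
        f = runtime_fields(step)[field_name]  # KeyError if not a runtime field
        setattr(step, field_name, TYPES[f.type][1](raw))
```

With steps named `get_source_files` and `another_step`, `required_config` produces the two lines shown for `fable config`, and `apply_config` accepts the same strings that are passed via `--conf`.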