Fable
Fable is a simple automation platform. A Narrative is a sequence of Step objects, and
each Step has a job to perform. The Narrative keeps track of progress and facilitates
inter-step data sharing via a State object, which is dedicated to the running Narrative.
classDiagram
class State {
Log logger
Datetime started_at
Datetime ended_at
}
class StepResult {
bool success
}
class Step {
str name
str description
StepResult run()
}
class Narrative {
Iterable[Step] steps
void run()
}
Step -- State: can access a
Narrative -- Step: consists of many
Narrative -- State: owns a
Step -- StepResult: results in a
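The diagram above could be sketched in Python roughly as follows. This is a minimal sketch, not the actual implementation: passing the State into run(), the data dictionary used for inter-step sharing, and stopping at the first failed step are all assumptions made for illustration.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class State:
    """Shared context dedicated to one running Narrative."""
    logger: logging.Logger = field(default_factory=lambda: logging.getLogger("fable"))
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    # Assumption: a plain dict is enough for inter-step data sharing.
    data: dict = field(default_factory=dict)


@dataclass
class StepResult:
    success: bool


class Step:
    """Base class; sub-classes override run() to do the actual job."""

    def __init__(self, name: str, description: str = "") -> None:
        self.name = name
        self.description = description

    def run(self, state: State) -> StepResult:
        # Assumption: the State is handed to the step as an argument.
        raise NotImplementedError


class Narrative:
    def __init__(self, steps: Iterable[Step]) -> None:
        self.steps = list(steps)
        self.state = State()  # the Narrative owns its State

    def run(self) -> None:
        self.state.started_at = datetime.now()
        try:
            for step in self.steps:
                self.state.logger.info("running step %s", step.name)
                result = step.run(self.state)
                if not result.success:
                    # Assumption: a failed step halts the Narrative.
                    self.state.logger.error("step %s failed", step.name)
                    break
        finally:
            self.state.ended_at = datetime.now()
```

A Step sub-class then only needs to implement run() and return a StepResult.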
Whilst there is 1 base Step, the intent is for there to be many, easily configurable
Step sub-classes:
classDiagram
class FileSystem {
str name
}
class FileDescriptor {
FileSystem fs
Iterable[str] paths()
}
class GlobFileDescriptor {
str glob
}
FileDescriptor <|-- GlobFileDescriptor
class Step {
str name
str description
}
class FileTransferStep {
FileDescriptor source
FileDescriptor destination
}
class RestoreSQLBackup {
FileDescriptor source
str database
}
class BonoboGraphStep {
Graph graph
}
Step <|-- FileTransferStep
Step <|-- RestoreSQLBackup
Step <|-- BonoboGraphStep
FileDescriptor --> FileSystem: references a
FileTransferStep --> FileDescriptor: has 2
RestoreSQLBackup --> FileDescriptor: references a
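As a rough illustration of the file descriptor half of this diagram, the sketch below resolves a glob against a local directory. The root attribute on FileSystem is a hypothetical addition so the example can run locally, and the FileSystem reference is named fs in this sketch. How a real FileSystem would be addressed is not specified here.

```python
import glob as globlib
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FileSystem:
    name: str
    # Assumption: a local root directory stands in for the real file system.
    root: str = "."


@dataclass
class FileDescriptor:
    fs: FileSystem

    def paths(self) -> Iterable[str]:
        """Return the paths this descriptor refers to."""
        raise NotImplementedError


@dataclass
class GlobFileDescriptor(FileDescriptor):
    glob: str = "*"

    def paths(self) -> List[str]:
        # Expand the pattern relative to the file system root, in sorted order.
        pattern = os.path.join(self.fs.root, self.glob)
        return sorted(globlib.glob(pattern))
```

A FileTransferStep would hold two such descriptors, one as its source and one as its destination.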
This approach originally came from an experimental project called Ernie. Ernie was a
database-backed ETL platform that allowed ETL pipelines (Flow) to be built and
configured declaratively. A given Flow could also be associated with a schedule,
allowing it to be run periodically, or it could be left as manually triggered. It seems
to me that the current usage of the FPS ETL is actually pretty similar. There are
periodic automated things, as well as pipelines that are run in a more ad-hoc fashion
and are generally modified in some way prior to the run.
Some modifications have support built in, such as specifying a WHERE clause. Others
require manually changing code and then remembering to set it back again later. Clearly
not ideal.
The declarative approach is an interesting one, but it could be considered a little complex, and it would be ideal if step configuration could optionally be dynamic.
As it stood in Ernie, all steps had fields, which represented the configuration of that step. The complication here is that a lot of things are dynamically defined: file names for imports, for example, are derived from the date (+/- a day or two here and there).
Ernie would handle this by allowing a developer to create a new custom Step, which
was fine for Ernie as the pipelines it supported were fairly static. But given the
nature of the FPS use cases this may not work out.
I wonder if you could mark fields as "provided at runtime" or similar, which the system could extract into a list of things that must be provided for it to run, e.g.
fable config a_narrative
get_source_files.start_date date
another_step.split_by string
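One way this marking might work in Python is with dataclass field metadata. The sketch below is only an illustration: the runtime() helper, the two step classes, and required_config() are hypothetical names, and the step and field names are borrowed from the listing above.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Iterable, List, Optional, Tuple


def runtime(kind: str) -> Any:
    """Hypothetical marker: a field whose value must be provided at run time."""
    return field(default=None, metadata={"runtime": True, "kind": kind})


@dataclass
class GetSourceFiles:
    name: str = "get_source_files"
    start_date: Optional[str] = runtime("date")


@dataclass
class AnotherStep:
    name: str = "another_step"
    split_by: Optional[str] = runtime("string")


def required_config(steps: Iterable[Any]) -> List[Tuple[str, str]]:
    """Collect (step_name.field_name, kind) for every runtime-marked field."""
    required = []
    for step in steps:
        for f in fields(step):
            if f.metadata.get("runtime"):
                required.append((f"{step.name}.{f.name}", f.metadata["kind"]))
    return required


if __name__ == "__main__":
    for key, kind in required_config([GetSourceFiles(), AnotherStep()]):
        print(key, kind)
```

Keeping the marker in field metadata means ordinary, statically configured fields need no extra annotation.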
It could then be run manually by:
fable run --conf="get_source_files.start_date=2019-01-01" \
    --conf="another_step.split_by=created" a_narrative
or even:
fable run --conf-file=some.txt a_narrative
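Parsing and injecting those values could look something like the sketch below. All three function names are hypothetical, and the file format (one step.field=value per line, with blank lines and # comments ignored) is an assumption. Values are left as strings; converting them to dates or other types is deliberately out of scope here.

```python
from typing import Any, Dict, Iterable, List


def parse_conf(items: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Turn ["step.field=value", ...] into {step: {field: value}}."""
    conf: Dict[str, Dict[str, str]] = {}
    for item in items:
        key, eq, value = item.partition("=")
        step_name, dot, field_name = key.partition(".")
        if not eq or not dot or not step_name or not field_name:
            raise ValueError(f"expected step.field=value, got {item!r}")
        conf.setdefault(step_name, {})[field_name] = value
    return conf


def read_conf_file(path: str) -> List[str]:
    """Assumed file format: one step.field=value per line."""
    with open(path) as fh:
        lines = (line.strip() for line in fh)
        return [line for line in lines if line and not line.startswith("#")]


def inject(steps: Iterable[Any], conf: Dict[str, Dict[str, str]]) -> None:
    """Set each configured value on the step whose name matches."""
    for step in steps:
        for field_name, value in conf.get(step.name, {}).items():
            if not hasattr(step, field_name):
                raise AttributeError(f"{step.name} has no field {field_name!r}")
            setattr(step, field_name, value)
```

Because inject() matches on step.name alone, two steps sharing a name would both receive the same values.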
It would need some way to extract a list of those fields, along with some way to inject values on the fly. And how would it deal with multiple steps with the same name, e.g. 2 transfer steps, one for ingress and one for egress?
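One possible answer, offered only as an option and not something settled here, is to require every step in a narrative to carry a unique name (say ingress_transfer and egress_transfer) and to refuse to run otherwise, so that step.field keys stay unambiguous. The helper below is a hypothetical sketch of that check.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_step_names(names: Iterable[str]) -> List[str]:
    """Return the step names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

A narrative could call this on its step names before running and report any duplicates it finds.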