Internal Platforms and Capabilities
Unlike a traditional product company, our revenue is directly tied to billable hours. As such, billable projects always take precedence over internal development (of products, capabilities, programmes and, to some extent, even people). Effort is expended and then abandoned time after time, as experiments and PoC work are invariably done as fast as possible without sufficient documentation or consistency of approach. So when time runs out, or the people involved move on, we often lose the ability to pick up and carry on when time permits, because all the knowledge of the system was tacit and is no longer available. Picking up and carrying on therefore involves a significant learning curve.
Whilst this is clearly a human problem, and the nature of agency work, I have been mulling over a technical approach that could help ameliorate some of the perceived road-blocks, or at the very least reduce the waste that occurs when capabilities are developed during downtime or when billable work has temporarily ebbed.
By standardising our deployment process and infrastructure we eliminate the loss of deployment and process knowledge. Here I would have said we should just spin up a small Docker Swarm cluster, as it is simple to do, easily documented and thus easily learnt with some support. However, with Docker Inc.'s recent sale of their enterprise offering - including the last paid developer on SwarmKit - I can no longer offer this opinion with any certainty. This makes me very sad.
Instead, infrastructure-wise, the only remaining options are Kubernetes and/or Serverless. Serverless is appealing from a pay-as-you-use point of view, and it encourages some very elegant overall application architectures. However, it is not (currently) universally suitable - anything long running, or anything that requires persistent state, is a poor fit. It also (currently) locks us in to the whims of a single provider.
Kubernetes, on the other hand, feels to me like a precariously balanced tower of Jenga blocks glued together with the odd Lego block. It does, however, have massive mind share at present and can handle a much broader range of usage patterns, including acting as the substrate for serverless-style functions with [OpenFaaS][openfaas] or Knative. Kubernetes has been adopted by all the major hosting providers as a managed offering - taking a lot of the pain out of using it as well as providing some level of protection against vendor lock-in. It's not perfect - each provider has vendor-specific parts which would make moving a little tricky - but it is still considerably easier to move between managed Kubernetes offerings than from one serverless offering to another.
So from an infrastructure (or perhaps better phrased as compute) point of view, a managed Kubernetes cluster makes the most sense. Deployment processes are then fairly easily unified into either just using kubectl and a bunch of yaml files, or something like Helm (and a bunch of yaml files). Even if we decided to support direct Docker image hosting alongside serverless functions, that is still only two mostly overlapping deployment processes.
Ok, so that covers compute and a fairly simplified approach to infrastructure. What about the system and application architecture itself? I think I can summarise it in one statement and a series of rules:
I honestly don't care about your application stack/technology/language as long as it has these characteristics (a minimal sketch of such a service follows the list):
- It can be deployed using a Docker container (or as a Knative/OpenFaaS function)
- It provides a (RESTful) API (although having another API interface is not the end of the world)
- It uses the cluster provided authentication and authorisation service (JWT token with permissions baked in)
- Things I haven't thought through yet
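To make the rules above a little more concrete, here is a minimal sketch of what a conforming service might look like, written in TypeScript with Express. Express, the jsonwebtoken package, the `permissions` claim name and the shared-secret environment variable are all assumptions standing in for whatever the cluster-provided auth service and a team's chosen stack actually end up being; the service itself would be wrapped in a Dockerfile and deployed like any other container.

```typescript
// Hypothetical sketch of a service satisfying the rules above. Express and
// jsonwebtoken are stand-ins for whatever stack a team actually picks; the
// `permissions` claim and the AUTH_SHARED_SECRET variable are assumptions
// about how the cluster-provided auth service might issue and verify tokens.
import express, { Request, Response, NextFunction } from "express";
import jwt from "jsonwebtoken";

const app = express();
app.use(express.json());

// Verify the JWT issued by the (assumed) cluster auth service and check that
// the caller holds the permission required by the route.
function requirePermission(permission: string) {
  return (req: Request, res: Response, next: NextFunction) => {
    const token = (req.headers.authorization ?? "").replace(/^Bearer /, "");
    try {
      const claims = jwt.verify(token, process.env.AUTH_SHARED_SECRET ?? "") as {
        permissions?: string[];
      };
      if (!claims.permissions?.includes(permission)) {
        return res.status(403).json({ error: "missing permission" });
      }
      next();
    } catch {
      res.status(401).json({ error: "invalid token" });
    }
  };
}

// A plain RESTful resource - nothing cluster-specific beyond the auth check.
app.get("/skills", requirePermission("skills:read"), (_req, res) => {
  res.json([{ name: "Kubernetes", people: ["alice", "bob"] }]);
});

app.listen(8080, () => console.log("listening on 8080"));
```

The important part is the shape rather than the stack: a container image, a RESTful endpoint, and tokens verified against the shared auth service.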
A random aside into where the above came from
These thoughts initially started bubbling to the surface around the time the physical "ask me about" wall display was taken down. The need for such a thing has been clear for some time. We even had a hack day project around it.
From the physical wall, we moved on to an AirTable. It captured the information but did not make it easy to access or consume.
Then there was AMABot, which (in my opinion) emphasised the process of collecting points over the actual point of the collection process: providing a mechanism for people to find expert support from other people.
What I realised from all the above failed or stalled experiments is that these are all just little datasets. They do not actually require much in the way of smarts. They could be implemented as fairly simple (and small) services - quick to build and, once built, needing little work to maintain. The interesting stuff comes out when you start building services or systems that use one or more underlying data sources and make projections and decisions based on the data in them. For example, an AMA/User Skills service could be the source of truth for the AMABot. The bot itself is then responsible for providing a user friendly interface to the User Skills service - a clean separation between presentation and business logic. The same service could also be used by another system that identifies skills gaps in individuals - helping them decide on learning targets and OKRs - or to surface what skills walk out the door when an employee leaves, informing those tasked with recruitment what to be on the look out for.
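As a rough illustration of that separation, here is a sketch of two consumers sharing one hypothetical skills service. The `/skills` endpoint, the `Skill` shape and the service URL are invented for the example; the point is simply that both the bot and a skills-gap report are thin projections over the same source of truth.

```typescript
// Hypothetical client-side view of the same idea: one small skills service
// owns the data, and both AMABot and a skills-gap report are thin consumers.
interface Skill {
  name: string;
  people: string[]; // ids of people offering help with this skill
}

const SKILLS_SERVICE = process.env.SKILLS_SERVICE_URL ?? "http://skills.internal";

async function listSkills(): Promise<Skill[]> {
  const res = await fetch(`${SKILLS_SERVICE}/skills`);
  if (!res.ok) throw new Error(`skills service returned ${res.status}`);
  return (await res.json()) as Skill[];
}

// AMABot concern: pure presentation - "who can I ask about X?"
async function whoCanIAskAbout(topic: string): Promise<string[]> {
  const skills = await listSkills();
  return skills.find((s) => s.name.toLowerCase() === topic.toLowerCase())?.people ?? [];
}

// Skills-gap concern: a different projection over the same data -
// which skills would walk out the door if this person left?
async function skillsAtRiskIfLeaves(userId: string): Promise<string[]> {
  const skills = await listSkills();
  return skills
    .filter((s) => s.people.includes(userId) && s.people.length === 1)
    .map((s) => s.name);
}
```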