A Tale of Infrastructure and Products
In the beginning (aka circa 2005)
When I first started as a professional (lol) software developer, infrastructure was simple. It didn't seem that way at the time of course, but there wasn't much to it for web applications. We had a server that someone else had created for us. We were given FTP (😱) access and deployment was nothing more than copying a bunch of PHP files across.
Sure, there was more to it than that, but that is all I knew and all I needed to know. As time went on I was lucky enough to be able to dive deeper into the server setup at a manageable pace. I learnt not just how to deploy my PHP code - but how PHP itself was installed and maintained on the server, how Apache was configured, how to manually apply changes to the database schema and so on. I was able to slowly gain more and more understanding of how the systems fitted together.
So when the time came to transition from PHP to Python, the infrastructure side of that change was less daunting. Python web projects have a few more moving pieces - all with pretty good reasons behind them, but certainly more complex than the "just copy the files to the server" approach of PHP. I discovered Fabric - a tool that could be used to script the deployment process, providing structure, repeatability and process. It could even be used to automate the setup of the server to some extent.
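To give a flavour, a Fabric deploy task looked something along these lines - a rough sketch using the newer Fabric 2-style API, with the host, paths and service name invented for illustration:

```python
from fabric import task


@task
def deploy(c):
    # `c` is the Connection Fabric hands us when run as:
    #   fab -H app1.example.com deploy
    # The host, paths and service name here are made up for illustration.

    # Upload the release artefact and unpack it into the app directory.
    c.put("dist/myapp.tar.gz", "/tmp/myapp.tar.gz")
    c.run("tar -xzf /tmp/myapp.tar.gz -C /srv/myapp")

    # Install dependencies and restart the service - scripted, repeatable
    # steps instead of ad-hoc commands typed over SSH.
    c.run("/srv/myapp/venv/bin/pip install -r /srv/myapp/requirements.txt")
    c.sudo("systemctl restart myapp")
```

The point being that the whole process is captured in code rather than in someone's memory.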
It was around this time that tools to automate the process of server configuration started appearing. Puppet, SaltStack and Ansible are all names that gained popularity at one time or another. These tools again focused on the configuration management of individual servers: ensuring the right packages were installed and updated, and that the configuration of the services on those servers was consistent and repeatable. The company I was at chose to adopt SaltStack, which had the knock-on effect of standardising our approach to server configuration - allowing us to leverage efforts made on one project across all projects. Others used Ansible - but the lack of a centralised configuration management server often led to a more copy-paste approach between projects, and to the stagnation of the configuration management on projects that were not under heavy development.
SaltStack was better in some ways because the shared setup meant changes or upgrades made for one project could also affect others - helping to drag them along.
When projects were self-contained and ran on pet servers, it made a lot of sense to spend significant amounts of time tooling up those projects to be able to configure the server on which they ran. It brought consistency, and the effort involved was justifiable because the servers were long-term investments used for the sole purpose of that project and designed to stick around for the foreseeable future, with the odd retirement and upgrade as a project progressed.
Then came IaaS - Infrastructure as a Service. At first many simply treated it as another virtual machine hosting option: using manual clicks and configuration to spin up servers, combined with those same configuration management tools to manage the content of those servers. But it turns out that IaaS systems are really quite different, with an emphasis on treating server resources like cattle, not pets. Resources needed to be treated as ephemeral - something that does not gel so well with configuration management systems. A different approach was needed.
Enter Infrastructure as Code - IaC. Define the infrastructure itself using code - or, more commonly, declare the desired state of the infrastructure and let the IaC system do the actual provisioning. If infrastructure needs to be considered ephemeral then it follows that you need something to ensure it can always be maintained in the right shape, or created fresh. These tools however work at a different scale - configuring individual servers is not a core capability they were designed for.
One could even argue that the tooling for IaaS encourages the minimum unit of deployment to be a baked VM image. You don't, or at least shouldn't, actually jump onto a server to install and configure things by hand. Instead you should be building those server images and then using the IaC tooling to deploy them.
If we now throw in the meteoric rise in popularity of containerisation and hyper-scalable clusters, things go a little off the rails.
Complexity is King (or the avoidance of it anyway)
Managed services like EKS and ECS on AWS, and similar offerings from others, make creating and maintaining a cluster achievable for companies without huge IT departments. But it still comes with a lot of complexity. Not only do you need to define your cluster, but everything that feeds into it: permissions, policies, container registries, DNS, load balancers, storage, databases (because running stateful loads on a container orchestration system is still not a well-supported option).
Thank goodness for IaC, I hear you say?! Well, quite - these tools certainly make it possible to keep the complexity under control, the same way you kept your pet server complexity under control with Ansible or SaltStack. But I believe it also does more than that. It tends to encourage people to blur the lines between infrastructure provisioning and project deployment - which I think is a mistake. It increases considerably the amount of knowledge someone has to have in order to understand deployment, and in general how things fit together in production.
Take a look at python-docs-samples/polls.yaml for an example. This is the deployment configuration for running a very simplistic Django application on a Google Kubernetes cluster. Even setting aside the need to build a Docker image and publish it to a registry, to understand this yaml file you need to understand containers vs deployments vs services vs load balancers, and how all those different parts fit together. You know what - a lot of devs would just give up at this point, copy-paste someone else's config, change a few names around and cross their fingers. It is certainly a long way from the copy-a-file-to-the-server approach I was able to cut my teeth on.
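To put some names to those parts, here is the same general shape of thing sketched with the Kubernetes Python client rather than YAML - purely illustrative, and the image name, labels and ports are invented:

```python
from kubernetes import client, config

# Talk to whatever cluster the local kubeconfig points at.
config.load_kube_config()

# A Deployment keeps a number of replicas of a container image running.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="polls"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "polls"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "polls"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="polls",
                        image="gcr.io/my-project/polls:1.0",  # invented image name
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ]
            ),
        ),
    ),
)

# A Service gives those pods a stable address; type=LoadBalancer asks the
# cloud provider to put an external load balancer in front of them.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="polls"),
    spec=client.V1ServiceSpec(
        selector={"app": "polls"},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
        type="LoadBalancer",
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```

Containers sit inside the pod template, the deployment keeps the pods alive, and the service (with its load balancer) is how traffic finds them - three distinct concepts before you have served a single request.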
From the separation of concerns rises simplicity
Despite this, I also think that IaC should be seen instead as an enabler, an opportunity to simplify things again. In the same way that using Docker images allows us to (and to some extent nudges us toward) fully disentangling build (`docker build`) from release (`docker push`) and ultimately deployment, IaC helps us make that separation more concrete from a different direction. IaC allows us to provision the place where things are released to - a registry - but it doesn't do the actual release. It also allows us to provision the substrate on which the deployed application runs.
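In pipeline terms the split might look something like this - just a sketch, with the registry, image name and version made up:

```python
import subprocess

# Placeholders - your registry and image naming will differ.
IMAGE = "registry.example.com/acme/shop"
VERSION = "1.4.2"


def build() -> None:
    # Build: turn the source tree into an immutable artefact (the image).
    subprocess.run(["docker", "build", "-t", f"{IMAGE}:{VERSION}", "."], check=True)


def release() -> None:
    # Release: publish that artefact to the registry that IaC provisioned.
    subprocess.run(["docker", "push", f"{IMAGE}:{VERSION}"], check=True)


# Deployment is deliberately absent: telling the orchestrator to run this
# particular image version is a separate, later step.
```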
However, we also try to use it to perform the deployment itself. To be fair, if your deployment process for the application is a full end-to-end deployment of not just the application code but also the spinning up of a new instance of the entire infrastructure that goes with it, this is fine - and is the approach taken by the immutable-infrastructure crowd. The Immutable Infrastructure approach makes a lot of sense. Your build step becomes something like using HashiCorp Packer to make an AMI with your product code baked in. Release could be as simple as pushing that to an S3 bucket. Deployment is running your Terraform/CloudFormation to spin up an entirely new set of infrastructure where the Packer-created AMIs are used as the base for each server, and then repointing an existing load balancer to the new servers. This, in my opinion, is how IaaS is probably supposed to be used - it comes back to the idea that in an IaaS world the unit of deployment is a VM image, not just the code. Servers are just cattle to be culled as needed.
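Sketched out, that flow is roughly this - assuming a hypothetical webserver.pkr.hcl Packer template and a Terraform configuration that takes the AMI id as an input variable:

```python
import subprocess


def build_image(version: str) -> None:
    # Build: bake the application into an AMI. Assumes a (hypothetical)
    # webserver.pkr.hcl template that accepts an app_version variable.
    subprocess.run(
        ["packer", "build", "-var", f"app_version={version}", "webserver.pkr.hcl"],
        check=True,
    )


def deploy(ami_id: str) -> None:
    # Deploy: declare a fresh copy of the infrastructure built from the new
    # image and let the IaaS provider do the provisioning. Assumes the
    # Terraform config exposes an ami_id input variable.
    subprocess.run(
        ["terraform", "apply", "-auto-approve", "-var", f"ami_id={ami_id}"],
        check=True,
    )
```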
As a vegetarian I do feel the pets vs cattle metaphor is a little lost on me - I would want to keep both around.
The Immutable Infrastructure approach is fully aligned with treating infrastructure as cattle. But surely that doesn't make sense when combined with Docker images instead of AMIs or other virtual machine images - they need an orchestration system. Right?
Well, perhaps it actually does make sense - just at a different level of abstraction. Virtual machines are, after all, virtual - they run on actual hardware at some point. All AWS is doing when you say you want an EC2 instance based on this AMI is allocating the resources required on physical hardware. Obviously there is a lot more smarts behind it than that, but the principle still holds: the IaaS provider is acting as an orchestration system. So what you are actually doing when you take the immutable infrastructure approach is declaring what your infrastructure should look like - and letting AWS, Azure or Google Cloud spin it up and keep it running. Is this really any different to asking a Docker Swarm, Kubernetes or even just a Docker daemon on a single server to provision the resources needed to run my application? It is also declarative and the actual execution is up to the orchestrator. A new deployment is the same thing - providing a new declaration of what we want now instead. Those original resources are not updated with new code and configuration - they are replaced. So we effectively have Immutable Infrastructure again.
But we need to be careful not to repeat the mistakes of the past by clinging to tooling that made sense before and attempting to extend it to this new level of abstraction.
Ansible and its ilk are fine tools for the configuration of servers, for declaring what the inside of a server should look like. They are not good tools for provisioning those servers dynamically over time. For that you need CloudFormation or Terraform.
CloudFormation and Terraform are fine tools for defining infrastructure and the rules used to scale it. They are great at telling the IaaS orchestrator what you want and in what situations it should change. They are not good at defining what the workload on that infrastructure should look like - for that you have HashiCorp Packer and similar tools.
Or if you go in the container direction - CloudFormation and Terraform are great at defining the cluster, which is really just an abstraction on top of infrastructure anyway. But they are not so good at defining the containers that run on it. For that we have declarative formats and tools like docker-compose and the object config yaml files used by Kubernetes, along with emerging tools like Helm.
Each layer of the stack has a different cadence, and different tools have been designed to match the pace of those layers. When a tool written for one layer attempts to extend into the next, things get more complex and can actually end up holding you back - preventing you from being able to fully embrace those abstractions or approaches.
Each layer still needs to be exercised lest it should fall into an unmaintainable state. So whilst I don't think you should be using IaC tooling to deploy application workloads, it should still sit in a CI/CD-type arrangement to ensure it is always kept in a working state. Code that isn't run festers. We have to trust that the IaaS providers are constantly exercising their layer; we should be doing the same for the layers we are responsible for.
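A minimal way of doing that is to run a plan against the live infrastructure on a schedule and shout when it has drifted - something along these lines, assuming Terraform:

```python
import subprocess
import sys

# With -detailed-exitcode, `terraform plan` exits 0 when the live
# infrastructure matches the code, 2 when there is drift, and 1 on error.
result = subprocess.run(["terraform", "plan", "-input=false", "-detailed-exitcode"])

if result.returncode == 2:
    print("Drift detected: live infrastructure no longer matches the code.")
    sys.exit(1)
elif result.returncode != 0:
    print("terraform plan failed.")
    sys.exit(result.returncode)

print("Infrastructure layer is still in sync.")
```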
Ultimately I see infrastructure as effectively just another product, separate from the workloads that run on it but there to support those workloads. In the same way, those workloads are all just individual products designed to be separate but complementary - supporting the business capabilities they enable.
So when we as an agency are developing something for a customer, we are invariably developing at least 2 parallel products: the product that provides the capabilities the customer needs, and the product that supports that capability - the substrate it runs on - the infrastructure. I'm pretty sure there is some kind of separation of church and state metaphor here, but every time I try to make it things just go in creepy directions. 🤷
fin - for now
p.s. Yes, I have completely ignored serverless. I felt that the quagmire around what the term actually means is likely to muddy the water a little too much. But essentially I see things like AWS Lambda in the same way I see Kubernetes and ECS: it provides the substrate on which I run my workload. And I wouldn't use CloudFormation or Terraform to deploy Lambda functions either - but they would have a place in getting all the supporting configuration in place. Also - I have next to no experience with serverless.