Internal developer platforms and the cult of Kubernetes

There is much to be said for focusing on the developer experience and providing “paved roads” that make it easier for engineers to build, test, and deploy applications.

After all, running “cloud native” applications in production can be fraught with difficulty. You will need to solve problems such as elastic scaling, fault tolerance, rolling deployment, monitoring, and service discovery. That’s a lot of complexity for any single engineering team to digest. If you want to achieve any economies of scale, then it makes sense to solve these problems in a consistent and systematic way across the organisation.

Enter the “Internal Platform Team”…

This has led to the rise of internal developer platforms, which Evan Bottcher described as a “foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product.” Their purpose is to solve some of these more generic problems for engineering teams, providing a path of least resistance to production that allows engineers to focus on building new features.

These resources are typically curated by a centralised “platform team”. In their book “Team Topologies”, Skelton and Pais attempt to legitimise this kind of team in terms of its role in enabling a reliable flow of delivery. In this sense, a platform team is a service provider for engineering teams, maintaining an internal developer platform as an optional, consumer-focused “product”.

This all makes good sense, and we can all agree that engineering enablement is a “good thing”. The problem is that it often provides a fig leaf for organisational anti-patterns, usually centred on an overly complex Kubernetes implementation.

Many platform teams exist to protect the investment that an organisation has already made in Kubernetes. They may regard engineering teams as “customers” and strive to mould their “product” to address customer concerns, but these platforms are rarely optional in practice. In most cases, they are the only game in town and teams are not given a realistic alternative.

Kubernetes creates its own gravity that makes it difficult to consider any other style of solution. Everything is perceived through the narrow prism of containerised services running in a centrally managed Kubernetes cluster. This “one-size-fits-all” solution can even become a substitute for software architecture and design as teams assume they have architecture covered so long as they are deploying containers to Kubernetes.

The centralised platform team can also become a bottleneck that every application must pass through on its way to production. This is unfortunate if it is based on a technology so complex that only a team of specialists can operate it. This kind of technology silo also risks becoming a repository for untold complexity that will create a drag on future development for years to come. Sam Newman summed this up nicely when he suggested that:

Whenever you come across a team which is named after a specific tool or technology, you have a potential problem. The API Gateway Team. The Enterprise Service Bus Team. And yes, The Platform Team.

Do you really need that Kubernetes cluster?

If we’re serious about the developer experience, then there is nothing remotely developer-friendly about Kubernetes. Your average team of engineering ninjas doesn’t stand a chance of managing Kubernetes clusters in production. That’s not what it was designed for.

Kubernetes emerged from the bowels of Google to handle containerised applications on an industrial scale. It won the container orchestrator wars because the alternatives were either too lightweight (Docker Swarm), too complex (Mesos), or just hopelessly arcane (Service Fabric). Despite this, most organisations that use Kubernetes do not require anything remotely close to the scale it was designed for.

When you use Kubernetes, you are buying operational flexibility at the cost of complexity. The initial abstractions of pods, nodes, and deployments are easy enough to master, but you quickly get bogged down in runaway complexity. A bewildering ecosystem of solutions has built up around it, and no two implementations of Kubernetes are alike. There’s no shortage of conference speakers wanting to talk about their “Kubernetes journey” with the haunted look of a soldier returning from the front with a bad case of PTSD.
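To make that trade-off concrete, this is roughly what the “easy” part looks like. It is a minimal sketch using the Pulumi Kubernetes provider in TypeScript; the service name and image are placeholder values, not anything from a real system:

```typescript
import * as k8s from "@pulumi/kubernetes";

const labels = { app: "web" };

// The approachable part: pods, replicas, and a rolling update strategy.
const deployment = new k8s.apps.v1.Deployment("web", {
  spec: {
    replicas: 3,
    selector: { matchLabels: labels },
    strategy: {
      type: "RollingUpdate",
      rollingUpdate: { maxUnavailable: 1, maxSurge: 1 },
    },
    template: {
      metadata: { labels },
      spec: {
        containers: [
          {
            name: "web",
            image: "registry.example.com/web:1.0.0", // placeholder image
            ports: [{ containerPort: 8080 }],
          },
        ],
      },
    },
  },
});
```

The trouble is everything a production cluster expects beyond this point: probes, resource requests and limits, RBAC, network policies, secrets management, ingress controllers, and so on. That is where the bewildering ecosystem takes over.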

Platform teams typically try to conceal this complexity by building a series of abstractions to make Kubernetes deployments more accessible to engineering teams. This means making a lot of implementation decisions on behalf of the teams, so that every Kubernetes implementation winds up becoming very opinionated. But it’s “optional” and “customer-driven”, so that’s OK, right?
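As an illustration of how opinionated these wrappers become, the “paved road” often amounts to something like the sketch below. This is entirely hypothetical: `ServiceSpec` and `deployService` are invented names, not any real platform’s API.

```typescript
// Entirely hypothetical: a thin "paved road" wrapper of the kind
// platform teams build over Kubernetes.
interface ServiceSpec {
  name: string;
  image: string;
  replicas?: number; // one of the few knobs left to the team
}

// The wrapper expands a tiny spec into Kubernetes objects, with the
// namespace, labels, ingress class, and defaults all decided in
// advance by the platform team.
function deployService(spec: ServiceSpec): object[] {
  const labels = { app: spec.name, "platform/managed": "true" };
  return [
    {
      apiVersion: "apps/v1",
      kind: "Deployment",
      metadata: { name: spec.name, namespace: "services", labels },
      spec: {
        replicas: spec.replicas ?? 2, // platform default
        template: {
          metadata: { labels },
          spec: { containers: [{ name: spec.name, image: spec.image }] },
        },
      },
    },
    {
      apiVersion: "v1",
      kind: "Service",
      metadata: { name: spec.name, namespace: "services", labels },
    },
    {
      apiVersion: "networking.k8s.io/v1",
      kind: "Ingress",
      metadata: {
        name: spec.name,
        namespace: "services",
        annotations: { "kubernetes.io/ingress.class": "platform-nginx" },
      },
    },
  ];
}

// Usage: three fields in, an opinionated stack of manifests out.
const manifests = deployService({
  name: "orders",
  image: "registry.internal/orders:1.2.0", // placeholder
});
```

Teams fill in a handful of fields; every other decision has already been made on their behalf.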

These platform teams are, in effect, centralised DevOps teams. We used to think that was an anti-pattern, as we wanted teams to take ownership of the entire life cycle of their systems, including how they behave in production. Instead, we are centralising decisions about how applications run in production within “platform teams”, purely because the chosen technology is too insanely complex to expose to engineers.

Internal platform initiatives are a way of managing this complexity, but they don’t necessarily have much to do with engineering enablement. In this context, they’re more about doubling down on sunk cost and trying to make a go of Kubernetes.

We need better abstractions

We can only hope to truly empower engineers if we give them full control over their deployment environments. Kubernetes is not the only game in town, and there are many other ways of architecting applications to run in the cloud. You can even achieve infrastructure-as-code, lifecycle management, automated scaling, and failover protection without Kubernetes.
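For example, a load-balanced, auto-healing, auto-scaling container service can be defined with the AWS CDK on Fargate, with no cluster for anyone to operate. A minimal sketch, assuming the standard aws-cdk-lib packages; the stack name and image are placeholders:

```typescript
import * as cdk from "aws-cdk-lib";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as ecsPatterns from "aws-cdk-lib/aws-ecs-patterns";

const app = new cdk.App();
const stack = new cdk.Stack(app, "WebStack");

// A load-balanced container service on Fargate: no nodes, no control
// plane, no cluster upgrades to manage.
const service = new ecsPatterns.ApplicationLoadBalancedFargateService(
  stack,
  "Web",
  {
    cpu: 256,
    memoryLimitMiB: 512,
    desiredCount: 2, // run two tasks for fault tolerance
    taskImageOptions: {
      image: ecs.ContainerImage.fromRegistry("example/web:1.0.0"), // placeholder
      containerPort: 8080,
    },
  }
);

// Elastic scaling without a platform team in sight.
service.service
  .autoScaleTaskCount({ maxCapacity: 10 })
  .scaleOnCpuUtilization("CpuScaling", { targetUtilizationPercent: 60 });

app.synth();
```

Other providers offer close equivalents, such as Google Cloud Run and Azure Container Apps, which make the same trade in favour of simplicity.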

Ultimately, I expect to see better abstractions emerge from cloud providers that make it unnecessary for most organisations to get their fingers burnt with Kubernetes. These will give teams the freedom to choose their own tools, design their own architectures, and take control of production environments without being beholden to a centralised “platform team”.