Progressive Delivery, Kubernetes, and 3D Chess

Karl Isenberg
Nov 29, 2020

You’ve heard about progressive delivery, right?

No? Here’s a crash course:

Progressive delivery is a software deployment strategy to minimize risk and maximize uptime by deploying iteratively, to a gradually increasing number of clients.

Just as progressive delivery is an iterative deployment process, the concept itself is an iteration on continuous delivery, continuous deployment, rolling deployment, and canary deployment.

But instead of looking into the past, let’s look into the future!

The Past

Ok, I lied. We have to look at the past first.

As it turns out, you can do progressive delivery with a single Kubernetes cluster. It’s not particularly easy, unfortunately. The Deployment resource only supports rolling updates, for now. But thankfully a Service can span multiple Deployments, because it selects a group of Pods using pod labels. So what you can do is create two Deployments whose Pods share the label the Service selects on. That way, Deployment A and Deployment B are accessible through the same Service, with the ratio of clients matching the ratio of pods.
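Here’s a minimal sketch of that trick in manifests. The names, image, and ports are illustrative; the only load-bearing detail is that the Service selector omits the Version label, so it matches Pods from both Deployments:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-x-v1            # "Deployment A"; illustrative name
spec:
  replicas: 5               # N pods
  selector:
    matchLabels: {App: X, Version: "1"}
  template:
    metadata:
      labels: {App: X, Version: "1"}
    spec:
      containers:
      - name: app
        image: example.com/app-x:1   # illustrative image
---
apiVersion: v1
kind: Service
metadata:
  name: service-x           # "Service X"
spec:
  selector:
    App: X                  # no Version key: matches v1 and v2 Pods alike
  ports:
  - port: 80
    targetPort: 8080
```

Deployment B looks the same, except with name `app-x-v2`, `Version: "2"` labels, and `replicas: 1` to start.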

Maybe you can imagine what happens next.

  1. Start with Deployment A with N pods with labels App=X and Version=1
  2. Create Service X with selector App=X
  3. Create Deployment B with 1 pod with labels App=X and Version=2
  4. Create Service B with selector App=X and Version=2
  5. Validate Service B with smoke tests
  6. Validate Service X with metrics from user traffic
  7. Decrease Deployment A by 1 pod
  8. Increase Deployment B by 1 pod
  9. Validate Service X with metrics from user traffic
  10. Repeat steps 7–9, shifting one pod per round, until Deployment A is at zero pods and Deployment B is at N pods
  11. Delete Deployment A
  12. Delete Service B
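The pod-shifting loop at the heart of those steps boils down to paired kubectl scale calls. A hedged sketch, assuming Deployments named `app-x-v1` and `app-x-v2` (illustrative names), with the metrics validation left as a placeholder; it dry-runs by default (printing the commands) so you can see the sequence without a cluster:

```shell
KUBECTL="${KUBECTL:-echo kubectl}"   # dry-run by default; set KUBECTL=kubectl to apply
N="${N:-5}"                          # starting replica count of Deployment A
a="$N"; b=1                          # A starts at N pods, B at 1 (the canary)

while [ "$a" -gt 0 ]; do
  a=$((a - 1))
  if [ "$b" -lt "$N" ]; then b=$((b + 1)); fi
  $KUBECTL scale deployment/app-x-v1 --replicas="$a"
  $KUBECTL scale deployment/app-x-v2 --replicas="$b"
  # validate Service X with user-traffic metrics here; on a bad signal,
  # scale app-x-v1 back up and app-x-v2 down instead of continuing
done

$KUBECTL delete deployment app-x-v1   # step 11
$KUBECTL delete service service-b     # step 12
```

The loop ends with Deployment A at zero pods and Deployment B at N, matching the steps above.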

Yay! You made it!

The Present

But wait… there’s more…

Unfortunately, you can’t get by with just one Kubernetes cluster. Usually, you need at least one cluster per region for higher availability, or at least business continuity. So you build out three clusters, each in a different region, because you heard quorum was a good idea. So now how do you progressively deploy across three clusters?

Well, it looks a lot like the previous iteration:

  1. Start with Deployment A with N pods with label App=X and Version=1 on all 3 clusters
  2. Create MultiClusterService X referring to Pods with selector App=X on all 3 clusters
  3. Create MultiClusterIngress X referring to MultiClusterService X
  4. Create Deployment B on Cluster A with 1 pod with labels App=X and Version=2
  5. Create MultiClusterService B with selector App=X and Version=2
  6. Validate Service B (created by MultiClusterService B) with smoke tests
  7. Validate MultiClusterIngress X with metrics from user traffic
  8. Decrease Deployment A by 1 pod on Cluster A
  9. Increase Deployment B by 1 pod on Cluster A
  10. Validate MultiClusterIngress X with metrics from user traffic
  11. Repeat steps 8–10, shifting one pod per round, until Deployment A is at zero pods and Deployment B is at N pods
  12. Delete Deployment A
  13. Repeat steps 4–12 on Clusters B & C until Deployment A is at zero pods on all clusters and Deployment B is at N pods on all clusters
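For concreteness, the multi-cluster resources above might look like the following, assuming GKE’s Multi Cluster Ingress API (`networking.gke.io/v1`). The field layout is from that API; the names and ports are illustrative:

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: service-x
  namespace: default
spec:
  template:
    spec:
      selector:
        App: X              # spans matching Pods in all member clusters
      ports:
      - name: web
        protocol: TCP
        port: 80
        targetPort: 8080
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: ingress-x
  namespace: default
spec:
  template:
    spec:
      backend:
        serviceName: service-x
        servicePort: 80
```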

Yay! You made it!

The Future

But wait… there’s more…

Unfortunately, one set of clusters isn’t enough either. Usually, you need at least two different environments of clusters: production and pre-production. And most people want three: development, staging, and production.

Are you tired of steps? I’m tired of steps.

Let’s just say you’re gonna need to iterate some more, across multiple environments. Deploy to dev, then staging, then prod. Test each one before you continue to the next. And if you’ve made it this far, make sure you automate the decision making that gates progression and rollback. Without automation, you’re going to make a lot of mistakes manually fumbling through the tedious multi-step process. Make a mistake and you’ve just invalidated all the money, time, and effort you’re spending on multiple clusters, multiple regions, and multiple environments.
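That gating logic doesn’t need to be exotic to beat manual fumbling. A minimal sketch of a promote-or-rollback decision, where the error rate is a stand-in for whatever your metrics system actually reports (the variable names and threshold here are illustrative, not from any particular tool):

```shell
ERROR_RATE="${ERROR_RATE:-3}"    # observed errors per 10k requests (stand-in value)
THRESHOLD="${THRESHOLD:-10}"     # promote only while below this rate

if [ "$ERROR_RATE" -lt "$THRESHOLD" ]; then
  DECISION="promote"             # advance to the next step/environment
else
  DECISION="rollback"            # scale the new version back down
fi
echo "$DECISION"                 # prints "promote" with the defaults above
```

The same check gates every rung of the ladder: pod-by-pod within a cluster, cluster-by-cluster within a region set, and environment-by-environment across the pipeline.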

Yay! You made it!

When you gaze long into the abyss…

But wait… there’s more…

Unfortunately, one set of environments isn’t enough either. Usually, you need at least two different environment sets: highly available services and greedy, resource-hungry jobs. But you may also want to split your environment sets by domain or business unit, for increased isolation.

As it turns out, maximizing efficiency and minimizing cost isn’t the be-all and end-all of prioritization. Usually, you also need high availability, reliability, and speed. And not all workloads prioritize these the same way. If you put greedy, resource-hungry jobs on the same clusters, node pools, or even networks as highly available services, you might find your services being disrupted by their greedy siblings. Any shared resource is a risk of disruption you may or may not be willing to tolerate for the benefit of reduced cost.

Thankfully, if you isolate these workloads to entirely different environment sets, it becomes less likely that they will disrupt each other. And also, because they’re different workloads, you don’t need to progressively deploy them across multiple environment sets.

But… your platform now spans multiple environment sets. And you might want to upgrade your platform…. progressively.

This ain’t your father’s 3D Star Trek chess. This is 4D Chess... at least.

And now it’s your move.

Good Luck!


Karl Isenberg

Cloud Guy. Anthos Solutions Architect at Google (opinions my own). X-Cruise, X-Mesosphere, & X-Pivotal.