June 6, 2024

Multi-Tier Setups and Technology Showing Its Age

This week, I must say, I’m a bit grumpy. I’ve once again run into a bad architecture pattern that has bitten our users plenty of times, and it’s definitely time to end it. I hope you are ready for some shade, because today we talk about multi-tier cache setups, Cache-Control headers, and some cautionary notes on another caching tool.


Scale up! Scale out! Then what?

Here’s a simple caching setup:


When you’re caching content, there’s usually a time when you reach the limits of your setup. Maybe it’s the network, maybe it’s the CPU (which would be surprising with Varnish…), maybe it’s the disk size. Usually, the first step is to scale up and upgrade the current servers (or pick larger instances, if you are using a cloud provider). Bigger NICs, bigger CPUs, bigger disks. Let’s make this a reality, you now have beefier machines:


But what if it’s not enough? Servers are, after all, physical boxes, and there’s a limit to the number of components you can cram into them, and the components have a power limit.

If you can’t make the servers bigger, you can certainly buy more of them to scale out. It can be a bit costly, but that’s a straightforward solution, especially in a cloud environment where you can simply autoscale with the traffic. And Varnish has some great tools to make sure adding more nodes doesn’t increase the pressure on your origin servers.


And honestly, we can cover a large spectrum of cases with just scaling up and scaling out, but sometimes we need to go bigger, and we need to change our architecture.


It’s so good I’m tiering up

The problem to fix here is a big limitation of scaling out: when you add a new machine to the mix, you add a WHOLE machine. And that whole machine, in the context of caching, is mainly defined by two properties: caching capacity and network capacity.

If you don’t see where I’m going with this, fret not, I have a useful example. Consider a service providing a large catalog of videos: they have terabytes of data and lots of users. You need cache capacity for the former, and network capacity for the latter. BUT! They don’t grow together, or at least not in any meaningful way: you might double your users without adding a single video to the catalog.

On the other hand, when you scale out, you are jointly growing both dimensions, and at a certain point, it can become pure waste. Fortunately, we have a simple, proven and battle-tested solution for this: the famed multi-tier setup. Let me show you how simple that looks:


Pretty straightforward, right? There are just a few points I want to make obvious:

  • The edge nodes are facing our users, so they don’t need a lot of caching power, but we want them to be fast, with a lot of network oomph. Worst case scenario, if they don’t have an object, they can fetch it from the storage tier.
  • Behind them are the storage nodes, and they’re the opposite: all about caching storage, while their speed is secondary. They won’t be asked for a particular object often, but we really want them to have it, to avoid bothering the origin.
  • And very importantly: edges use sharding to select a storage node, so scaling that tier out does add caching capacity.

It’s clear, it’s robust, and the best part is: you can now scale out the tier that makes sense, depending on the situation. If you have more traffic, scale the edges; if you have more content, scale the storage. Of course, that setup isn’t for everybody, but for large setups pushing tens of gigabits per second (or more!), it becomes sensible very, very fast.
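To make the sharding point concrete, here is a minimal Python sketch of how an edge could pick a storage node for a URL. It uses rendezvous hashing, which is my choice for the illustration and not necessarily what any Varnish director does internally. The node names and the `pick_storage_node` helper are hypothetical.

```python
import hashlib

def pick_storage_node(url: str, nodes: list[str]) -> str:
    """Rendezvous hashing: score every node for this URL, keep the best one."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

# Hypothetical storage tier and catalog, for illustration only.
storage_tier = ["storage-1", "storage-2", "storage-3"]
urls = [f"/video/{i}.mp4" for i in range(1000)]

# Every edge computes the same mapping, so each object lives on one storage node.
before = {u: pick_storage_node(u, storage_tier) for u in urls}

# Scale the storage tier out by one node and see which objects move.
after = {u: pick_storage_node(u, storage_tier + ["storage-4"]) for u in urls}
moved = sum(1 for u in urls if before[u] != after[u])
print(f"{moved} of {len(urls)} objects moved to the new node")
```

Because each URL maps to exactly one storage node, adding a node adds capacity instead of duplicating content. With this particular hashing scheme, the only objects that move are the ones that land on the new node.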


Who hurt you?

OK, now that the context is clear, back to my trauma. Years ago, when I was a young developer with a head full of hair (don’t worry, I still have hair, I’m just old now), I worked for a very cool multimedia CDN company, and we would cache video with a two-tier setup: Varnish for the edge, Nginx for the storage (MSE, the Massive Storage Engine, wasn’t a thing back then).

I’ll skip some of the details, but one day we realized that we were overcaching and delivering stale content. While it’s pretty innocuous for VOD, it’s something that will disrupt your live stream in a very, very annoying manner, with customers complaining about buffering, disconnections, weird error messages, while you are testing manually and seeing nothing wrong with the content.

Of course, we checked Varnish, using its awesome logs, and of course, everything was fine. Varnish was caching for the exact same time every single time, in accordance with the cache-control headers…and that’s when it clicked. It shouldn’t have been the same TTL every time! Let me explain.


Letting your origin decide

If your HTTP origin is well-behaved and cooperative, it will tell you how long content should be cached, like so:

< HTTP/1.1 200 OK
< Server: GoodBoyOrigin
< Date: Fri, 10 May 2024 22:07:09 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 228
< Connection: keep-alive
< cache-control: max-age=20

That last header is super nice, because it tells us how long it thinks we should cache. And both Varnish and Nginx will honor that header.

Note: the caching header story is a liiiiiittle bit more complicated than that, but covering the full RFC would take days, so I’m cutting some fat here.

Where was I? Ah, yes, so, Nginx will happily honor the header and store the response for 20 seconds, and during that time will deliver a copy of it. An exact copy of it. That’s the stinker here: if another Varnish node arrives 17 seconds after Nginx caches the object, it will still be told to cache the object for an extra 20 seconds, instead of 3 seconds.
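To put numbers on it, here is a tiny Python sketch of that 17-second scenario. The function names are made up for the illustration, and the timeline is simplified to whole seconds counted from the moment the storage tier fetches the object.

```python
MAX_AGE = 20  # seconds, from "cache-control: max-age=20"

def edge_expiry_exact_copy(storage_fetch_time: int, edge_fetch_time: int,
                           max_age: int = MAX_AGE) -> int:
    """The storage tier replays the origin headers as-is, with no Age header,
    so the edge starts a brand new max_age window when it fetches."""
    return edge_fetch_time + max_age

def edge_expiry_wanted(storage_fetch_time: int, edge_fetch_time: int,
                       max_age: int = MAX_AGE) -> int:
    """What we actually want: the object expires max_age seconds after it
    first left the origin, no matter when the edge picked it up."""
    return storage_fetch_time + max_age

# Storage caches at t=0, the edge asks for the object at t=17.
actual = edge_expiry_exact_copy(0, 17)
wanted = edge_expiry_wanted(0, 17)
print(f"edge keeps the object until t={actual}, should drop it at t={wanted}")
print(f"overcached by {actual - wanted} seconds")
```

The closer the edge fetch lands to the end of the storage tier's TTL, the longer the edge keeps serving content the origin already considers expired.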

Visually, we have this:

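[Figure: timeline where the edge caches the object for a full 20 seconds, even though the storage tier has already held it for 17 seconds]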

But what we want is:

[Figure: timeline where the edge caches the object only for the 3 seconds that remain]

The solution is fairly obvious. Let’s look at what the response looks like when cached from Varnish:

< HTTP/1.1 200 OK
< Server: Varnish
< Date: Fri, 10 May 2024 22:07:09 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 228
< Connection: keep-alive
< X-Varnish: 32774
< cache-control: max-age=20
< Age: 9

Varnish actually tells you how long the object has been in cache, so that the downstream level can subtract this from the expected TTL to, you know, do the right thing™.
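The subtraction itself is about as simple as it sounds. Here is a rough Python sketch of what a downstream cache could do with those two headers. The `remaining_ttl` helper is hypothetical, it assumes header names have already been lowercased, and it deliberately ignores every other Cache-Control directive.

```python
import re

def remaining_ttl(headers: dict[str, str]) -> int:
    """Seconds the response may still be cached: max-age minus Age, never below 0."""
    match = re.search(r"\bmax-age=(\d+)", headers.get("cache-control", ""), re.IGNORECASE)
    if match is None:
        return 0  # no max-age: this sketch simply refuses to cache
    max_age = int(match.group(1))
    try:
        age = int(headers.get("age", "0"))
    except ValueError:
        age = 0  # unparsable Age: treat the object as fresh from the origin
    return max(0, max_age - age)

# The Varnish response shown above: max-age=20 and Age: 9.
from_varnish = {"cache-control": "max-age=20", "age": "9"}
print(remaining_ttl(from_varnish))
```

With the response above, the downstream cache keeps the object for the 11 seconds that are left rather than a fresh 20.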


Adding insult to injury

But the problems don’t stop here! Nginx is consistent in that it will also ignore the Age header provided by the upstream cache (if any, of course), which makes the problem even worse.

There are ways to work around this, of course: you can add some configuration that explicitly reads the Age header to reduce the TTL, or maybe even write some Lua, or rely on the Expires header. But Expires is actually ignored if Cache-Control: max-age exists, so you have to worry about that too. The rabbit hole goes deep, really, really fast.
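For the Expires part of that rabbit hole, here is a simplified Python sketch of the precedence rule: max-age wins when present, and Expires (relative to Date) is only a fallback. The `freshness_lifetime` helper is hypothetical, it assumes lowercased header names, and it skips s-maxage and the rest of the RFC on purpose.

```python
import re
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers: dict[str, str]) -> int:
    """How long the origin says the response is fresh, in seconds."""
    match = re.search(r"\bmax-age=(\d+)", headers.get("cache-control", ""), re.IGNORECASE)
    if match:
        return int(match.group(1))  # max-age present: Expires is not even looked at
    expires, date = headers.get("expires"), headers.get("date")
    if expires and date:
        try:
            delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
        except (TypeError, ValueError):
            return 0  # unparsable dates: treat the response as already stale
        return max(0, int(delta.total_seconds()))
    return 0

both = {
    "cache-control": "max-age=20",
    "date": "Fri, 10 May 2024 22:07:09 GMT",
    "expires": "Fri, 10 May 2024 22:08:09 GMT",
}
expires_only = {k: v for k, v in both.items() if k != "cache-control"}
print(freshness_lifetime(both), freshness_lifetime(expires_only))
```

So tweaking Expires on its own does nothing as long as the origin keeps sending max-age, which is exactly the kind of detail that is easy to miss when patching around a cache.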

And don’t misunderstand me, I’m all in favor of highly configurable tools that you can tweak to your liking. I’ve notably written many, many times here about how our own VCL is an amazing tool. However, I’m of the firm opinion that if you are going to use a tool for caching, it should at the very least nail the basics.

I get why people reach for the hammer they know though: it’s comfortable, and for their limited needs at the time, “it’ll probably be enough”. And that’s the moral of this story: be aware when your needs are not “limited” anymore and extend your toolbox as your goals expand.