December 16, 2024
5 min read time

The Importance (and ease) of Sharding Your Cache

Recently, I’ve been talking about object storage, and one topic we absolutely need to discuss is sharding, because it’s a simple and elegant way to maximize your cache capacity. However, digging through the blog archives, I realized that we only have one article about it, and since then we have made things a lot simpler.

What’s sharding again?

I think most readers will already know this, but let’s explain sharding in just a few words (and images!).

In short, rather than having all our caches deliver any and all requests, we want each of them to specialize in a subset of the objects. This specialization means each cache has a smaller number of objects to worry about, which in turn leads to a higher total cache size, a lower eviction rate, and a better individual hit ratio. But that also means two very important things:

  • The global cache capacity is now increased with each new caching node in the cluster
  • Each object only comes from a limited number of nodes, raising the cluster’s hit ratio as a whole
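To make that deterministic assignment concrete, here is a minimal sketch using the shard director from the stock vmod_directors that ships with Varnish. This is only an illustration with made-up backend names and addresses, not the Enterprise udo setup we’ll get to later: the point is that every request for a given URL hashes to the same cache node, so each node only ever handles its own slice of the data set.

    vcl 4.1;
    import directors;

    # Three hypothetical cache nodes; each will end up "owning" a slice of the URL space.
    backend cache_A { .host = "192.168.1.1"; }
    backend cache_B { .host = "192.168.1.2"; }
    backend cache_C { .host = "192.168.1.3"; }

    sub vcl_init {
        new shard = directors.shard();
        shard.add_backend(cache_A);
        shard.add_backend(cache_B);
        shard.add_backend(cache_C);
        # rebuild the consistent-hashing ring after changing the backend list
        shard.reconfigure();
    }

    sub vcl_backend_fetch {
        # Same URL, same hash, same node: a given object is always fetched
        # from the cache that owns it.
        set bereq.backend = shard.backend(by=URL);
    }

To put a number on the capacity point: with four 1 TB nodes, an unsharded cluster can only hold roughly 1 TB of unique objects (everything ends up duplicated everywhere), while a sharded one gets close to 4 TB.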

Without sharding, our setup looks like this:

[Image 1: setup without sharding]

You can tell it’s not great: each cache node needs to fetch almost all the objects, meaning a lot of pressure on the origin. And adding more caches will only make the origin traffic worse. There has to be a better way, surely?

There is! With sharding, the setup looks like this:

[Image 2: setup with sharding]


Way better! This way, the origin gets only one request per object, and each cache only needs to hold a fourth of the data set.

Note: I’ll skip over a few things, such as implementation details and redundancy, for the sake of brevity, but we’ll explore those in a future post about udo very soon.

Sharding is caring

In the diagrams above, you’ll notice that I haven’t really explained the “Load-balancer” brick, but it’s doing a lot of the heavy lifting and has some pretty important requirements.

First and foremost, if you want to shard, you need content-awareness, i.e. the load-balancer needs to speak HTTP so it can identify individual requests and route each one to the appropriate node(s). As a result, you need a Layer 7 load-balancer; something operating at the TCP level, or relying on DNS routing, won’t do.
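To illustrate what content-awareness buys you, here is a toy sketch of Layer 7 routing in VCL (the backend names, addresses and URL prefix are made up for the example): the router reads the request itself and picks a backend per request, which a TCP or DNS load-balancer, blind to the URL, simply cannot do.

    vcl 4.1;

    # Hypothetical pools: one set of caches for big video objects, one for the rest.
    backend video_caches   { .host = "192.168.2.1"; }
    backend default_caches { .host = "192.168.2.2"; }

    sub vcl_backend_fetch {
        # Routing on the URL is only possible once you can parse HTTP, i.e. at Layer 7.
        # A TCP load-balancer only sees an opaque byte stream; a DNS one only sees a hostname.
        if (bereq.url ~ "^/video/") {
            set bereq.backend = video_caches;
        } else {
            set bereq.backend = default_caches;
        }
    }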

The issue is that this comes with a pretty hefty performance constraint: there aren’t a lot of tools that can efficiently route HTTP traffic at scale, and remember: the output of the load-balancer layer needs to be at least the output of the full caching layer below it. If you are pushing hundreds of gigabits per second, that could be an issue.

It could, but it doesn’t have to, thanks to the magic of the two-tier architecture. We’ve already discussed it in detail previously so I won’t dwell on it too much here, but I want to share this beautiful diagram with you:

[Image 3: two-tier architecture]

As you can see, we get sharding, and a first layer of caching before we even have to route requests. As explained in the previous article, it makes scaling very easy and efficient, because you can decide to scale cache size or bandwidth independently.

It’s pretty much the de facto standard for large deployments with massive bandwidth needs. However, it only starts being economical when your infrastructure reaches a certain size. What about smaller setups then?

Self-routing to the rescue

If you read the old article I linked at the beginning, you already know how we deal with this: by collapsing the two layers. In short, everybody knows who’s in the cluster, and everybody mathematically agrees on who should handle which request. Our diagram is now much flatter:

[Image 4: self-routing cluster]

(for the sake of readability, only the external traffic coming to one node is represented)

As you can see, if a node doesn’t recognize a request as its responsibility, it will just pass it on to the right node. In practice, there is some caching on top of this to avoid hotspots that could take down a single node, but that’s a detail in the grand scheme of things.
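To give a sense of what the cluster.vcl file described below saves you from writing, here is a rough hand-rolled sketch of self-routing built on the stock shard director. Everything in it is illustrative: the addresses, the X-Shard-Hop marker header and the overall shape are assumptions for the example, not the actual implementation.

    vcl 4.1;
    import directors;

    backend origin  { .host = "origin.example.com"; }
    backend cache_A { .host = "192.168.1.1"; }
    backend cache_B { .host = "192.168.1.2"; }
    backend cache_C { .host = "192.168.1.3"; }

    sub vcl_init {
        # Every node loads the same list, so they all agree on who owns which URL.
        new shard = directors.shard();
        shard.add_backend(cache_A);
        shard.add_backend(cache_B);
        shard.add_backend(cache_C);
        shard.reconfigure();
    }

    sub vcl_backend_fetch {
        if (bereq.http.X-Shard-Hop) {
            # A peer already routed this request to us: we own it, so fetch
            # from the origin instead of bouncing it around the cluster.
            set bereq.backend = origin;
        } else {
            # First hop: mark the request and hand it to whichever node owns
            # this URL (possibly ourselves, in which case the hop stays local).
            set bereq.http.X-Shard-Hop = "1";
            set bereq.backend = shard.backend(by=URL);
        }
    }

It works, but getting all the details right by hand (health checks, loop avoidance, hotspot handling) is exactly the kind of plumbing you’d rather not maintain yourself.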

The implementation: cluster.vcl

If all this sounds good to you, you should try it and see how easy it is. In practice, we leverage the power of the udo vmod and of cluster.vcl, a configuration file that’s included in the varnish-plus package (part of Varnish Enterprise). The code is as boring as it can be:

import udo;

include "cluster.vcl";

backend cache_A { .host = "192.168.1.1"; }
backend cache_B { .host = "192.168.1.2"; }
backend cache_C { .host = "192.168.1.3"; }

sub vcl_init {
    cluster.add_backend(cache_A);
    cluster.add_backend(cache_B);
    cluster.add_backend(cache_C);
}

<the rest of your code goes here>

Shout out to my colleague Alve for his awesome work on making the code so sleek. Usually, self-routing means heavily tweaking your VCL and making sure you hash properly and don’t loop infinitely between nodes. Here, cluster.vcl takes care of all that for you!

Since the question often comes up, let me get ahead of it: the delay added by hopping through an extra node is negligible. In practice you are going to lose a handful of milliseconds (remember, it’s Varnish talking to Varnish, it can only be fast), and that loss is dwarfed by the benefits of hitting the right cache and shielding your origin from extra traffic.

Note: if you work with dynamic environments, cluster.vcl can also use a DNS name to discover its peers, so you don’t need to manually list the nodes.

Maximize your caching platform!

This was a very, very quick introduction to sharding and how it benefits Varnish and caching platforms in general, but I hope it was a useful read. If you already knew all of the above, stay tuned! We’ll soon dive into the technical details of vmod-udo, which is our primary load-balancing solution and, as mentioned, the heart of our self-routing solution too.