The single most frequent question I get from people discovering Varnish is "Do you support request collapsing?". And I must admit, it always catches me off guard, because of course, it has been doing so for years, and it's a caching server, so that's a must-have, right? (Yes, yes it is.)
For this reason, in this post, we are going to review some of the little features we take for granted but that also make Varnish a great origin shield.
A content delivery network (CDN) is a set of geographically distributed servers that cache the responses to requests aimed at your server(s), that is, the origin(s). The benefit is that you can serve much higher volumes of traffic, all across the globe, even with a small origin and little bandwidth, because the CDN reduces the traffic to your server by a few orders of magnitude.
But sometimes, that's not enough, notably because a CDN is rarely perfect: different data centers rarely share their caches, for example, so each of them needs to fetch a copy from the origin. And if you are using multiple CDNs, each one needs to fetch its own copy of every object. In the end, the CDN(s) may still bring your origin down if the traffic reduction factor isn't good enough.
The solution is to add more caching! We can place an extra layer of caching right in front of the origin to shield it from redundant requests, so that even if you are using 5 CDNs with 100 data centers each, your origin will only see one request per object. And it turns out that Varnish is a pretty good candidate for this.
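To give you a mental picture, a minimal shield can be just a few lines of VCL; the backend name, host, and port below are placeholders for your actual origin:

```vcl
vcl 4.1;

# The single origin this shield protects; host and port are placeholders.
backend origin {
    .host = "origin.example.com";
    .port = "8080";
}

sub vcl_recv {
    # All CDN nodes, from all CDNs, point here instead of at the origin,
    # so the origin only sees the requests Varnish can't answer itself.
    set req.backend_hint = origin;
}
```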
The big bad wolf of content distribution is the "thundering herd" effect, triggered by a massive number of requests arriving at the same time. Think of something like Black Friday, but with HTTP requests: the system gets flooded, chokes on the incoming traffic, and either slows to a crawl or crashes entirely. In practice, that usually happens during big online events, such as the Olympic Games, football matches, or, well, Black Friday.
However, that kind of traffic is usually highly cacheable, and Varnish implements request coalescing (or collapsing) to save the day. The idea is dead simple: if 20 users request the same content at the same time, fetch the data only once from the origin. There are quite a few interesting technical details to it, but I won't bore you with that today, don't worry. The important point is that Varnish does all the hard work for you, transparently, so that the thundering herd doesn't trample your origin.
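Coalescing is on by default for cacheable content, so there's nothing to turn on. One caveat is worth a sketch, though: if the origin marks a hot object as uncacheable, Varnish remembers that (the "hit-for-miss" mechanism) and lets requests through to the backend in parallel. Assuming you know the content is actually safe to cache briefly, you can force a short TTL to keep the herd collapsed (the 10-second value here is arbitrary):

```vcl
sub vcl_backend_response {
    # Uncacheable responses bypass the waiting list, so coalescing stops
    # protecting the origin for them. If we know better than the origin's
    # headers, a small TTL restores coalescing for that object.
    if (beresp.ttl <= 0s && beresp.http.cache-control !~ "(private|no-store)") {
        set beresp.ttl = 10s;
    }
}
```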
Coalescing isn't Varnish's only trick to spare the origin from redundant work, though. Clients rarely all ask for an object in exactly the same way; some of them may send:

- `accept-encoding: gzip`
- `range: bytes=-10`
- a `HEAD` request
- `accept-encoding: brotli`, combined with `range: bytes=0-511`

Rather than fetching one copy per variation, Varnish asks the origin for the full object once, compressed. With this, Varnish has all the information it needs and can tweak the response object on the fly to please each client.
An important point is that the "accept-encoding" header isn't an order, it's a declaration of capability, and a server is free to ignore it if honoring it doesn't make sense. So Varnish will ask for compressed data, and if it doesn't get it, it'll assume it's best to store and transmit the object uncompressed. That's the case for images, for example.
And of course, these are just out-of-the-box defaults that can be modified if your origin needs special care. For example, if your origin doesn't know how to compress text files, Varnish can do it for you, saving cache space.
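As a quick sketch of that last point, assuming text responses arrive uncompressed from the origin:

```vcl
sub vcl_backend_response {
    # The origin sent this text object uncompressed: gzip it before
    # storing, so the cache holds the smaller version and gzip-capable
    # clients get it as-is (others get it decompressed on delivery).
    if (beresp.http.content-type ~ "^text/" && !beresp.http.content-encoding) {
        set beresp.do_gzip = true;
    }
}
```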
And because of configuration composability, we can combine schemes and ask for an ACL match and a valid JWT, or for a shared secret in a header or a successful reverse-DNS lookup. The sky's the limit!
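Here's a hedged sketch of two of those checks composed in VCL; the IP ranges, header name, and secret are made up, and an actual JWT check would come from a vmod, which I'm leaving out:

```vcl
# Made-up ranges standing in for your CDNs' egress addresses.
acl trusted_cdns {
    "192.0.2.0"/24;
    "198.51.100.0"/24;
}

sub vcl_recv {
    # Let a request in only if it comes from a trusted range AND carries
    # the shared secret; a vmod-backed JWT check could be bolted onto
    # the same condition.
    if (client.ip !~ trusted_cdns || req.http.x-shield-secret != "changeme") {
        return (synth(403, "Forbidden"));
    }
}
```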
As you can imagine, this is far from a complete list of features, but we had to draw the line somewhere, and I believe it highlights two important facets of Varnish: