Is this another one of those blog posts that start with a hot take and then proceed to very sensibly explain that users are generally chasing the wrong metric? You betcha!
But it's not the 2010s anymore, and I don't need to string you along for paragraphs on end before providing an actionable answer, so let's dive right into it, shall we?
The XY problem and the fallacy of the caching ratio
If you are in charge of a caching layer, be it a lone Varnish, a full-scale CDN like Varnish CDN, Ora Streaming or a security gate like Orca and Varnish Artifact Firewall, knowing how much that layer caches tells only part of the story, and focusing solely on it will blind you to a lot of the signal.
See, in the early days, caching served only one purpose: making sure you didn't kill your origin. And most tools were simple: either you got a hit and served from cache, or you passed the request to the backend.
Those times have come and gone though. Nowadays, Varnish and others can and will do so much more:
- Apply WAF rules, like geoblocking, rate limiting or token validation
- Aggregate content, for example with ESI, but also via third-party calls to prepare a response
- Retry an origin request if the first endpoint fails
- Respond to a load-balancer probe, which generally touches neither the cache nor the origin
- And more!
If you've read this old blog post of mine, this should feel familiar: by focusing solely on the hit ratio, you're ignoring how a cache fundamentally works.
The better metric
Of course, there are many ways to skin this cat (English has weird expressions sometimes...), so I'll just focus on my go-to answer: what really matters is the traffic reduction, from the client side to the backend side.
I'll share the Varnish specifics in the next section. For now, let's look at the general case, which you can apply to all platforms, with this simple offload formula:
offload = (client requests - backend requests) / client requests
That's it! If no requests go to the backend, you get a 100% offload. If every request goes to the backend, you get a 0% offload. Much like you would in the "simplistic" caching case.
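As a minimal sketch, the offload described above can be computed from two request counts. The function name and the choice to reject an empty sample are illustrative assumptions, not part of any Varnish tooling:

```python
def offload(client_requests: int, backend_requests: int) -> float:
    """Share of client requests that never turned into backend requests.

    1.0 means nothing reached the backend, 0.0 means one backend request
    per client request, and negative values mean traffic amplification.
    """
    if client_requests <= 0:
        # Assumption: an empty sample has no meaningful offload.
        raise ValueError("need at least one client request")
    return (client_requests - backend_requests) / client_requests


print(offload(1000, 0))     # 1.0
print(offload(1000, 1000))  # 0.0
```

Note that nothing clamps the result to the 0 to 1 range, which is exactly what lets the metric expose amplification.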
But the signal doesn't stop here! For example, imagine a Varnish setup that uses ESI, with each response merging 6 different HTML pages. Even if you have a 50% hit rate:
offload = (1 - 6 × 0.5) / 1 = -2
This gets us a -200% offload. The "caching" layer is now amplifying the traffic, sending thrice as many requests to the backend as it is receiving. It's neither good nor bad, but if you are only checking the hit rate, you'll be clueless about it.
If you are not convinced, let's consider a cache with a hit ratio of 50%, but with a faulty primary backend that forces the cache server to retry every backend request once to get good content.
Our formula yields:
offload = (1 - 0.5 × 2) / 1 = 0
Even though technically your hit ratio is 50%, your offload rate is 0% because on average, you still go to the origin once per client request.
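Both scenarios can be replayed with a small model. This is a sketch under simplifying assumptions: every cache lookup has the same hit probability, every miss costs the same number of backend attempts, and all names are made up for the illustration.

```python
def modeled_offload(lookups_per_request: float,
                    hit_ratio: float,
                    attempts_per_miss: float = 1.0) -> float:
    """Expected offload for one client request.

    lookups_per_request: cache lookups triggered per client request
                         (for example, the pages merged through ESI).
    hit_ratio:           probability that a lookup is served from cache.
    attempts_per_miss:   backend requests sent per miss (2 = one retry).
    """
    backend_per_client = lookups_per_request * (1.0 - hit_ratio) * attempts_per_miss
    return 1.0 - backend_per_client


# ESI case: 6 lookups per response, 50% hit ratio.
print(f"{modeled_offload(6, 0.5):.0%}")                       # -200%
# Retry case: 1 lookup, 50% hit ratio, every miss sent twice.
print(f"{modeled_offload(1, 0.5, attempts_per_miss=2):.0%}")  # 0%
```

The hit ratio is identical in both calls. Only the number of lookups and backend attempts changes, and the hit ratio cannot see either.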
Obviously, you generally have other metrics, graphs and alerts to detect faulty backends, but let's be honest, the hit ratio is usually the first and most looked-at metric. So, by reframing the question slightly, we get an answer that addresses our actual concerns more accurately.
Doing it with Varnish
Let's apply this newly learned knowledge. Varnish famously exposes hundreds of metrics, which makes applying our formula quite easy.
OTel
If you are using varnish-otel (available for both Varnish OSS and Varnish Enterprise), just use:
And you can use the route attribute to filter your traffic.
Accounting
If you don't use varnish-otel, vmod-accounting will still provide you with per-namespace and per-key answers:
It's a popular, out-of-the-box solution if you are using the Controller, which automatically creates namespaces for you.
Simplest case
And if you are just starting out with Varnish or want the minimal approach that doesn't require any extra components, the global counters are still here for you using varnishstat.
If your Varnish nodes are serving only one kind of traffic or your requests don't need to be classified, this is the short and sweet way.
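As a rough sketch of that minimal approach, the snippet below computes the offload from the text printed by `varnishstat -1`. It assumes that `MAIN.client_req` and `MAIN.backend_req` are the right counters for your setup and that each output line reads `name value rate description`. The helper names and the sample values are invented for the illustration.

```python
def parse_varnishstat(text: str) -> dict[str, int]:
    """Map counter names to values from `varnishstat -1` style output."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: name, value, rate, description.
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def offload_from_varnishstat(text: str,
                             client_counter: str = "MAIN.client_req",
                             backend_counter: str = "MAIN.backend_req") -> float:
    """Offload computed from the two (assumed) global counters."""
    counters = parse_varnishstat(text)
    client = counters[client_counter]
    backend = counters[backend_counter]
    if client == 0:
        raise ValueError("no client requests recorded yet")
    return (client - backend) / client


# Invented sample, shaped like `varnishstat -1` output.
SAMPLE = """\
MAIN.client_req          2000         4.00 Good client requests received
MAIN.backend_req          500         1.00 Backend requests made
"""

print(f"{offload_from_varnishstat(SAMPLE):.0%}")  # 75%
```

On a live node, the text could come from something like `subprocess.run(["varnishstat", "-1"], capture_output=True, text=True).stdout`. These counters accumulate over time, so the offload over a given window is better computed from the difference between two samples than from a single reading.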
Conclusion
Once again I’ve fallen for the “let me tell you why you don’t need that” engineering trope, but in this particular case, and because the caching/reverse-proxy world has evolved so much over the years, it felt like a worthy PSA. As mentioned above, no one metric is a silver bullet and you'll want data from all involved systems, but traffic offload is definitely a better candidate than the hit rate and its loaded history.
I tried to keep this post short, so it's of course missing a ton of details and tips, like measuring the traffic offload based on bytes rather than on requests. If you'd like to chat about this topic, you can find me and the rest of the team on the Varnish Discord!
