Varnish is a caching server, and a great one at that; that much we already know. But what about the content you don't want to cache? For example, those shopping cart requests and other completely uncacheable API calls?
We can of course handle it, but we've got to be wary of the sirens of the cargo cult because you will often see something like this on the internet:
sub vcl_backend_response {
# check if the backend response header named
# "cache-control" contains the word "private"
if (beresp.http.cache-control ~ "private") {
# if so, don't cache by limiting the Time-To-Live
# period to 0 seconds
set beresp.ttl = 0s;
}
}
This is both pretty intuitive, and also very wrong. In this post, we'll explore why it's a bad idea, how to do better, and along the way, we'll try to shine some light on a couple of lesser known features of Varnish.
Early decisions in vcl_recv
Our above snippet can basically be run in two different contexts:
- we are sure we don't want to cache
- we think we want to cache
The first point is also the easiest: just by looking at the request, it's clear we can't cache the response. It could be because the URL starts with an "/admin/" or because there are cookies:
# this is the first VCL routine run,
# it happens right after we have received
# the request headers
sub vcl_recv {
if (req.url ~ "^/admin/" || req.http.cookie) {
return (pass);
}
}
Returning "pass
" here flags the request as uncacheable and while it will still go through the rest of the VCL, it won't stick around in the cache and the response will be destroyed as soon as it is delivered to the user. Because of this, setting the TTL like we did in the intro is useless, and Varnish just ignores it.
I thought yes, but in fact no
What about content that should be cacheable, like CSS files or content in "/static/"? Obviously, we would like to cache this content, but only if the response isn't marked as private by the backend:
# by looking at the request, we think we can cache
sub vcl_recv {
if (req.url ~ "^/static/" || req.url ~ "\.css") {
return (hash);
}
}
# but we need to wait for the response to be sure
sub vcl_backend_response {
if (beresp.http.cache-control ~ "private") {
set beresp.ttl = 0s;
}
}
And this is where we create a problem for ourselves!
See, Varnish does something that is pretty clever: request coalescing, which basically translates to "for similar user requests, only send one backend request at a time". And it's automatically active when we return "hash" in vcl_recv. That is the rock.
In vcl_backend_response now, if we don't like the response, we decide to set the TTL to 0, which equates to "this response can't be reused". This is the hard place.
And so Varnish is in the middle of these two orders:
- it can't send multiple requests to the origin
- it can't reuse the response.
Logically, it is forced to queue the backend requests and deliver the responses to the clients one after the other. The big problem, of course, happens when user requests arrive faster than the backend can sequentially handle them, forcing users to time out and/or Varnish to keep a ton of connections open while doing nothing with them.
It is obviously not good, and we can do better.
The cached uncacheable object
To solve this, we are going to save a light version of the object in cache. The difference from a regular object is that when a request hits it, instead of being served that object, the request gets permission to bypass request coalescing and is treated as a miss, hence the term "Hit-for-Miss". Basically, this is just Varnish maintaining a little database of content that is uncacheable, and the TTL again dictates how long it should keep the entry around.
To create such an object, we just need to set the uncacheable property of the beresp object:
sub vcl_backend_response {
if (beresp.http.cache-control ~ "private") {
set beresp.uncacheable = true;
set beresp.ttl = 1d;
}
}
Note that I personally tend to set large TTLs for those objects because the uncacheability of an object isn't likely to change soon (most of the time). And, even if it does change, Varnish is smart enough to replace our uncacheable object with a regular one as soon as possible. So there's really no downside here.
Also, and this is pretty important: for this to work, you must have a non-null TTL. If it's not the case, requests won't find the object in cache, and won't get the permission to bypass request coalescing, so you'll end up in the same spot as before.
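To make that concrete, here is the near-miss version: it looks a lot like the snippet above, but because of the null TTL it doesn't actually solve anything:
sub vcl_backend_response {
if (beresp.http.cache-control ~ "private") {
set beresp.uncacheable = true;
# with a 0s TTL the Hit-for-Miss object expires immediately:
# nobody ever hits it, nobody gets to bypass request coalescing,
# and backend requests are serialized just like before
set beresp.ttl = 0s;
}
}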
Sometimes, you get lucky, with some help
Now, one important point: it is possible that your VCL is setting a null or negative TTL without you ever having any issues. If that's the case, lucky you! There are two probable explanations for that.
The first one is that you never noticed the request serialization. This generally only affects a subset of the requests, and at low volume, the origin can usually cope. If that's the case, that's great, but check your logs to see if some requests didn't take an inordinate amount of time.
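If you want to check, a query along these lines (the exact threshold is up to you) should surface transactions that spent a suspiciously long time between the start of the request and the delivery of the response:
# show complete request groups where more than one second elapsed
# between the start of the request and the response being sent
varnishlog -g request -q 'Timestamp:Resp[2] > 1.0'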
The second reason is more interesting. You may have been saved by Varnish's built-in VCL! The topic deserves a post of its own (and it will get it!), but briefly, this is code that is appended to yours and executed if you don't return from the subroutine. It turns out that the built-in vcl_backend_response looks like this:
sub vcl_backend_response {
if (bereq.uncacheable) {
return (deliver);
} else if (beresp.ttl <= 0s || ...) {
# Mark as "Hit-For-Miss" for the next 2 minutes
set beresp.ttl = 120s;
set beresp.uncacheable = true;
}
return (deliver);
}
As you can see, if you don't return (that's the important part), Varnish will automatically correct the null or negative TTL and do the "correct" thing. Of course, by that time it's too late to ask you for a reasonable TTL, so it just assumes two minutes is a sensible default.
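In other words, being saved or not can come down to a single return statement:
sub vcl_backend_response {
if (beresp.http.cache-control ~ "private") {
set beresp.ttl = 0s;
# no return here: execution falls through to the built-in
# vcl_backend_response, which turns the null TTL into a
# 120s Hit-for-Miss object, and you get saved
# but add "return (deliver);" at this point and the built-in
# code is skipped, the null TTL stays, and backend requests
# are serialized again
}
}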
If you want to know more about it, you can find the file here.
A note about errors
Let's conclude with where you probably should NOT use Hit-for-Miss objects: errors. It's very tempting to go and write something like:
sub vcl_backend_response {
if (beresp.status >= 500) {
set beresp.uncacheable = true;
set beresp.ttl = 1d;
}
}
Which translates into "if the backend replied with an error, stop shielding it and let every request for that object reach it for a full day". In particular, if the backend returned an error because it was overloaded, it's not going to help to hammer it with more requests.
This is why I warmly recommend caching errors: if you just got an error now, chances are that it's still going to be there in the next few seconds. Some setups require more snappiness than others, so caching an error for 5 or 10 seconds may not be acceptable, but every tiny bit of TTL helps. As an example, you can go down from thousands of requests per second to just 10 with a TTL of 0.1 seconds:
sub vcl_backend_response {
if (beresp.status >= 500) {
set beresp.ttl = 0.1s;
}
}
And you can of course go lower if needed!
Language shortcuts and conclusion
The whole matter, as you may have come to understand, stems from the fact that we don't in fact "cache a request", but instead, we "cache the response to a request, and base our caching decision on said request".
This complicates things a tiny bit as explained here, but the logic behind it definitely makes sense and allows us to have our cake (request coalescing) and eat it too (handle uncacheable content without stalling).
This was a bit of a "The more you know" post, so I'm going to be thematically consistent and encourage you to watch our recent webinar on the top ten dos and don'ts of Varnish.