During the latest Varnish Summit in London, the awesome Thijs Feryn had a presentation in which he explained the need to normalize Accept headers, and I proudly spoke up to say that a few weeks prior, I actually wrote a VMOD to do exactly that.
Well... I DID write it, but never actually blogged about it, being the procrastinator that I am. You can consider this post as a cathartic mea culpa, and as an attempt to give a bit more exposure to a VMOD that should have been advertised/used weeks ago. But enough self-flagellation; let's look at this VMOD, after we explain what Accept headers are. If you are already fluent in that subject, you can skip ahead to the "Balls to the wall" section.
Generation clash
Before we begin, I need to get something out of my system: I abhor the syntax chosen for the Accept headers family. There may have been a time when this made sense, but I'm probably too young to know that era, and to me, that old specification reads like a long chain of mistakes to avoid when designing a parsable format. <breathe in> <breathe out> Okay, that's done, and from this point on I will try and keep the ranting to a minimum and be factual (resolution may slip a few times, be warned).
Accept headers are part of the Content Negotiation section of RFC 7231 (the Semantics and Content RFC of the HTTP/1.1 specification), and they enable the client to tell the server what kind of content it prefers. This allows you to say that you prefer French over English, or JSON over XML (and who would blame you? French JSON is so likeable!). There are four Accept* headers:
- Accept: which MIME type is preferred, such as "text/plain" or "audio/basic".
- Accept-Charset: which character encoding you want, such as "iso-8859-5".
- Accept-Encoding: which content encoding (typically a compression scheme) you'd like; examples include "deflate" and "gzip". Note that this is handled automatically by Varnish, so you should leave that one alone.
- Accept-Language: I'm pretty sure you already guessed it was about language; it expects codes like "fr" or "en-us".
And for each header, you can specify multiple choices by having a comma-separated list of options. But, to keep things fun, the choices don't have to be ordered by preference, nope, that would be dull! Instead, you can add properties to your options, using semicolons, like so:
Accept-Language: fr;q=0.5;foo=bar, en-us;q=1.0
One important property (and the only one we care about here) is "q" which stands for "quality", and it can go from 0 to 1, 0 being "no thank you" and 1 "do want, GIMME!". So in the example above, even though "fr" is first, "en-us" is better for you. Hopefully, the server will reply with the requested page in English rather than French.
So, the user can convey their preferences. One issue is that the server may not care and just send the same object no matter what. And if it does care, how do you know which headers are important?
This question matters because, when the server does care, the returned objects differ. So, when computing a hash key, you should hash the headers that uniquely identify an object. If you don't hash the relevant fields, you risk delivering the wrong variant to your clients, and if you hash irrelevant headers, you'll cache duplicates.
But how can you know?
Varnish kills the pain
Normally and thankfully, if a header impacts the resulting object, the server should tell you about it. For example, imagine this request, asking for json if possible, otherwise yaml:
GET /api/v1/accounts HTTP/1.1
host: api.example.com
Accept: application/json;q=1, application/yaml;q=0.1
The server replies with json, and tells you that the Accept header impacted (or could have impacted) the response:
HTTP/1.1 200 OK
Content-type: application/json
Vary: Accept
Content-length: 42
Is it important? Very much so! Varnish will read the Vary header and understand that Accept is meaningful. Now suppose a new request comes in looking like this:
GET /api/v1/accounts HTTP/1.1
host: api.example.com
Accept: application/json;q=0.1, application/yaml;q=1
Now the user would prefer yaml over json, and Varnish won't serve you the previously cached object because Accept is different. That is quite brilliant because you don't need to explain that in the VCL, you can just let the backend drive the cache's behavior.
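To appreciate what Vary saves you, here is a rough sketch of what doing it by hand could look like if the backend did not send "Vary: Accept". The backend address is a placeholder, and choosing to hash Accept is an assumption made for this example; nothing here is required by the setup described above.

```vcl
vcl 4.0;

# placeholder backend, adjust to your setup
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_hash {
    # make the Accept header part of the cache key by hand;
    # there is no return here, so the built-in vcl_hash still
    # runs afterwards and adds the URL and the host
    hash_data(req.http.accept);
}
```

With Vary, you get the same separation between variants without having to maintain this kind of code for every header and every backend.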
Twist of fate
However, this nice idea has a pitfall that should be pretty obvious when looking at the next request:
GET /api/v1/accounts HTTP/1.1
host: api.example.com
Accept: application/json;q=0.2, application/yaml;q=1
It basically means the same thing as the previous one ("I prefer yaml over json"); however, the two Accept headers are different, and because of this, Varnish will consider it a new object. If you have battled with them before, this is the same issue as with cookies and with URL parameters.
The only way to preserve both your cache and your sanity is to normalize them, and that's what vmod-accept is all about.
Balls to the wall
Now that we know what we are up against, let's dig in! The goal of vmod-accept is to boil a requester's header down to one single word that we know will be accepted by the server.
The VMOD is centered around rule objects that you can configure, filling them with acceptable choices. The rules are generic and can be applied to any Accept-type header; it's up to you, the user, not to mix them up, but I trust you :-). The fact that each rule is bundled in an object allows us to easily define different rules for different backends, as they may have different capabilities. Here's a VCL example:
import accept;

sub vcl_init {
    # create the Accept-Language rule, setting "en" as the default
    new al_rule = accept.rule("en");
    # fill it with acceptable languages
    al_rule.add("fr");
    al_rule.add("en-us");
    al_rule.add("en-gb");

    # do the same for Accept
    new a_rule = accept.rule("text/plain");
    a_rule.add("application/json");
    a_rule.add("application/yaml");
}

sub vcl_recv {
    # normalize headers: each one is replaced by the single choice
    # retained by its rule
    set req.http.accept = a_rule.filter(req.http.accept);
    set req.http.accept-language = al_rule.filter(req.http.accept-language);
}
You can create a rule using accept.rule(), and you can add words to its list and remove them from it using the .add() and .remove() methods (how surprising is that?!). Usually, one would only add words, in vcl_init{}. And I'm not sure anybody will ever remove choices, but hey, it was easy to implement, so here you are!
The core method is .filter(), which will parse a string and compare it with its internal list. It will only retain the highest ranked AND allowed choice, and if there's no overlap between the candidates and the authorized ones, the method returns the default string, given as an argument to accept.rule().
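As a quick way to watch .filter() at work, the sketch below copies the normalized value into a response header. X-Normalized-Language is a made-up debug header and the backend address is a placeholder. The results listed in the comments are what the description of .filter() implies, so treat them as expectations to verify rather than guarantees.

```vcl
vcl 4.0;

import accept;

# placeholder backend, adjust to your setup
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_init {
    new al_rule = accept.rule("en");
    al_rule.add("fr");
    al_rule.add("en-us");
}

sub vcl_recv {
    # expected, going by the description of .filter():
    #   "fr;q=0.5;foo=bar, en-us;q=1.0" -> "en-us" (highest q among allowed)
    #   "de"                            -> "en"    (nothing allowed, default)
    set req.http.accept-language = al_rule.filter(req.http.accept-language);
}

sub vcl_deliver {
    # hypothetical debug header, remove it once you are satisfied
    set resp.http.X-Normalized-Language = req.http.accept-language;
}
```

Sending a few requests with different Accept-Language values and looking at the response headers should be enough to check that equivalent preferences collapse to the same word.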
And, well, that's about it!
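Well, almost: the point about giving each backend its own rule object may be easier to picture with a sketch. Everything named here (the two backends, their addresses, the host check and the lists of types) is hypothetical, and the defaults are also added to the lists explicitly, on the assumption that the default string is not matched against on its own.

```vcl
vcl 4.0;

import accept;

# both backends are placeholders
backend api { .host = "127.0.0.1"; .port = "8081"; }
backend www { .host = "127.0.0.1"; .port = "8082"; }

sub vcl_init {
    # the API backend only speaks JSON and YAML
    new api_rule = accept.rule("application/json");
    api_rule.add("application/json");
    api_rule.add("application/yaml");

    # the website backend only serves HTML and plain text
    new www_rule = accept.rule("text/html");
    www_rule.add("text/html");
    www_rule.add("text/plain");
}

sub vcl_recv {
    # pick the backend, then normalize Accept with the matching rule
    if (req.http.host == "api.example.com") {
        set req.backend_hint = api;
        set req.http.accept = api_rule.filter(req.http.accept);
    } else {
        set req.backend_hint = www;
        set req.http.accept = www_rule.filter(req.http.accept);
    }
}
```

Keeping one rule per backend means a change in one backend's capabilities only touches its own few lines in vcl_init.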
Run it if you can
At less than 300 lines of code, this is a pretty small VMOD, and yet, it can do wonders for your hit-ratio and for your VCL maintainability if you have backends making use of the Accept* headers. Hence I urge you to try it, and of course, to report any bug, feature request or contribution on its GitHub project!
Also, I plan to actually explain the concept of VMOD objects in a later post by using this VMOD as a reference, as it's very short and simple (except for parsing the ugly syntax...). So, if you want to get a head start: again, go have a look at the code on GitHub.
Find out more about the variety of VMODs you can take advantage of; visit the Varnish documentation.
Image (c) 2012 Steve Snodgrass used under Creative Commons license.