February 6, 2019

novcl: an alternative to VCL


If you've read a few of my blog posts, you probably already know I love the VCL (Varnish Configuration Language) idea, big time. Being able to change the processing logic via code opens a world of possibilities and makes pretty much all other tools feel constrained in their configurations. But...

But, well, VCL is code, and code is scary to a lot of users. I can understand that when you are starting out with Varnish and have very limited configuration needs, VCL can feel complicated, and some would prefer a simple, declarative language. The good news is that it's totally possible; let's see how we can help!

VCL is so good it can replace itself

Where there's demand, there's supply, and you can already find a lot of tools, modules and whatnots that convert a dumbed-down configuration to VCL; examples include Ansible and Puppet, as well as a myriad of others. But...

But what if we could just use a VCL framework as an alternative to VCL? All we need is a handful of VMODs to allow our users to edit a simple ruleset that we'll process via an immutable VCL file. The idea of this post is to show off the solution, of course, but also to highlight a few neat tricks you may not know yet and that will surely benefit you in the future.

The helicopter view

Here's the plan: we are going to have three files:

  • /etc/varnish/default.vcl: our read-only VCL framework; it defines a few behaviors that you can trigger and configure via the rules file.
  • /etc/varnish/backend: a simple file containing the domain name or IP of the backend.
  • /etc/varnish/rules: the meat of the configuration.

If we need to change something, we can edit one of the latter two and then just reload Varnish (for the new people: no service interruption here). As the two files are loaded by the VCL when it is itself loaded, this works perfectly without needing to pre-process a file or run a script before reloading.

Backend file

This is the dumb one; it will literally only contain the domain name or IP of our backend. Something like:

192.168.12.34

Or maybe

staging-pool.example.com

Not exciting, at all, I know. Don't worry, it gets better.

Rules file

In this fella, we'll put all kinds of information about what to do with an individual request, and it will look like this:

prefix /supersecret/ block    404
prefix /admin/       pass
regex  ^/static/(.*) redirect 301 https://static.example.com/\1
suffix .jpg          cache    5m 1h 2d
suffix .js           cache    1m

From the example, some stuff is self-explanatory, but maybe not everything, so let's dig in!

This is a ruleset for vmod_rewrite, and the structure is very simple:

  • The first element is a word describing how we'll match the second element (prefix, suffix, globbing, regex, etc.)
  • Right after, we have a string that we are going to try and match against the request URL.
  • The last mandatory element is the command. It'll specify what we are to do with the request. Here we can block, redirect, bypass the cache or cache.
  • The rest of the line is a list of arguments that depend on the command: "block" takes a status code (404 if omitted), "redirect" needs a status code (generally 301 or 302) and a new destination, "cache" can have as many as three durations (for TTL, grace and keep), and "pass" has no extra arguments.
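For comparison, here is a rough sketch of what the five example rules might look like as hand-written VCL. It assumes that the first matching rule wins and that prefix/suffix rules are matched against the raw req.url; it only uses core VCL, and the backend address is just the placeholder from the backend file example.

```vcl
vcl 4.0;

# Placeholder backend, standing in for the content of /etc/varnish/backend.
backend default { .host = "192.168.12.34"; }

sub vcl_recv {
	# never forward a client-supplied Location header to vcl_synth
	unset req.http.location;

	if (req.url ~ "^/supersecret/") {
		# prefix /supersecret/ block 404
		return (synth(404));
	} else if (req.url ~ "^/admin/") {
		# prefix /admin/ pass
		return (pass);
	} else if (req.url ~ "^/static/") {
		# regex ^/static/(.*) redirect 301 (absolute URL for Location)
		set req.http.location = regsub(req.url, "^/static/(.*)", "https://static.example.com/\1");
		return (synth(301));
	} else if (req.url ~ "\.(jpg|js)$") {
		# suffix .jpg / suffix .js: cacheable, so drop the client cookies
		unset req.http.cookie;
	}
}

sub vcl_backend_response {
	if (bereq.uncacheable) {
		return (deliver);
	} else if (bereq.url ~ "\.jpg$") {
		# cache 5m 1h 2d
		set beresp.ttl = 5m;
		set beresp.grace = 1h;
		set beresp.keep = 2d;
		unset beresp.http.set-cookie;
	} else if (bereq.url ~ "\.js$") {
		# cache 1m: grace and keep get the framework's 0s and 1h fallbacks
		set beresp.ttl = 1m;
		set beresp.grace = 0s;
		set beresp.keep = 1h;
		unset beresp.http.set-cookie;
	}
}

sub vcl_synth {
	# copy the redirect destination onto the synthetic response
	if (req.http.location) {
		set resp.http.location = req.http.location;
		return (deliver);
	}
}
```

Every new rule means another branch in two or three subroutines, which is exactly the kind of editing the rules file spares you.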

This is of course an open API, and you shouldn't feel limited by it; the implementation is super simple, so you can easily add your own commands if needed.
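As an illustration, here is a sketch of a hypothetical "strip_query" command (the command name, the rule line and the subroutine name are all made up for this example) that removes the query string so that all its variants share one cache object. One catch: the final else in the framework's vcl_recv resets unknown commands to "default", so the subroutine has to be called before that if/else chain, right after req.http.command is set.

```vcl
# Hypothetical command, triggered by a rule line such as:
#   prefix /search/ strip_query
# Call it from vcl_recv right after req.http.command is set, and before
# the if/else chain handling the built-in commands:
#   call novcl_custom_commands;
sub novcl_custom_commands {
	if (req.http.command == "strip_query") {
		# remove "?" and everything after it from the URL
		set req.url = regsub(req.url, "\?.*$", "");
	}
}
```

The request then falls through the chain as "default" and goes through the regular cache path, only with a shorter URL.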

VCL file

This is the main course, even though the file is only 60 lines of code or so. Internally, it does very little:

  • during initialization:
    • load the string inside /etc/varnish/backend and give it to the dynamic backend director
    • load the rules file (/etc/varnish/rules)
  • for each request:
    • use the rule set to find a line matching the request URL (.match())
    • set the "command" request header accordingly.
    • for each command, execute the relevant code: pass, cache with the correct TTL, return a synthetic response, etc.

Here's the code:

vcl 4.0;

import goto;
import rewrite;
import std;
import urlplus;

# placeholder: VCL requires at least one backend declaration (see below)
backend fake_be { .host = "0"; }

sub vcl_init {
	# read the backend name/IP and the rules each time this VCL is loaded
	new director = goto.dns_director(std.fileread("/etc/varnish/backend"));
	new ruleset = rewrite.ruleset("/etc/varnish/rules", type = any, field_separator = auto);
}

sub vcl_recv {
	set req.backend_hint = director.backend();

	unset req.http.location;

	if (ruleset.match(urlplus.url_as_string())) {
		set req.http.command = ruleset.rewrite(field = 2, mode = only_matching);
	} else {
		set req.http.command = "default";
	}

	if (req.http.command == "block") {
		return (synth(std.integer(ruleset.rewrite(field = 3, mode = only_matching), 404)));
	} else if (req.http.command == "pass") {
		return (pass);
	} else if (req.http.command == "redirect") {
		set req.http.location = ruleset.rewrite(field = 4);
		return (synth(std.integer(ruleset.rewrite(field = 3, mode = only_matching), 301)));
	} else if (req.http.command == "cache") {
		unset req.http.cookie;
	} else {
		set req.http.command = "default";
	}
}

sub vcl_backend_response {
	if (bereq.uncacheable) {
		return (deliver);
	} else if (bereq.http.command == "cache") {
		ruleset.match(urlplus.url_as_string());
		set beresp.ttl = std.duration(ruleset.rewrite(field = 3, mode = only_matching), 2m);
		set beresp.grace = std.duration(ruleset.rewrite(field = 4, mode = only_matching), 0s);
		set beresp.keep = std.duration(ruleset.rewrite(field = 5, mode = only_matching), 1h);
		unset beresp.http.set-cookie;	
	}
}

sub vcl_synth {
	if (req.http.location) {
		set resp.http.location = req.http.location;
		return (deliver);
	}
}
A few details are a bit odd (and explained in the next section), but overall, you should find that it follows the plan quite nicely with a pretty legible flow.

Notable details

Even though the VCL is fairly readable, there are a few small things I'd like to touch upon to really help you understand what's going on here.

To be clear: the next sections are a FAQ about the nitty-gritty of the implementation and none of it is essential to understand the broad strokes of the code, so if you easily get bored by code, jump ahead!

regextable? matchmap?

The VCL uses vmod_rewrite heavily and, admittedly, in a rather unconventional manner: it's not so much used to rewrite strings as to access the various fields of the relevant rule once we've found it.

These calls will look like this:

ruleset.rewrite(field = X, mode = only_matching)

field is obviously the field we are interested in (on the rule line), but the mode argument is what actually does the work. It defines what the rewritten string should look like, and it can have three values:

  • regsub: return the full string, replacing the first match
  • regsuball: return the full string, replacing all the matches
  • only_matching: only return the rewritten match

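In our case, the fields we read with mode = only_matching have no back reference to a capturing group, so we just get the field back, allowing us to use the rule set as a mapping table. The redirect destination, which can contain such a back reference, is read without that mode.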
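The mode names mirror core VCL's regsub() and regsuball(). As a rough analogy that uses only core VCL (this is not vmod_rewrite itself, and the x-* header names are made up), here is what each mode would produce on toy strings:

```vcl
vcl 4.0;

# Placeholder backend (hypothetical) so the file compiles on its own.
backend default { .host = "127.0.0.1"; }

sub vcl_deliver {
	# like mode = regsub: only the first match is replaced -> "/z/b/a"
	set resp.http.x-regsub = regsub("/a/b/a", "a", "z");
	# like mode = regsuball: every match is replaced -> "/z/b/z"
	set resp.http.x-regsuball = regsuball("/a/b/a", "a", "z");
	# like mode = only_matching: only the rewritten match survives.
	# With a back reference, "\1s" expands to "cats"...
	set resp.http.x-only-backref = regsub("/img/cat.jpg", "^.*?(cat).*$", "\1s");
	# ...without one, the substitution comes back verbatim: "dog".
	# That is how the rule fields ("cache", "1m", ...) are returned as-is.
	set resp.http.x-only-plain = regsub("/img/cat.jpg", "^.*cat.*$", "dog");
}
```

Keep in mind that core regsub() returns its input unchanged when nothing matches, so this emulation only holds when there is a match.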

Type conversion

VCL is a pretty strict language, notably in the type conversion domain. vmod_rewrite only deals with strings, and also only returns strings. But in our VCL we need to set some response statuses (an integer) as well as the TTL/grace/keep triplet (durations), and Varnish won't automatically convert those for us.

That's understandable from a security and convenience standpoint: what are we supposed to do if we use "foobar" as an integer? 0? -1? 42? Instead of tackling this hard problem, Varnish defers to VMODs, in this case vmod_std, which has two functions for us:

  • std.integer(STRING s, INT default): tries to convert s to an integer, and returns default if it's not possible.
  • std.duration(STRING s, DURATION default): does the same thing for durations, unsurprisingly.
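To make the fallback behavior concrete, here is a small standalone sketch that feeds literal strings to these functions in place of ruleset.rewrite() results. The placeholder backend and the x-demo-status header are made up, and treating an omitted rule field as an empty string is an assumption about what vmod_rewrite returns.

```vcl
vcl 4.0;

import std;

# Placeholder backend (hypothetical) so the file compiles on its own.
backend default { .host = "127.0.0.1"; }

sub vcl_backend_response {
	# "1m" parses: the TTL becomes one minute
	set beresp.ttl = std.duration("1m", 2m);
	# an empty string (assumed omitted field) doesn't parse: 0s fallback
	set beresp.grace = std.duration("", 0s);
	# garbage doesn't parse either: 1h fallback
	set beresp.keep = std.duration("foobar", 1h);
	# same idea for integers: "foobar" isn't a number, so we get 404
	set beresp.http.x-demo-status = std.integer("foobar", 404);
}
```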

From the language perspective, this is safer, but even for us, it's easier as we can omit fields if they don't interest us. This is why these two lines from the ruleset example are valid:

suffix ".jpg"          "cache"    "5m" "1h" "2d"
suffix ".js"           "cache"    "1m"

In the second case, grace and keep will be 0 seconds and 1 hour, respectively.

Fake one until you can make one

I may have said it already, but VCL is a bit of a strict language, and it won't accept a configuration with zero backends declared. This is the reason for fake_be here: it points back to localhost and is never used, but at least Varnish won't complain.

backend fake_be { .host = "0"; }

"But, wait, what about director then?" I hear you wonder. This is a very valid question, with a simple answer: director is, to Varnish, just an abstract object that may or may not be used to create a backend (with director.backend()). And Varnish wants a non-null backend the instant it enters vcl_recv, so we abide and give it fake_be, just long enough for us to get a dynamic backend from vmod_goto.
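If you run Varnish Cache 6.4 or later, the placeholder can state its intent explicitly with the none keyword instead of pointing at "0". This is a one-line sketch that assumes your Varnish version supports it; it would replace the fake_be declaration in the framework:

```vcl
# Varnish Cache 6.4+ only: a placeholder backend with no address at all.
# It satisfies the "at least one backend" requirement and is never used.
backend fake_be none;
```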

Logging

If you've read my previous post about hit-miss.vcl, this isn't going to be a surprise: it's very easy to log which command was used to handle a request:

varnishncsa -F '%{command}i %U %s'

And needless to say, you can include hit-miss.vcl from that previous post in our present template to get all the benefits from both:

varnishncsa -F '%{VSL:RespHeader:x-cache[1]}x %{command}i %U %s'

Drawing the line

I only implemented four commands, which may not seem like a lot, and it's not. For instance, I'm sure a lot of you would like to be able to handle cookies in a fine-grained manner, or maybe filter query string parameters. So, why didn't I go further?

The answer is going to be a bit blunt, so pardon my (being) French: declarative configuration sucks when you want freedom, which is what "fine-grained handling" means, really. In a declarative world, the logic is already set in stone, and you have to follow the path traced for you.

So, for the ruleset (or any declarative configuration file) to be able to do "X", "X" must have been planned and written the way you want it, so that you can use it the way you want it. And as you can imagine, different people want different things. This is exactly why VCL was created: to avoid having to cater to everyone individually.

The VCL presented here is tremendously useful, don't misinterpret my words, but it's also "only" a crutch to make simple configurations very easy to write. If you need more, nothing beats the real thing.

Case in point: this declarative configuration language is implemented using VCL, not the other way around.

Until next time!