Varnish Backends Moving in the Cloud

Space: the final frontier. These are the voyages of the Varnish Project. Its continuing mission: to explore strange new requirements, to seek out new patches and new features, to boldly go where no Varnish user has gone before.

Let's go to the cloud and find them dynamic backends.

Problem statement

Varnish is an HTTP caching reverse proxy, a piece of software traditionally found in your own infrastructure in front of web servers, also located in your own infrastructure. But it's been a long time now since the traditional infrastructure started its move to the cloud: a weatherly term for hosting.

The problem with Varnish is to keep track of its backends when they may move with the prevailing winds. And in the cloud, your backends may fly way further than where they'd go if they were tethered in the island in the sun of the old-school infrastructure.

Enough with the forecast metaphors, let's sail another ship instead. A big one actually, since its cargo is made of containers. And containers imply the ability to rapidly spin up new applications, surfing on the cloud's elastic properties.

Yes, today's backends may move fast, and Varnish won't follow them by itself.

Backend limitations

The main problem with backends is that they are resolved at compile-time, so when you load a VCL its backends will be hard-coded forever. I often see backends declared using IP addresses, although I'd rather see domain names, but there's a catch.

If you declare a backend using a domain name for the address, whatever was resolved will be kept indefinitely as explained above. So Varnish will not honor the TTL of the DNS records. Static backends are your ticket to kissing your cloudy cloud backends goodbye. Worse than that, if you rely on a domain name it can only resolve to at most one IPv4 and one IPv6 address. Even worser than that, we used to have a DNS director in Varnish 3 that would solve this problem for us, and Varnish 3 reached its end of life quite a while ago now. And Varnish Cache Plus 3 will soon be retired.

The DNS director

Before Varnish 4.0 we could have solved the cloud problem by using the DNS director. It worked (and I use past tense because Varnish 3 is dead) in a very unique way in the sense that it could honor TTL unlike plain backends. But there's a trick to it: backends were created upfront and then enabled/disabled depending on DNS lookups.

The syntax is rather unique:

director directorname dns {
    .list = {
        .host_header = "www.example.com";
        .port = "80";
        .connect_timeout = 0.4s;
        "192.168.15.0"/24;
        "192.168.16.128"/25;
    }
    .ttl = 5m;
    .suffix = "internal.example.net";
}

The list property contains the usual backend properties followed by an ACL, and the VCL compiler creates backends for every single IP address matching the ACL. Then there is the TTL: in this case, five minutes after a lookup, the next transaction using the director blocks and performs a new lookup. Other workers trying to use the same director also block until the lookup completes.

It worked like a charm, but in some scenarios it is an unrealistic solution. Would you generate backends for all IP addresses in Amazon's us-east-1 if you expect between two and ten backends at all times?
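To put numbers on it, here is a back-of-the-envelope sketch in shell. It assumes one backend per address in each range, as described above, and the acl_backends helper is a name I made up for the occasion:

```shell
#!/bin/sh

# Number of IPv4 addresses covered by a CIDR prefix length, which is
# also the number of backends created upfront for that ACL entry
# (assuming one backend per address in the range).
acl_backends() {
    echo $((1 << (32 - $1)))
}

# The two ACL entries from the director example: a /24 and a /25.
total=$(( $(acl_backends 24) + $(acl_backends 25) ))
echo "backends created upfront: $total"
# prints "backends created upfront: 384"
```

A single /16 already means 65536 backends, for a cluster that may never run more than ten of them at once.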

The great escape

As I see it, Varnish 3 was the feature peak of the project. Starting with Varnish 4.0, features gradually moved away from the core and sometimes spun off a mandatory escape.

The concept of the mandatory escape is really simple: Varnish should give you the ability to bend the tool without being locked by the author's imagination. A multitude of entry points is available in Varnish and you can extend it using the shared memory log, the CLI, VCL or modules. My favorite being all of them at the same time.

So Varnish 4.0 removed directors from the core and gave birth to the built-in directors VMOD. You could find the usual suspects, except for the client director that was a special case of the hash, and... the DNS director.

The reason is simple: we removed directors from the core, provided an escape hatch to implement your own director (and people did!) but we then lost the ability to act during the VCL compilation phase. So no more creating backends upfront, no more DNS director and instead many users clinging to Varnish 3.

The ability to implement features in modules and to expand module capabilities has other benefits. It keeps Varnish itself very lean, and allows more contributors to create value. Varnish 5 may have fewer features than Varnish 3, but with modules contributed by Varnish Software and other Varnish hackers we can definitely do a lot more with the latest release.

The ubiquitous solution

The thing is, there's a solution to this problem that would work from version 3.0 to 5.0 (and I'm sure with earlier versions too) but it may require tweaks to deal with VCL differences between versions (mainly Varnish 3 against the rest).

The idea is pretty simple: you can generate the backends outside of Varnish, populate a director and load the new VCL. Code generation isn't too hard to do so I made a quick example in Bourne Shell:

#!/bin/sh

set -e

backend_file="$(mktemp)"
subinit_file="$(mktemp)"

cat >"$subinit_file" <<EOF
sub vcl_init {
    new $1 = directors.round_robin();
EOF

dig -t A +noquestion +nocomments +nostats +nocmd "$2" |
awk '{print $NF}' |
sort |
while read ip
do
    be="$1_$(printf %s "$ip" | tr . _)"

	cat >>"$backend_file" <<-EOF
	backend $be {
	    .host = "$ip";
	}

	EOF

	cat >>"$subinit_file" <<-EOF
	    $1.add_backend($be);
	EOF
done

echo "}" >>"$subinit_file"

cat "$backend_file" "$subinit_file"
rm -f "$backend_file" "$subinit_file"

On my machine, it produced the following output:

./backendgen.sh amazon amazon.com >amazon.vcl
cat amazon.vcl
backend amazon_54_239_17_6 {
    .host = "54.239.17.6";
}

backend amazon_54_239_17_7 {
    .host = "54.239.17.7";
}

backend amazon_54_239_25_192 {
    .host = "54.239.25.192";
}

backend amazon_54_239_25_200 {
    .host = "54.239.25.200";
}

backend amazon_54_239_25_208 {
    .host = "54.239.25.208";
}

backend amazon_54_239_26_128 {
    .host = "54.239.26.128";
}

sub vcl_init {
    new amazon = directors.round_robin();
    amazon.add_backend(amazon_54_239_17_6);
    amazon.add_backend(amazon_54_239_17_7);
    amazon.add_backend(amazon_54_239_25_192);
    amazon.add_backend(amazon_54_239_25_200);
    amazon.add_backend(amazon_54_239_25_208);
    amazon.add_backend(amazon_54_239_26_128);
}

The VCL we load can then be as simple as this:

vcl 4.0;

import directors;

include "amazon.vcl";

sub vcl_recv {
    set req.backend_hint = amazon.backend();
}

Now all you have to do is to periodically reload the VCL, and there you get pseudo-dynamic backends. Obviously this is a very simplistic example, and you can get more information from dig than just a list of IP addresses. This approach works and it is suitable for production use, but if you go down that road you may burn yourself.
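The periodic reload itself could look like the sketch below. It assumes varnishadm is in the PATH and can reach the running instance. The reload_vcl function, the reload_ name prefix and the file paths are all made up for the example:

```shell
#!/bin/sh

# Assumption: varnishadm can talk to the running Varnish instance.
# VARNISHADM can be overridden, which is handy for dry runs.
VARNISHADM="${VARNISHADM:-varnishadm}"

# Load a VCL file under a fresh, timestamped name and make it active.
# $1: path to the main VCL file (the one that includes amazon.vcl)
reload_vcl() {
    # VCL names must be unique, so two reloads within the same
    # second would clash with this naming scheme.
    vcl_name="reload_$(date +%Y%m%d_%H%M%S)"

    "$VARNISHADM" vcl.load "$vcl_name" "$1" &&
    "$VARNISHADM" vcl.use "$vcl_name"
}

# Hypothetical usage, for instance from a cron job:
#
#   ./backendgen.sh amazon amazon.com >/etc/varnish/amazon.vcl &&
#   reload_vcl /etc/varnish/main.vcl
```

Since vcl.use only runs when vcl.load succeeded, a generated file that doesn't compile leaves the active VCL untouched.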

Watch the temperature

We solved the discovery problem by using the dig(1) command and routing is done by a director in Varnish. We can now work with our backends in the cloud and let them come and go in our elastic cluster, and the script will help Varnish keep track of everything. But the periodic VCL reloads come with a problem that we tried to solve with Varnish 4.1.

Reloading the active VCL is fairly convenient: it can be as easy as running service varnish reload. But if you don't pay attention and discard older VCLs, you can accumulate too many of them. The largest number of loaded VCLs I've seen was over 9000 (well, actually over 16000) and it must have been a frequent enough problem, because in Varnish 4.1 phk introduced VCL temperature.

The idea is that VCL needs to be warmed up before use, and by default will cool down after use. Cold VCLs are meant to have a lower footprint (read: not hamper the active VCL) and for that they drop their probes, counters and kindly invite VMODs to also release any resource that could later be acquired again.

The bottom line is that reloading VCL implies that you should bite the bullet and come up with an appropriate discard strategy to avoid letting loaded VCLs pile up and eat all your resources.
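One possible discard strategy, as a rough sketch: keep the few most recent reloads around for rollbacks and discard the rest. The stale_vcls function is hypothetical, and it assumes that vcl.list prints one VCL per line with the status in the first column and the name somewhere after it. It also assumes that VCL names carry a sortable timestamp after a known prefix. The exact vcl.list layout differs between Varnish versions, so check yours first:

```shell
#!/bin/sh

# Read vcl.list-style output on stdin and print the names of the VCLs
# that can be discarded: the ones marked "available" (not active)
# whose name starts with the given prefix, minus the most recent ones.
# $1: VCL name prefix, e.g. reload_
# $2: number of most recent VCLs to keep
stale_vcls() {
    awk -v prefix="$1" '
        $1 == "available" {
            # Pick the first field that starts with the prefix.
            for (i = 2; i <= NF; i++)
                if (index($i, prefix) == 1) { print $i; break }
        }' |
    sort |
    awk -v keep="$2" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Hypothetical usage, keeping the two most recent reloads:
#
#   varnishadm vcl.list |
#   stale_vcls reload_ 2 |
#   while read name
#   do
#       varnishadm vcl.discard "$name"
#   done
```

Sorting by name only works because of the timestamp in the name, so a different naming scheme needs a different ordering.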

Label rouge

We've established that we can generate the backend declarations, for example using a script called periodically. You could instead react to events for cloudy stacks that can notify you of changes. But this kind of solution won't scale too well - yet another caveat.

What if you don't have one cluster of backends but several of them? They may need to be refreshed at varying paces, and rollbacks will become more complex because a VCL reload may suddenly relate to a change in the code, or a change in one of the backend cluster definitions.

With Varnish 5.0 came the introduction of VCL labels. They were initially just symbolic links you could use to give aliases to your VCLs. But there was a hidden agenda behind the labels and later on it became possible to jump from the active VCL to a label.

If you run Varnish in front of several unrelated domains, with the use of labels you can now have separate cache policies that don't risk leaking logic between each other (although there are other things than logic that can leak) and each label can have a life of its own and be reloaded independently.

The syntax is simple:

sub vcl_recv {
    if (req.http.host ~ "amazon.com") {
        return (vcl(amazon));
    }

    if (req.http.host ~ "acme.com") {
        return (vcl(acme));
    }
}

So now I can generate and refresh VCL independently for my amazon and acme clusters as long as I load them using their respective labels. Chances are that the web applications behind them need different caching policies, and in a multi-tenant Varnish installation it will make things a lot easier (spoiler alert: isolated caching policies would still share the same cache infrastructure).
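Refreshing one cluster then boils down to loading its VCL under a new name and moving the label. Here is a minimal sketch, assuming Varnish 5.0 and a varnishadm that can reach the running instance. The reload_label function and the file paths are made up:

```shell
#!/bin/sh

# Assumption: varnishadm can talk to the running Varnish instance.
VARNISHADM="${VARNISHADM:-varnishadm}"

# Load a VCL file under a timestamped name and point a label at it.
# $1: label, as used in return (vcl(...)) by the active VCL
# $2: path to the complete VCL file for that cluster
reload_label() {
    vcl_name="${1}_$(date +%Y%m%d_%H%M%S)"

    "$VARNISHADM" vcl.load "$vcl_name" "$2" &&
    "$VARNISHADM" vcl.label "$1" "$vcl_name"
}

# Hypothetical usage, each cluster at its own pace:
#
#   reload_label amazon /etc/varnish/amazon_main.vcl
#   reload_label acme /etc/varnish/acme_main.vcl
```

The active VCL doing the dispatch is never reloaded here, so a rollback of a cluster's backends doesn't get mixed up with a rollback of the routing logic.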

Back to the topic at hand

So far, the title of this post has been misleading as we haven't touched on dynamic backends at all. So let's remedy that and simply say that they were introduced in Varnish 4.1, or at least made possible with this release.

The backend (VBE) and director (VDI) subsystems sustained heavy changes between Varnish 3.0 and 4.0, breaking everyone's VCL that made use of directors. Well, breaking most VCL even without directors. VCL remained stable between 4.0 and 4.1, but those subsystems once again sustained heavy changes, although users don't see any signs of it in their VCL code. Dynamic backends obviously, but also custom backend transports and the impacts of VCL temperature.

Varnish Cache doesn't ship with built-in dynamic backends, but Varnish Plus features two VMODs addressing that need: a drop-in replacement of the DNS director like the one in Varnish 3, and on-demand backends with one called goto.

The two hard problems

I won't be talking much more about vmod-goto, but will focus on the DNS director equivalent. Conceptually, it's a DNS cache, populated after lookups and evicted based on a TTL. Although cache invalidation is one of the two hard problems in computer science, invalidation wasn't hard to solve considering it's a rewrite of the DNS director: do the same invalidation again. The only difficulty was dealing with Varnish internals and figuring out how to evolve them until at some point they stabilized while 4.1 was still under development.

No! The real hard problem for this module was to name it. There was already a DNS module so I had to come up with something else and I eventually named it named, after the N in DNS and so that it would read nicely in VCL:

new amazon = named.director();

It took the development cycle from 4.0 to 4.1, and two point releases to get to a usable state.

DNS director improved

The DNS director in Varnish 3 had an ACL-like notation used to create all the possible backends ahead of time, and such backends couldn't benefit from probe support. vmod-named, however, relies on dynamic backends that are created just in time: an actual ACL can be used as a white-list, and probes work.

Another difference with the DNS director is that lookups used to happen during transactions once the TTL had expired, blocking all transactions bound to the director. With vmod-named lookups happen asynchronously, and never slow down requests.

Finally, an anecdotal novelty of Varnish 4.1 (I would have missed it if it weren't for Geoff, thanks!) greatly simplified the director's creation. VMODs can have optional named (this is going meta) parameters, and with a constructor taking many parameters, being able to use names instead of relying on parameter order is bliss:

new my_dir = named.director(
    port = "80",
    probe = my_probe,
    whitelist = my_acl,
    ttl = 5m);

Known limitations

The DNS support is very limited, and only A and AAAA records are partially supported. Using the system's resolver means losing a lot of information from the lookup results. You have to specify a TTL because only the IP addresses are part of the results. And two distinct backends will be created if a machine is bound to both IPv4 and IPv6 addresses.

Also DNS is a passive system (I'm not referring to passive DNS): it doesn't notify you of name changes. So it's your job to figure out an appropriate TTL depending on your elastic expectations.
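If you don't know where to start, the TTL of the records themselves can serve as a hint. The sketch below reads the answer section printed by dig and keeps the lowest TTL among A and AAAA records. The min_ttl function is a name I made up, and it assumes the usual "name TTL class type data" layout of dig's answer lines:

```shell
#!/bin/sh

# Read dig answer lines on stdin and print the lowest TTL found
# among A and AAAA records. Nothing is printed without a match.
min_ttl() {
    awk '
        $4 == "A" || $4 == "AAAA" {
            ttl = $2 + 0
            if (!seen || ttl < min)
                min = ttl
            seen = 1
        }
        END { if (seen) print min }'
}

# Hypothetical usage:
#
#   dig +noall +answer amazon.com | min_ttl
```

Keep in mind that a caching resolver reports the remaining TTL, so the value may be lower than what the zone actually publishes. Asking several times, or asking an authoritative server, gives a better picture.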

And of course the biggest limitation is that you need a name server, otherwise there's no point in using DNS.

If you've used the DNS director, vmod-named's limitations are nothing new.

Going further

DNS is one of them obscure protocols on the internet. It is highly documented and yet often overlooked, and it's not rare to find hard-coded IP addresses in configuration files (that includes Varnish).

However after searching a bit it turns out to be available in a couple places. Check your cloud provider's documentation, or your favorite container orchestrator. They will likely provide some kind of DNS support.

If you're stuck on Varnish 3 because of the DNS director, you can now upgrade to Varnish Plus 4.1 and replace it with vmod-named. For other dynamic backend needs, for instance the use of DNS SRV records, or a discovery system not based on DNS, you will find that vmod-goto meets (or will meet) your needs.

Contact us if you want to learn more about dynamic backends. More information about vmod-goto will follow soon. To learn more about dynamic backends and other new features in Varnish Plus, please register for our upcoming webinar.