February 1, 2018
7 min read time

Configuration tales: Protecting a flaky API server

Varnish is a powerhouse, and most of the use cases you'll read about revolve around delivering stellar performance, especially in CDN contexts. But today I'd like to share a more intimate setup that I came across recently, where Varnish shone not because of its performance but thanks to its flexibility. This should give you some ideas on how to use it more, notably back within the confines of your own infrastructure.

What's the situation?

Every case is of course unique, but let's see if I can describe this one in a way that will ring a bell.

Imagine: a big company, with a lot of internal webservices that provide, filter and format metadata. Of course, not all these services are necessarily new, and you could say that some are still running thanks to the "if it ain't utterly broken, don't poke it" philosophy.

As you may have guessed, this is the tale of one such webservice, lost in the background, written in Java*, doing way more than it should, and brought down every time traffic gets a tad too strong. To be fair, the service's code was built to handle requests of around 100 items, but as the business grew, that number climbed into the tens of thousands.

And yet, the application carries on well enough not to warrant a full rewrite, thanks to the brave sysadmin who restarts it whenever too many requests bring it down. Because the data served here isn't critical, well, you make do.

So, bell rung?

What can we do?

In this precise case, throwing more CPU or RAM at the problem won't work because the application wasn't built with scalability in mind. So yes, as expected, Varnish is the answer here.

Let's have a look at the important data points of our problem:

  • the application crashes/chokes if there are too many requests.
  • the application is sloooooooooooooooow.
  • requests are made from a handful of services and often look similar.
  • actually, some webservices perform the same request, save for the order of the querystring parameters. It's really not surprising since the services are built by different developers on different stacks.
  • did I tell you that the responses delivered are uncompressed XML files, sometimes weighing a few megabytes?
  • last but not least: the data isn't critical and can be outdated, but we should really do our best to at least serve something.

Let's build a VCL, shall we?

Let's do the obvious thing and create a VCL that caches objects for some time, say five minutes:

vcl 4.0; /* don't mind that, that's the vcl version, not the varnish version */

backend api_limited {
	.host = "192.168.1.123";
}

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
	set beresp.ttl = 5m;
}

Peachy! We just set up a simple shield for our application. Note that we aren't doing much; it's even possible that we have a very low hit-ratio. But by simply installing Varnish in front of the server (the backend), we gain something very cool: connection pooling.

Normally, each service connects to the backend, issues a request, then disconnects. While not super costly, those connections and disconnections have some overhead. The big Varnish bonus is that it reuses backend connections when it can, freeing the backend from that overhead. Hurray!
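
While we're looking at the backend definition, a quick aside: since we know this backend is slow, it can also be worth widening the fetch timeouts so Varnish doesn't give up on a response that's merely taking its time. A minimal sketch, with illustrative values that are my assumption and should be adapted to your own service:

backend api_limited {
	.host = "192.168.1.123";
	/* illustrative values: give the slow backend extra room
	   before Varnish abandons the fetch */
	.connect_timeout = 5s;
	.first_byte_timeout = 90s;
	.between_bytes_timeout = 30s;
}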

We also have some low-hanging fruit waiting to be picked: gzip compression. Normally, that's automatic: Varnish requests the gzip'd content and trusts the backend. If the response is compressed, it's stored compressed; if not, Varnish considers it not worth compressing. The trouble is that our backend is a bit daft and NEVER compresses, even when it returns a highly compressible XML file! So let's force Varnish's hand:

vcl 4.0; /* don't mind that, that's the vcl version, not the varnish version */

backend api_limited {
	.host = "192.168.1.123";
}

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
+	/* check if the content-type header contains "application/xml" */
+	if (beresp.http.content-type ~ "application/xml") {
+		set beresp.do_gzip = true;
+	}
	set beresp.ttl = 5m;
}

Ta-daaaaa! Varnish now compresses the responses before putting them in cache. It will serve them compressed to the services that ask for it, and uncompress on-the-fly for the others.
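
If you want to be pickier, here's a possible refinement, a sketch rather than something the original setup needed: only compress bodies that are big enough to be worth the CPU. It leans on the "std" VMOD (formally introduced in the next section) and on the backend actually sending a content-length header; the 1KB threshold is an arbitrary assumption:

vcl 4.0;

import std;

backend api_limited {
	.host = "192.168.1.123";
}

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
	/* responses without a content-length header fall back
	   to 0 and are left uncompressed */
	if (beresp.http.content-type ~ "application/xml" &&
	    std.integer(beresp.http.content-length, 0) > 1024) {
		set beresp.do_gzip = true;
	}
	set beresp.ttl = 5m;
}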

That's not good enough!

I agree, we can do better! We said that, except for the order of the querystring parameters, the requests were often the same, so let's take advantage of that:

vcl 4.0; /* don't mind that, that's the vcl version, not the varnish version */

import std;

backend api_limited {
	.host = "192.168.1.123";
}

+ /* called when we receive a request (req) */
+ sub vcl_recv {
+ 	set req.url = std.querysort(req.url);
+ }

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
	/* check if the content-type header contains "application/xml" */
	if (beresp.http.content-type ~ "application/xml") {
		set beresp.do_gzip = true;
	}
	set beresp.ttl = 5m;
}

There are only two lines of magic here, but they are worth it. The first one loads a VMOD (a plugin) named "std", and the second one, as soon as we receive a request, uses the VMOD's "querysort" function to SORT the QUERYstring (see how imaginative we are at naming things?). For example, "/lookup?ids=7&format=xml" and "/lookup?format=xml&ids=7" both become "/lookup?format=xml&ids=7". That matters because Varnish uses the URL, querystring included, to identify an object, so making sure that all requests pointing to the same object look the same boosts our hit-ratio and avoids duplicate objects in our cache.
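
And if your calling services also disagree on letter case (an assumption to verify against your own traffic, and only safe if the API treats URLs case-insensitively), the same VMOD can normalize that too:

sub vcl_recv {
	/* assumption: the API ignores case, so "/Lookup?Format=xml"
	   and "/lookup?format=xml" should map to the same object */
	set req.url = std.querysort(std.tolower(req.url));
}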

Let's push our luck

There's one thing we haven't addressed yet: the backend is super slow to reply, and we don't like that. We could count on the requests being frequent enough to keep the hit-ratio high, but every time we get a miss (i.e., Varnish doesn't have the object in cache), we have to wait for the backend to build the response.

To alleviate that, we'll use one of the more important data points: the information can be outdated. In addition to the TTL (the time an object is served from cache), Varnish has a period called "grace" that starts right after the TTL expires. If someone asks for an object during its grace period, the cached object is delivered, just as during the TTL, BUT Varnish also triggers an asynchronous fetch to the backend to refresh the object.
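
If you're curious about the mechanics, the decision lives in the builtin VCL that every configuration falls through to; paraphrased from the Varnish 4 series, it boils down to roughly this:

sub vcl_hit {
	if (obj.ttl >= 0s) {
		/* still within the TTL: a pure hit, deliver it */
		return (deliver);
	}
	if (obj.ttl + obj.grace > 0s) {
		/* TTL expired but within grace: deliver the stale copy,
		   Varnish refreshes it with a background fetch */
		return (deliver);
	}
	/* out of grace: the client has to wait for a real fetch
	   (older builtins say "fetch" here instead of "miss") */
	return (miss);
}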

Grace is quite nice because it allows us to deliver an object quickly and hide the lengthy fetch from the user. Using it is similar to setting the TTL:

vcl 4.0; /* don't mind that, that's the vcl version, not the varnish version */

import std;

backend api_limited {
	.host = "192.168.1.123";
}

/* called when we receive a request (req) */
sub vcl_recv {
	set req.url = std.querysort(req.url);
}

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
	/* check if the content-type header contains "application/xml" */
	if (beresp.http.content-type ~ "application/xml") {
		set beresp.do_gzip = true;
	}
	set beresp.ttl = 5m;
+	set beresp.grace = 6h;
}

So far, so good. Can we do more?

At the moment, we have a backend shield that:

  • pools connections
  • gzips responses
  • maximizes the hit-ratio by rewriting requests
  • asynchronously refreshes content to speed up delivery

It's already pretty nice, but we have one more trick up our sleeve for extra protection. Our backend is a bit feeble and doesn't like getting too many requests at the same time, something we can reflect in our backend definition:

vcl 4.0; /* don't mind that, that's the vcl version, not the varnish version */

import std;

backend api_limited {
	.host = "192.168.1.123";
+	.max_connections = 10;
}

/* called when we receive a request (req) */
sub vcl_recv {
	set req.url = std.querysort(req.url);
}

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
	/* check if the content-type header contains "application/xml" */
	if (beresp.http.content-type ~ "application/xml") {
		set beresp.do_gzip = true;
	}
	set beresp.ttl = 5m;
	set beresp.grace = 6h;
}

It's not perfect, though: if someone asks for a never-seen-before object, Varnish has to go to the backend, and that fetch may fail because the connection limit is already saturated by asynchronous fetches refreshing graced objects. And remember, we'd really like to serve something, so what we'll do is discriminate against background (asynchronous) fetches:

vcl 4.0; /* don't mind that, that's the vcl version, not the varnish version */

import std;

+ backend api {
+ 	.host = "192.168.1.123";
+ }

backend api_limited {
	.host = "192.168.1.123";
	.max_connections = 10;
}

/* called when we receive a request (req) */
sub vcl_recv {
	set req.url = std.querysort(req.url);
}

+ /* called before sending the backend request (bereq) */
+ sub vcl_backend_fetch {
+ 	if (bereq.is_bgfetch) {
+ 		set bereq.backend = api_limited;
+ 	} else {
+ 		set bereq.backend = api;
+ 	}
+ }

/* called when we receive the backend response (beresp) */
sub vcl_backend_response {
	/* check if the content-type header contains "application/xml" */
	if (beresp.http.content-type ~ "application/xml") {
		set beresp.do_gzip = true;
	}
	set beresp.ttl = 5m;
	set beresp.grace = 6h;
}

Ultimately, both backends represent the same server; one of them simply isn't limited in terms of connections.

The limited one will see connection errors when the threshold is reached, but that's really not an issue: in that case, the old object isn't replaced, and we just wait for another client to trigger the asynchronous fetch and try our luck again.
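
One extra hedge, and to be clear this is an addition of mine rather than part of the original setup: depending on your Varnish version, you may want to make doubly sure that a failed background fetch never turns into a cached error object. Merged into the vcl_backend_response above, that could look like:

sub vcl_backend_response {
	/* assumption: a 5xx on a background fetch is worthless to us;
	   abandon it and keep serving the stale, graced copy */
	if (bereq.is_bgfetch && beresp.status >= 500) {
		return (abandon);
	}
}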

Done?

Done! Of course, depending on your own case, you can push the configuration further, but even with this quite generic (yet real) use case, we've had a look at:

  • response inspection
  • backend selection
  • VMODs
  • TTL and grace
  • gzip control

All in all, not too bad for a quick fix! By the way, if you want the full VCL, it's available here!

* Yes, this is a gratuitous stab at Java, but, really, that's plausible, right?

Ready for more help with Varnish? Check out Varnish documentation.

Image © 2012 Lennart Tange, used under a Creative Commons license.