Put your URLs up, and keep 'em where I can see 'em: HTTP redirection

In the first part of this blog series, we saw how to rewrite URLs to seamlessly redirect people to where the content actually is. Today we are going to see how Varnish can help you do the same thing NOT seamlessly, using one of the built-in facilities of HTTP.

The HyperText Transfer Protocol provides several status codes to explain that the requested content is somewhere else; the three you'll meet most often are below. So, instead of replying with a 200 ("OK") or a 404 ("Not Found"), a server can reply with:

  • 301: the content you are requesting has been moved permanently.
  • 302: the content you are requesting has been moved temporarily.
  • 307: the content you are requesting has been moved temporarily. Wait, what? The difference is in how the client follows it: after a 302, clients commonly retry the new location with a GET request (a historical behavior the spec tolerates for POST), while 307 requires them to keep the same HTTP verb (keep using POST, for example).
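To make the 302/307 distinction concrete, here's a small Python sketch of how a client might pick the method for its follow-up request. It assumes common browser behavior (switching a POST to a GET after a 301 or 302, which the spec permits for historical reasons); follow_up_method is a hypothetical helper, and the reason phrases come from Python's standard http.HTTPStatus enum.

```python
from http import HTTPStatus


def follow_up_method(status: int, method: str) -> str:
    """Hypothetical helper: the method a typical client uses to follow a redirect."""
    if status == 307:
        # 307: repeat the exact same request (same verb, same body)
        return method
    if status in (301, 302) and method == "POST":
        # Assumption: common browser behavior, POST becomes GET
        return "GET"
    return method


if __name__ == "__main__":
    for code in (301, 302, 307):
        print(code, HTTPStatus(code).phrase, follow_up_method(code, "POST"))
```

Run as a script, it prints "301 Moved Permanently GET", "302 Found GET" and "307 Temporary Redirect POST".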

So, it's really the equivalent of "Sorry Mario, your princess is in another castle", but with one major difference: the "Location" header tells the client where to look next. For example, here's a 302 from google.com, as shown by curl:

> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 302 Found
< Cache-Control: private
< Content-Type: text/html; charset=UTF-8
< Location: http://www.google.fr/?gfe_rd=cr&ei=86xWWJ-gA_PS8AfN96GAAQ
< Content-Length: 258
< Date: Sun, 18 Dec 2016 15:36:19 GMT
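All a client really needs from this response is the status line and the Location header. As an illustration (not how curl or a browser actually parses HTTP), here's a minimal Python function that pulls both out of the response head shown above; parse_response_head is a hypothetical name.

```python
RAW = (
    "HTTP/1.1 302 Found\r\n"
    "Cache-Control: private\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Location: http://www.google.fr/?gfe_rd=cr&ei=86xWWJ-gA_PS8AfN96GAAQ\r\n"
    "Content-Length: 258\r\n"
    "Date: Sun, 18 Dec 2016 15:36:19 GMT\r\n"
    "\r\n"
)


def parse_response_head(raw: str):
    """Split an HTTP/1.x response head into (status, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    _version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # split on the first colon only: values (URLs, dates) contain colons too
        name, _, value = line.partition(":")
        # header names are case-insensitive, so normalize them
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers


status, reason, headers = parse_response_head(RAW)
print(status, reason, headers["location"])
```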

And with Varnish, it's super easy to produce such responses.

But, why?

If we already have a way to rewrite URLs seamlessly, this seems like a step backward, since the user now needs two requests instead of one to get the desired content. However, there are valid cases where you want this.

Notably, such cases include when you want users and applications to learn the new location of a resource, for example because you are changing a sub-domain or migrating an API. Also, the new resource may be outside your sphere of influence, preventing you from serving it directly, which makes HTTP redirection a valid option.

I'll add one more example that is becoming more and more frequent: HTTPS. Redirections allow you to gently move clients from HTTP to HTTPS.

Back to basics: return synthetic

So, let's see how we can push users in the right direction. As an example, let's re-create a common behavior: if the request targets a domain we don't know, we redirect to the main one. And like last time, we'll create a VTC to test things:

varnishtest "30X redirections"

server s1 {}

varnish v1 -vcl+backend {
    //VCL logic
} -start

client c1 {
	txreq -hdr "host: varnish"
	rxresp
	expect resp.status == 301
	expect resp.http.location == "https://www.varnish-software.com/"
	expect resp.reason == "Moved Permanently"
} -run

The logic to satisfy is pretty basic, but it's twofold. First, we have to check whether a redirection is needed; this is done as soon as we have the request, in vcl_recv. Second, we need to generate a synthetic response: the request will never reach the backends (s1 isn't even started) and we produce the reply ourselves, in vcl_synth. In Varnish 3 this subroutine was called vcl_error, but Varnish 4 renamed it vcl_synth since it does more than serve errors. The code looks like this:

sub vcl_recv {
	if (req.http.host != "www.varnish-software.com") {
		set req.http.location = "https://www.varnish-software.com/";
		return(synth(301));
	}
}
sub vcl_synth {
	if (resp.status == 301 || resp.status == 302) {
		set resp.http.location = req.http.location;
		return (deliver);
	}
}

Note: I placed the full file here to avoid repeating myself too much, but I encourage you to download it and to run varnishtest on it. Spoiler alert: it passes.

The code is pretty concise and should be readable even to VCL beginners, but there are at least a few points worth noting:

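  • the Location header here carries the whole URL, including the scheme ("https://"), not just the path. The spec (RFC 7231) also allows relative references, but you need the absolute form to change the scheme, as we do here.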
  • the status message "Moved Permanently" is never specified in the VCL, yet it appears in the response. In the "synth(301)" call, we omitted the second argument and Varnish intelligently picked the default message corresponding to this code. You can replace the synth call with simply 'synth(301, "Moved Permanently")' and the test will still pass.
  • there's sadly a little string copy to be made between req.http.Location and resp.http.Location, because in vcl_recv we don't have access to resp.* yet. We could have moved part of the logic from vcl_recv to vcl_synth, but if you try it, you'll notice the split is awkward. On the other hand, we are talking about just one header here and it won't make a difference.
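If you want to poke at the same logic without varnishtest, here's a rough Python analogue of the VTC and VCL above, built on the standard library's http.server and http.client. It's a sketch, not Varnish: the handler plays the role of vcl_recv plus vcl_synth, and check_redirect is a hypothetical helper mirroring client c1.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

MAIN_HOST = "www.varnish-software.com"
MAIN_URL = "https://www.varnish-software.com/"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # vcl_recv equivalent: unknown host? then redirect to the main one
        if self.headers.get("Host") != MAIN_HOST:
            # synth(301) equivalent: reply ourselves, no backend involved.
            # send_response() picks the default reason, "Moved Permanently".
            self.send_response(301)
            self.send_header("Location", MAIN_URL)  # full URL, scheme included
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the output quiet


def check_redirect(host):
    """Send one GET with the given Host header, like client c1 does."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        # an explicit Host header replaces the one http.client would add
        conn.request("GET", "/", headers={"Host": host})
        resp = conn.getresponse()
        resp.read()
        conn.close()
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(check_redirect("varnish"))
```

As in the VTC, the three checks are the status, the reason phrase and the Location header.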

None shall pass (if insecure)

One case where HTTP redirection is more or less mandatory is when you want your users to upgrade to HTTPS. You could display a static 404 page saying "Sorry, but no", but that wouldn't be super friendly. Instead, what we can do is systematically redirect any HTTP request to its HTTPS counterpart.

Let's consider a basic setup using Hitch to allow our server to handle both HTTP and HTTPS:

[Figure: Hitch handling HTTPS in front of Varnish]

Let's assume varnishd is started with "-a :80 -a 127.0.0.1:8443,PROXY": the first -a tells Varnish to listen for plain HTTP on port 80 (all addresses), while the second tells it to listen on port 8443 (localhost only) for HTTP traffic that Hitch has kindly decrypted and forwards using the PROXY protocol.
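To see why server.ip ends up carrying port 443, it helps to look at what Hitch prepends to each connection. Here's a minimal Python parser for the human-readable PROXY protocol version 1 header. Depending on its configuration, Hitch may send the binary version 2 instead, which this sketch doesn't handle, and the addresses in the example are made up. The destination address and port in that header are what Varnish exposes as server.ip.

```python
def parse_proxy_v1(line: bytes):
    """Parse 'PROXY TCP4|TCP6 <src addr> <dst addr> <src port> <dst port>' + CRLF."""
    if not line.endswith(b"\r\n"):
        raise ValueError("PROXY v1 header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a PROXY v1 TCP header")
    _, _, src_addr, dst_addr, src_port, dst_port = parts
    client = (src_addr, int(src_port))  # what Varnish reports as client.ip
    server = (dst_addr, int(dst_port))  # what Varnish reports as server.ip
    return client, server


client, server = parse_proxy_v1(b"PROXY TCP4 192.0.2.10 192.0.2.1 51234 443\r\n")
print(server[1])  # the connection reached Hitch's HTTPS port
```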

The VCL is super simple; we just need to "rebuild" the URL and send it back to the user:

import std;

sub vcl_recv {
	# the PROXY protocol allows varnish to see
	# hitch's listening port (443) as server.ip
	if (std.port(server.ip) != 443) {
		set req.http.location = "https://" + req.http.host + req.url;
		return(synth(301));
	}
}

sub vcl_synth {
	if (resp.status == 301 || resp.status == 302) {
		set resp.http.location = req.http.location;
		return (deliver);
	}
}

Notice how only the vcl_recv part changed? That unsatisfying header copy actually turned out okay!
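The decision made in that vcl_recv fits in a few lines of Python. https_redirect is a hypothetical helper that takes the port recovered through the PROXY protocol and returns the Location to send, or None when the request already came in over HTTPS.

```python
from typing import Optional


def https_redirect(server_port: int, host: str, url: str) -> Optional[str]:
    """Mirror of the vcl_recv above: rebuild the URL with an https:// scheme."""
    if server_port != 443:
        # same concatenation as the VCL: scheme + Host header + path and query
        return "https://" + host + url
    return None  # already HTTPS, let the request through


print(https_redirect(80, "example.com", "/cart?id=42"))
```

Like the VCL, this copies the Host header verbatim, so a client sending an explicit port (e.g. "example.com:80") would be sent to "https://example.com:80/...", which you may want to strip first.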

Keeping Varnish dumb

The astute reader will have noticed that redirection also lets us map old URLs to new ones, providing the same feature as in the first part of this series. A quick VCL would look like:

sub vcl_recv {
	# "~" is the regex match operator; "==" would compare literal strings
	if (req.url ~ "^/cmsa/post/") {
		set req.http.location = regsub(req.url, "^/cmsa/post/(.*)", "http://example.com/content/articles/\1");
		return(synth(301));
	} else if (req.url ~ "^/images/") {
		set req.http.location = regsub(req.url, "^/images/(.*)", "http://example.com/image/\1");
		return(synth(301));
	}
	# else ... (more mappings)
}
sub vcl_synth {
	if (resp.status == 301 || resp.status == 302) {
		set resp.http.location = req.http.location;
		return (deliver);
	}
}

And it would work. However, that is a bit dumb: Varnish still has to know about the mapping AND the client needs an extra request. One could argue that it gives users an opportunity to update their bookmarks, but still, that's a bit of a sad combination.

Of course, there's a better way! The first piece of the solution is to let the backend deal with redirections. Even though they don't represent a resource, redirects are still valid HTTP objects[1], and that's all that matters to Varnish, which will diligently cache them. With this, the smarts can be outside of our VCL, which is nice...
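As an illustration of moving the smarts to the backend, here's a hypothetical Python rule table, mirroring the mappings from the VCL above, that a backend application could consult to answer with a 301. The Cache-Control value is an assumption, there to show that the backend, not the VCL, now decides how long Varnish keeps the redirect.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical mapping table, owned by the backend rather than by the VCL
REDIRECT_RULES = [
    (re.compile(r"^/cmsa/post/(.*)"), r"http://example.com/content/articles/\1"),
    (re.compile(r"^/images/(.*)"), r"http://example.com/image/\1"),
]


def backend_redirect(url: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a redirect, or None to serve the URL normally."""
    for pattern, target in REDIRECT_RULES:
        if pattern.match(url):
            headers = {
                "Location": pattern.sub(target, url, count=1),
                # assumption: let Varnish cache the redirect for an hour
                "Cache-Control": "max-age=3600",
            }
            return 301, headers
    return None


print(backend_redirect("/cmsa/post/hello-world"))
```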

But we can do more! We can instruct Varnish to follow redirections and only cache the "true" resource:

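sub vcl_backend_response {
	# only follow absolute http(s) redirects that carry a path
	if (beresp.status == 301 && beresp.http.location ~ "^https?://[^/]+/") {
		# "https://host/path?query" -> "host"
		set bereq.http.host = regsub(beresp.http.location, "^https?://([^/]+)/.*", "\1");
		# "https://host/path?query" -> "/path?query"
		set bereq.url = regsub(beresp.http.location, "^https?://([^/]+)", "");
		# go back to vcl_backend_fetch with the rewritten bereq
		return (retry);
	}
}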

Here, we have to do the opposite of the previous VCL: previously we built the Location header from the host and path, and now we have to dismantle it to retrieve them. It takes some regex-fu, but that's expected.
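The same dismantling can be checked outside Varnish. Here's a Python transcription of the three regular expressions used in that VCL (split_location is a hypothetical name); these particular patterns use only basic syntax that Python's re module and Varnish's regex engine treat the same way.

```python
import re
from typing import Optional, Tuple


def split_location(location: str) -> Optional[Tuple[str, str]]:
    """Split an absolute Location into (host, url), like the VCL above."""
    # guard from the if(): absolute http(s) URL followed by a path
    if not re.match(r"^https?://[^/]+/", location):
        return None
    # bereq.http.host: keep only what sits between "://" and the first "/"
    host = re.sub(r"^https?://([^/]+)/.*", r"\1", location, count=1)
    # bereq.url: drop the scheme and the host, keep path and query string
    url = re.sub(r"^https?://([^/]+)", "", location, count=1)
    return host, url


print(split_location("https://example.com/content/articles/42?lang=en"))
```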

Again, here's the vtc file so you can check it passes and use it as a base for your own implementation. One thing to keep in mind is that the number of retries per backend request is limited, so you may have to raise it if you have long chains of redirections: max_retries is 4 by default and can be changed at runtime using varnishadm ("param.set max_retries 8", for example) or on the varnishd command line ("-p max_retries=8").
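To get a feel for how max_retries bounds that loop, here's a small Python simulation of the vcl_backend_response logic against a hypothetical backend that redirects /step/0 to /step/1 and so on. It assumes Varnish's documented behavior that once the retry counter exceeds max_retries, control passes to vcl_backend_error, modeled here as a plain 503.

```python
import re
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]
Backend = Callable[[str, str], Response]

# same shape as the VCL guard: absolute http(s) URL with a path
LOCATION_RE = re.compile(r"^https?://([^/]+)(/.*)$")


def fetch(backend: Backend, host: str, url: str, max_retries: int = 4) -> Response:
    """Follow 301s the way the VCL above does and return the final response."""
    retries = 0
    while True:
        status, headers = backend(host, url)
        match = LOCATION_RE.match(headers.get("Location", ""))
        if status != 301 or match is None:
            return status, headers  # the "true" resource, the one worth caching
        retries += 1
        if retries > max_retries:
            return 503, {}  # stand-in for vcl_backend_error
        host, url = match.group(1), match.group(2)  # return (retry)


def make_chain_backend(hops: int) -> Backend:
    """Hypothetical backend: /step/N redirects to /step/N+1 until N == hops."""
    def backend(host: str, url: str) -> Response:
        step = int(url.rsplit("/", 1)[1])
        if step < hops:
            return 301, {"Location": f"http://{host}/step/{step + 1}"}
        return 200, {"X-Step": str(step)}
    return backend


print(fetch(make_chain_backend(4), "example.com", "/step/0"))  # (200, {'X-Step': '4'})
print(fetch(make_chain_backend(5), "example.com", "/step/0"))  # (503, {})
```

In this model, the default of 4 lets four redirections in a row through and makes the fifth one fail; raising max_retries moves that limit.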

Where to, now?

Redirections can be tricky, but Varnish has all the tools to let you decide how to handle them and to make them as painless as possible.

The two articles in this "series" were inspired by various questions on IRC, the mailing list and Stack Overflow, and I wanted to provide a solid answer to most of them, which I hope I did. If that was not the case, let me know, I'd be happy to help!

We get a lot of questions, and we also addressed some of them in a recent webinar, "Top 1- Varnish Cache mistakes and how you can avoid them", which you can watch on-demand.


[1] In the same way that a symbolic link doesn't contain data, but is still a file.

Image (c) 2012 astroshots42 used and modified under Creative Commons license


20/12/16 15:49 by Guillaume Quintard


Varnish Software Blog