October 9, 2024
8 min read time

Handling the X-Forwarded-For Header


When you send an HTTP request to a server, it’s unlikely that you are talking to it directly. Rather, you are sending the request to a proxy that will choose a recipient and forward the request to it. That’s especially true for CDNs and other caching tools, such as Varnish.

However, origin servers often like to know which IP address sent them the request. Maybe they want to collect statistics, or they need to geolocate their users, or maybe some regulation forces them to log the original IP. That’s a bit of an issue though, as HTTP reverse proxies create their own connections to the origin, and therefore use their own IP addresses, meaning the origin can no longer use the TCP data to identify the original caller.

But of course, we have a solution to fix this problem! Well, we even have two, but the first and most prominent one is the X-Forwarded-For (XFF for short) header, and it’s the subject of today’s post!

Append to that header

So, how do we get the client’s IP all the way through to the origin? With HTTP headers of course! Each intermediate server will implement a very simple logic:

  1. If the X-Forwarded-For header doesn’t exist, create it and fill it with the IP address of the current client
  2. If the header already exists, append the current client’s IP to it
  3. Pass the header down the chain
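
The three steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not how any particular proxy implements it: the `forward` helper and the plain-dict model of the headers are assumptions made for the example (real header names are case-insensitive), and the IP addresses are documentation-range placeholders.

```python
def forward(headers: dict, peer_ip: str) -> dict:
    """Return the headers a proxy would pass on after a request from peer_ip.

    Hypothetical helper: headers are modelled as a plain dict for simplicity.
    """
    out = dict(headers)  # do not mutate the incoming request
    existing = out.get("X-Forwarded-For")
    if existing:
        # step 2: the header already exists, append the current client's IP
        out["X-Forwarded-For"] = existing + ", " + peer_ip
    else:
        # step 1: no header yet, create it with the current client's IP
        out["X-Forwarded-For"] = peer_ip
    # step 3: the caller sends these headers to the next hop
    return out


# client (203.0.113.7) -> proxy A (198.51.100.1) -> proxy B -> origin
at_proxy_a = forward({}, "203.0.113.7")           # proxy A sees the client
at_proxy_b = forward(at_proxy_a, "198.51.100.1")  # proxy B sees proxy A
print(at_proxy_b["X-Forwarded-For"])  # 203.0.113.7, 198.51.100.1
```

Note that the last proxy’s own address is not in the list it sends: each hop only records who contacted it.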

Let’s visualize this:
[Figure: Handling the X-Forwarded-For header, blog graph]

This way, we get not only the client’s IP, but also the addresses of all the involved HTTP proxies. It’s pretty neat, but as we’ll see later, there are a couple of problems with this approach. But first, we need to prevent some cargo cult programming.

Don’t touch that header

You might have seen this piece of VCL floating around the interwebz:

if (req.http.x-forwarded-for) {
    set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
} else {
    set req.http.X-Forwarded-For = client.ip;
}

Which has to be valid, correct, and proper. First, it’s on the Internet, so it’s obviously right, and second, this VCL logic matches the one I described in the previous section. So we’re good, right?

Well, no. Not because the logic is wrong in itself, but because by the time that VCL code is executed, Varnish has already done this automatically, so you are appending client.ip to the XFF header a second time!

To be fair, that code was used in the configurations of yore, we’re talking pre-4.0, when everybody was copy-pasting that code in their VCL. That was a while ago though, and that logic is now in core, so don’t worry about it.

Be wary of that header

Alright, XFF solves a real problem and Varnish handles it seamlessly, but not everything is great: we may not be able to trust that header fully!

Let’s take Varnish for example, and let’s say we want to extract the original requester’s IP. The VCL to do that is short and sweet:

# discard the first comma and everything that comes after it
set req.http.original-ip = regsub(req.http.x-forwarded-for, ", .*", "");

But have we considered the possibility that the original client (or anybody in the proxy chain, really) might have…lied? Request headers are written by the client, so there’s nothing preventing them from cheating. Here’s how to do it with our beloved curl:

curl http://example.com/puppy.jpg -H "x-forwarded-for: 1.1.1.1"

And if everybody on the proxy chain trusts the previous link, that fake “1.1.1.1” will reach your origin unchecked!

This is why, most of the time, the first link of the chain that we can trust will reset the XFF header. In VCL, we just do this:

set req.http.x-forwarded-for = client.ip;

Doing so is a double-edged sword: we discard potentially useful information, but we also discard potentially forged data. In this day and age, it’s better to play it safe.

However, if we trust all the proxies after this one, then we get the accurate list of trusted intermediates, plus the IP that contacted the first proxy, which is still pretty good information.
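
To make the trade-off concrete, here is a small Python simulation of the curl trick above, comparing a chain where every hop appends with a chain whose first trusted proxy resets the header. It is a sketch under assumptions: `append_xff` and `reset_xff` are hypothetical helpers working on plain dicts, and apart from the forged “1.1.1.1” the addresses are made-up documentation-range values.

```python
def append_xff(headers: dict, peer_ip: str) -> dict:
    """Hypothetical helper: append peer_ip to XFF, creating the header if needed."""
    out = dict(headers)
    previous = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{previous}, {peer_ip}" if previous else peer_ip
    return out


def reset_xff(headers: dict, peer_ip: str) -> dict:
    """Hypothetical helper: overwrite XFF with peer_ip, dropping what the client sent."""
    out = dict(headers)
    out["X-Forwarded-For"] = peer_ip
    return out


forged = {"X-Forwarded-For": "1.1.1.1"}  # what the curl command sends
client_ip = "203.0.113.7"   # real address of the caller (example value)
edge_ip = "198.51.100.1"    # our first trusted proxy (example value)

# every hop trusts the previous one: the forged value ends up first in the list
naive = append_xff(append_xff(forged, client_ip), edge_ip)
print(naive["X-Forwarded-For"])  # 1.1.1.1, 203.0.113.7, 198.51.100.1

# the first trusted proxy resets, the proxy behind it appends
safe = append_xff(reset_xff(forged, client_ip), edge_ip)
print(safe["X-Forwarded-For"])   # 203.0.113.7, 198.51.100.1
```

With the reset in place, the first entry is the address that actually connected to the first trusted proxy, whatever the client claimed.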

Explore alternatives to that header

In practice, X-Forwarded-For is ubiquitous, and everybody uses it, but there’s another way, in some contexts.

The PROXY protocol, created by HAProxy, doesn’t rely on HTTP headers. Instead, it uses the first few bytes of the TCP connection to convey the IP/port pairs (one for the client, and one for the server).
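
To give an idea of what those first few bytes look like, here is a Python sketch that parses the human-readable version 1 of the protocol, which is a single text line sent before any HTTP data (version 2 is a binary format and isn’t covered here). The `parse_proxy_v1` function is a hypothetical helper written for this example, not part of Varnish or HAProxy, and the addresses and ports are made up.

```python
import ipaddress


def parse_proxy_v1(line: bytes):
    """Parse a PROXY protocol version 1 line.

    Returns ((src_ip, src_port), (dst_ip, dst_port)),
    or None when the sender announces "PROXY UNKNOWN".
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("a PROXY v1 line must end with CRLF")
    fields = line[:-2].decode("ascii").split(" ")
    if fields[0] != "PROXY":
        raise ValueError("not a PROXY v1 line")
    if len(fields) >= 2 and fields[1] == "UNKNOWN":
        return None  # the sender could not describe the connection
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 line")
    _, _, src_ip, dst_ip, src_port, dst_port = fields
    # addresses come first, then ports: source before destination in both
    src = (ipaddress.ip_address(src_ip), int(src_port))
    dst = (ipaddress.ip_address(dst_ip), int(dst_port))
    return src, dst


src, dst = parse_proxy_v1(b"PROXY TCP4 203.0.113.7 198.51.100.1 56324 443\r\n")
print(src[0], src[1])  # 203.0.113.7 56324
```

The receiver gets the addresses as connection metadata rather than as a string buried in a header it has to pick apart.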

The big benefit is that you don’t need to worry about XFF headers and extracting the right field and then converting that string into an IP. It’s also faster as you save on processing time.

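# using the X-Forwarded-For header
import std;

sub vcl_recv {
    set req.http.original-ip = regsub(req.http.x-forwarded-for, ", .*", "");
    # std.ip() needs a valid fallback address for when the conversion fails
    if (std.ip(req.http.original-ip, "0.0.0.0") ~ your_acl) {
        # the client matches your_acl, act on it here
    }
}

# with PROXY, no conversion, it’s native!
sub vcl_recv {
    if (client.ip ~ your_acl) {
        # the client matches your_acl, act on it here
    }
}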

It’s also a great way to do focused TLS termination: for example, Hitch is a very fast TLS terminator, but it doesn’t do HTTP, at all, which is how it can be so fast. Since it doesn’t do HTTP, it can’t do XFF, but it can use the PROXY protocol and tell Varnish who contacted it.

On the Varnish side, we just need to tell it to listen on whatever port we want, and to accept the PROXY protocol. We notably do it for the official docker image by default:

varnishd ... -a proxy=:8443,PROXY

All the major HTTP players support it, so have a look!

You now know all about that header (almost)

For once, this was a relatively short article, but it covered the biggest details about X-Forwarded-For and the important pitfalls to avoid.

One thing I didn’t mention though is that it’s being replaced by the standardized Forwarded header (RFC 7239). It suffers from the same issues mentioned above, but it is more expressive, and if you can trust the information it carries, it offers a greater level of detail, which is great for debugging. It deserves its own blog post though, so I’ll stop here to avoid spoiling too much.
