March 16, 2017
7 min read time

Yet another POST on caching vs cookies 1/2

There are "things" I quite don't like, and "yet another" was already one of those even before I first browsed the web. While looking for a punny catchy title, this is ironically what came to my mind. But the irony's not on me, it sadly is on the topic at hand. Also, it's one of those topics that regularly comes around so having a blog post to refer to might come handy. But you may ask, why yet another post on a solved problem? Well, why not?

Here be cookies

Because any superhero is someone's first superhero, I will assume that you may stumble upon this without prior knowledge of cookies. The goal is to give as much context as possible regarding what works and what doesn't with cookies in Varnish. And to do that I will summon my favorite explanation tool: an analogy with real life.

The main usage for cookies, besides today's pervasive tracking, is authentication and authorization. So in my real-life example I will replace the web server with a building full of offices. In this building you will meet Ralph, the security guard. Ralph is responsible for authentication: you need to show him your badge every time you go in, and he will check that the photo matches your face. After that you may use the same badge to open the barrier and enter the building. Ralph may remember you after weeks of credential checking, but he still has to check every day because he works for a security company and doesn't know anything about yours. You might no longer work there and come back with a stolen badge: it could still open the barrier, but the photo wouldn't pass Ralph's check.

In HTTP, the cookie is the equivalent of the badge. The protocol is stateless, meaning that each request must carry all the necessary state. If a transaction on a web application requires more than one HTTP request, the client must send a cookie with each request to tell the server who they are at every step of the transaction. An almost canonical example is the e-commerce idea of filling up a cart and then placing an order for the cart's contents. The server first gives a cookie to the client with a Set-Cookie header, and the client is expected to send that cookie back on subsequent operations so that the server knows which cart it is working on. You cannot, for example, assume that two requests from the same connection belong to the same user (although HTTP/2 push promises make this assumption) because connections may come from a proxy like Varnish, where connections are pooled to avoid a resource cost proportional to the number of users.
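
As an illustration, here is roughly what the badge exchange looks like on the wire. The shop host, the checkout URL and the cookie value are made up for the example. First the server hands out the badge in a response:

HTTP/1.1 200 OK
Set-Cookie: cart=42; Path=/

The client then presents it with every subsequent request of the transaction:

GET /checkout HTTP/1.1
Host: shop.example.com
Cookie: cart=42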

Problem statement

The phrase "caching vs cookies" suggests a conflict between both, so there must be a big problem if we need yet another post on the topic. And there is actually more than one problem:

  • Varnish does it wrong
  • Web applications do it wrong
  • Cookies do it wrong

It might be surprising to read here that Varnish is wrong, and that wording was of course on purpose. Varnish will by default prevent caching for requests containing cookies. This is wrong because caching has nothing to do with cookies, so it should work out of the box without further effort. But the reason why Varnish does this by default is to be safe rather than sorry. Most web frameworks, CMSes, or hand-crafted applications do it wrong: either they omit caching information, or, when a response is cacheable despite the use of a cookie, they don't tell you whether it varies depending on the cookie. Caching an e-commerce cart improperly may, for example, result in information disclosure: a serious breach of privacy.
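
For contrast, a backend that does it right states its caching intentions explicitly instead of leaving the cache guessing. The headers below only illustrate the idea and don't come from any particular application. A personalized response that must never be shared would carry something like:

HTTP/1.1 200 OK
Cache-Control: private, no-store
Set-Cookie: cart=42; Path=/

whereas a response that is safe to cache for everyone, cookie or not, would announce it:

HTTP/1.1 200 OK
Cache-Control: public, max-age=3600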

So Varnish turns out to be right (big surprise) by the magic of double negatives: doing it wrong to work around wrongness turns the tables. Because the risk of blindly trusting a backend when cookies are involved is too high, Varnish forces you to take action if you want caching with cookies. To touch on the problems inherent to cookies themselves, we will need to get further into the HTTP lore, but before that let's look at what Varnish does.

Built-in VCL

Varnish handles HTTP mostly in two places: its core code and VCL. The core code includes an HTTP engine that drives the lifecycle of HTTP transactions and, at certain steps, calls into VCL code where the caching policy is implemented. Regardless of your caching policy, there is a built-in VCL that is always appended to your VCL code. It contains rules that mostly comply with the HTTP specification with regard to caching. The built-in VCL deals with Cookie headers upon reception of a request:

sub vcl_recv {
  # [... Skipping unrelated rules ...]
  
  if (req.http.Authorization || req.http.Cookie) {
    # Not cacheable by default
    return (pass);
  }
  return (hash);
}

In plain English, it says that we want to bypass the cache if a request contains a header called Authorization or Cookie. A classical reaction to this is to delete the cookie when we know that the response will be cacheable, which is paradoxical: cacheability is a property of the response, yet we handle it on the request side. It can also lead to interesting side effects where Varnish, supposedly acting as a shield for the origin server, gives it more work than needed. Because web applications sometimes handle cookies incorrectly, removing a cookie from the request can trigger the creation of a new one, so you may for example resort to removing both Cookie and Set-Cookie headers for static resources:

# Seen on various occasions

sub vcl_recv {
  # Always cache the following file types for all users, Strip cookies for static files
  if (req.url ~ "(?i)\.(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$" ) {
    unset req.http.cookie;
    return (hash);
  }
}
    
sub vcl_backend_response {
  # Strip cookies for static files and set a long cache expiry time.
  if (bereq.url ~ "\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|pdf|txt|js|flv|swf|html|htm)$") {
    unset beresp.http.set-cookie;
    set beresp.ttl = 24h;
  }
}

This example is bad for several reasons. First, it relies on the URL to decide whether a resource is cacheable, instead of the response's Cache-Control header; it is pretty common to generate images dynamically, and monitoring systems for instance do it all the time to render graphs. Second, the two regular expressions are out of sync: the lists of extensions for "static" resources aren't the same, and the second one doesn't take the query string into account (to be fair, the query-string matching in the first one is also broken). Third, it forces an arbitrary TTL instead of letting the origin server decide, and the backend is usually in the best position to know. Last but not least, the early return (hash) skips the rest of the built-in vcl_recv, including the rule that passes on request methods other than GET and HEAD; this opens a window for cache poisoning since, for example, the response to a POST could be cached under the same key as a GET.

Thou shalt not return

One interesting thing about the return statement is that it allows you to bypass the built-in rules, but in my opinion this is the wrong angle. A rule of thumb is to avoid returning at all and let your VCL code flow into the built-in VCL, combining your policy with the "oh by the way, this is HTTP" built-in rules. You should use return not to dodge the built-in rules, but to make decisions that are outside the scope of HTTP. Unless of course you know for sure that the backend is wrong, telling you for example that something is cacheable (information leak, anyone?) when you know for a fact it is not. In that case you should endeavor to fix the backend and treat the VCL code as a hot-fix that should disappear as soon as possible.
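
As an illustration of such a hot-fix, here is a minimal sketch. The /account/ prefix is hypothetical and merely stands for something you know for a fact must not be cached:

sub vcl_backend_response {
  # hypothetical hot-fix: the backend wrongly marks personal pages as
  # cacheable, so mark them uncacheable here until the backend is fixed
  if (bereq.url ~ "^/account/") {
    set beresp.uncacheable = true;
  }
  # no return: the response still flows through the built-in rules
}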

Maybe the wording in the default.vcl file could be improved:

# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.

What we recommend is to compose with the built-in VCL. And in some cases bypassing the built-in VCL makes perfect sense: you have to return for purges, for instance. But how can we avoid bypassing the cache without stripping cookies or performing early returns? We must strive to circumvent the cookie check in the built-in vcl_recv, and only that check. With that requirement in mind, it becomes simple:

sub vcl_recv {
  # save the cookies before the built-in vcl_recv
  set req.http.Cookie-Backup = req.http.Cookie;
  unset req.http.Cookie;
}

sub vcl_hash {
  if (req.http.Cookie-Backup) {
    # restore the cookies before the lookup if any
    set req.http.Cookie = req.http.Cookie-Backup;
    unset req.http.Cookie-Backup;
  }
}

Let's check that it actually works with a test case:

varnishtest "Cache even with Cookie headers, no return"

server s1 {
  rxreq
  expect req.http.Cookie == "id=123"
  txresp
} -start

varnish v1 -vcl+backend {
  sub vcl_recv {
    # save the cookies before the built-in vcl_recv
    set req.http.Cookie-Backup = req.http.Cookie;
    unset req.http.Cookie;
  }

  sub vcl_hash {
    if (req.http.Cookie-Backup) {
      # restore the cookies before the lookup if any
      set req.http.Cookie = req.http.Cookie-Backup;
      unset req.http.Cookie-Backup;
    }
  }

  # add some debug info
  sub vcl_deliver {
    set resp.http.Obj-Hits = obj.hits;
    set resp.http.Seen-Cookie = req.http.Cookie;
  }
} -start

client c1 {
  txreq -hdr "Cookie: id=123"
  rxresp
  expect resp.status == 200
  expect resp.http.Obj-Hits == 0
  expect resp.http.Seen-Cookie == "id=123"

  # expect a cache hit despite a different cookie
  txreq -hdr "Cookie: id=abc"
  rxresp
  expect resp.status == 200
  expect resp.http.Obj-Hits == 1
  expect resp.http.Seen-Cookie == "id=abc"
} -run
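
If you want to try it yourself, save the whole test case to a file (the name below is arbitrary) and run it with the varnishtest program shipped with Varnish:

varnishtest cookie-caching.vtc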

So that's now a problem solved on the client side: we don't care about cookies sent by the client and we rely on the backend response to make caching decisions. And this is not too intrusive: we simply rename the Cookie header temporarily to avoid triggering the one check in the built-in VCL that bothers us, while still benefiting from all the other built-in rules. Keep in mind that leaving the door open to caching despite the presence of cookies requires increased trust in the backend.

If you wish to remain sane, stop here. This is the best you can sanely do with regard to caching vs cookies. Otherwise, be prepared to enter a realm of madness in part two!

Half-closing words

If you are starting with Varnish and run into the caching-vs-cookies wall, then this little stunt in vcl_recv and vcl_hash is enough to enable caching despite client cookies. More advanced users might want to bypass the bulk of Varnish's conservative defaults, such as the whole built-in VCL, the default TTL, or grace periods. If your Varnish setup is mature enough not to need those conservative defaults, then your backends must speak reliable HTTP.
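
For instance, once you trust your backends' Cache-Control headers, you might relax the defaults directly in VCL. The sketch below is only an example under that assumption, with a placeholder value:

sub vcl_backend_response {
  # assumption: the backend sends trustworthy Cache-Control headers, so we
  # only extend the grace period to keep serving stale content while a
  # fresh copy is being fetched
  if (beresp.ttl > 0s) {
    set beresp.grace = 1h;
  }
}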

Let me stress that point one last time: if Varnish is conservative by default, it's because security prevails over convenience. By default, a Varnish setup will not let you leak information when access is restricted using the Authorization and Cookie headers; it's up to you to expand the scope of your cache.

Check out the Varnish Wiki