October 28, 2015

Junk junk junk

When I give a Varnish training, one quote I like to use when we reach the topic of cache invalidation is: "With great power comes great responsibility". Getting things into your cache is only one side of your policy; getting things removed from your cache (invalidation) is the other.

I like to emphasize the word “policy”, and make the backends responsible for providing useful information (but that’s a topic for another blog post). When the backend tells you everything you need to know, you don’t have to handle specific cases in your VCL. This way you can keep a minimal cache policy in Varnish, containing only value-added features such as robust invalidation schemes or cache poisoning mitigation.
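
To make that concrete, here is a minimal sketch of the kind of value-added VCL I have in mind: the backends drive TTLs through their response headers (which Varnish honors out of the box), and the VCL only adds an invalidation entry point. The invalidators ACL and the PURGE convention below are illustrative, not a prescription:

acl invalidators {
    "localhost";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ invalidators) {
            return (synth(405, "Purging not allowed"));
        }
        return (purge);
    }
}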

What is cache poisoning?

Cache poisoning (or cache pollution) is a kind of denial of service attack, which consists of abusing the behavior of a cache and forcing it to keep junk instead of relevant data. If you manage to fill a cache with enough junk, you can significantly slow down the happy path to legitimate content.

Take Varnish for instance. If you start making random requests at a high rate, you can fill the cache with 404 (Not Found) responses if the backends allow Varnish to cache them. For very high-traffic websites, however, it would most likely have little effect on fresh content and instead hurt the long tail.

Why is Varnish vulnerable?

Varnish Cache is subject to cache poisoning because it uses an LRU (Least Recently Used) strategy for cache eviction when your storage backend is full. Junk enters on the most recently used side, just like regular content, and therefore starts out as the least likely candidate for eviction.

Let's illustrate it with a simple test case, which can be run with varnishtest:

server s1 {
    rxreq
    txresp -status 404
} -start

varnish v1 -vcl+backend "" -start

varnish v1 -expect SMA.s0.g_space == 104857600

client c1 -repeat 2 {
    txreq -url /random/junk/url
    rxresp
    expect resp.status == 404
} -run

varnish v1 -expect cache_hit == 1

What we learn from this test is that by default varnishtest will start Varnish with a limited malloc storage (SMA.s0) capped at 100MB. We can also confirm that 404 responses are cached by default. This is actually compliant with the HTTP specifications, as defined in RFC 7231, section 6.1. What it means for me is that, by default, junk can land in my cache.

So Varnish is vulnerable to cache poisoning both because it follows the HTTP standard and because of its eviction strategy. But “vulnerable” is a rather strong word, and it implies that Varnish is doing it wrong. One could however argue that my (subjective) opinion simply diverges from the (objective) HTTP specs.
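
Of course, if your policy is that 404s should simply never be cached, you can override that default directly in VCL; here is a minimal sketch (the rest of this post takes a different route and keeps junk cacheable, just elsewhere):

sub vcl_backend_response {
    if (beresp.status == 404) {
        # Don't store the response; keep a short hit-for-pass window so
        # concurrent requests for the same junk URL don't pile up on the backend.
        set beresp.uncacheable = true;
        set beresp.ttl = 2m;
        return (deliver);
    }
}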

Let's get rid of LRU

I already explained why LRU is a strategy vulnerable to cache poisoning, but I would like to rephrase it: all cache misses (including junk) are treated equally, and may trigger evictions in order to be inserted into the cache. This is essentially a hit ratio problem, and because there's no such thing as a free lunch, trying to improve your hit rate might degrade your lookup latency, CPU consumption, memory footprint and maybe other aspects...

This is of course not a new problem, and we can easily find other algorithms to replace the LRU strategy. First, the typical data access pattern of a content site allows us to rule MRU (Most Recently Used) out. Algorithms like LFU (Least Frequently Used) and LIRS (Low Inter-reference Recency Set) are good examples of strategies that try to maximize the hit rate. But you would probably resort to something like LIRS only when you have hard constraints on the size of your cache, when evictions are both likely and expensive.

But of course LRU has its strengths, otherwise it wouldn't be featured at all. The main advantages are its linear space and constant time complexities. This is possible because LRU doesn't keep track of any additional data, which makes the implementation trivial and helps contain the complexity of the code base. Smarter algorithms exist, but none of them beat the simplicity of LRU.

One itch that I'll need to scratch is Segmented LRU. It has LRU's very low complexity and yet offers the potential of a better hit rate. Just like LRU, it should also be quite trivial to implement. I suppose I'll experiment with that next time I get stuck in an airport with a power plug available for my laptop :)

Chunk chunk chunk

One thing to know about storage backends is that they might under some circumstances allocate chunks of memory instead of the whole space needed to store an object. If a backend sends a body using the chunked transfer encoding and doesn't give a content length, there's no way to predict the required storage space.

Over time, as objects of different sizes come and go, fragmentation creeps in. In this case, Varnish may allocate chunks even when the size is known beforehand. Serving a large file from several chunks means accessing memory with random jumps, which breaks predictability. This can hurt throughput, as underlying hardware caches tend to be good at pre-fetching and improving throughput when memory accesses are linear.

You can influence how Varnish will try to allocate chunks from storage using the fetch_chunksize and fetch_maxchunksize parameters, but those are experimental. You have been warned ;)
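
For reference, these are regular runtime parameters, so you would pass them to varnishd like any other; the values below are purely illustrative, not recommendations:

varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,1G \
    -p fetch_chunksize=64k -p fetch_maxchunksize=256k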

Cache eviction doesn't have to be solely based on cache lookups. For instance, our Massive Storage Engine (MSE) relies on a segmented LFU strategy. But instead of strictly relying on LFU statistics, the MSE might evict objects in the kill zone (the last segment) in order to ensure contiguous storage of objects.

The storage backends in Varnish try to solve the storage problem, and while it might feel like they don't do a good job of preventing cache poisoning, that's fine: we have other means to fight it.

Pick your weapon

One feature that is often overlooked is the storage backend. The discussion is usually about choosing, between malloc and file (and mse for our customers), the one that performs best for your workload. But Varnish can actually do more with storage backends, and I strongly recommend you take a look at Per's take on storage partitioning.

Now let's say you have multiple gigabytes of storage capacity and want to limit the pollution of your cache: you could simply add another, smaller storage for junk (say, 100MB). According to your policy, and if like me you consider 404 responses to be junk, you can easily do that in VCL.

Let's try that:

server s1 {
    rxreq
    txresp -status 404
} -start

varnish v1 -arg "-s s0=malloc,100M -s junk=malloc,1M" -vcl+backend {

    sub vcl_backend_response {
        if (beresp.status == 404) {
            set beresp.storage_hint = "junk";
        }
    }
} -start

varnish v1 -expect SMA.Transient.c_bytes == 0
varnish v1 -expect SMA.s0.c_bytes == 0
varnish v1 -expect SMA.junk.c_bytes == 0

client c1 {
    txreq
    rxresp
} -run

varnish v1 -expect SMA.Transient.c_bytes == 0
varnish v1 -expect SMA.s0.c_bytes == 0
varnish v1 -expect SMA.junk.c_bytes > 0

In this test case, we override the default storage configuration done by varnishtest: we declare the same s0 storage, but this time with a dedicated, smaller storage for junk. After reading Per's post, you should be aware of the transient storage if you weren't already. And finally, we can observe that the cache policy of treating 404s as junk is effective.

If like me you prefer to drive your cache policy from the backend, you can do that too. Let's imagine that the backend decides what's junk and gives a reason:

server s1 {
    rxreq
    txresp -status 404 -hdr "X-Junk: not found"
} -start

varnish v1 -arg "-s s0=malloc,100M -s junk=malloc,1M" -vcl+backend {

    sub vcl_backend_response {
        if (beresp.http.X-Junk) {
            set beresp.storage_hint = "junk";
        }
    }

    sub vcl_deliver {
        unset resp.http.X-Junk;
    }
} -start

varnish v1 -expect SMA.Transient.c_bytes == 0
varnish v1 -expect SMA.s0.c_bytes == 0
varnish v1 -expect SMA.junk.c_bytes == 0

client c1 {
    txreq
    rxresp
} -run

varnish v1 -expect SMA.Transient.c_bytes == 0
varnish v1 -expect SMA.s0.c_bytes == 0
varnish v1 -expect SMA.junk.c_bytes > 0

With this approach, you end up with less VCL if you have more than one criterion, and you can even handle cases that can't be guessed from outside the backends. And since this is all about eviction and mitigating cache poisoning, we should probably verify it solves that too:

server s1 -repeat 2 {
    rxreq
    txresp -hdr "X-Junk: just because" -bodylen 524288
} -start

varnish v1 -arg "-s s0=malloc,100M -s junk=malloc,1M" -vcl+backend {

    sub vcl_backend_response {
        if (beresp.http.X-Junk) {
            set beresp.storage_hint = "junk";
        }
    }
    
    sub vcl_deliver {
        unset resp.http.X-Junk;
    }
} -start

varnish v1 -expect SMA.Transient.c_bytes == 0
varnish v1 -expect SMA.s0.c_bytes == 0
varnish v1 -expect SMA.junk.c_bytes == 0

client c1 {
    txreq -url "/foo"
    rxresp
    txreq -url "/bar"
    rxresp
} -run

varnish v1 -expect SMA.Transient.c_bytes == 0
varnish v1 -expect SMA.s0.c_bytes == 0
varnish v1 -expect SMA.junk.c_bytes > 0

varnish v1 -expect n_object == 1
varnish v1 -expect n_lru_nuked == 1

In this test we try to fit two 512KB objects in a 1MB storage. But 512KB is only the size of the body, and the HTTP headers of the backend responses make it impossible to fit them both at the same time. We verify in this case too that Varnish didn't touch the other storages, even though there's plenty of free space there.

If I have one comment to make about the n_object and n_lru_nuked counters, it's that although they are very valuable and even offer us one last check here, it would be great to also have them on a per-storage basis.
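
In the meantime, the per-storage byte counters used throughout these tests already exist, so you can at least keep an eye on the junk storage itself; for instance, something along these lines:

varnishstat -1 -f SMA.junk.g_bytes -f SMA.junk.g_space -f SMA.junk.c_bytes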

With this setup, if a link to your website contains a typo and a herd of visitors ends up on a 404 page, that page will be cached. Bear in mind that the storage's LRU list will keep this URL in the cache as long as it's repeatedly requested. If you use a tool like our Varnish Custom Statistics (VCS), you can probably even find out about such bogus accesses, work your way back to the referrer, and find the culprit.
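
If you don't have the VCS at hand, a rough sketch of the same idea using nothing but the bundled std vmod would be to tag those requests in the shared memory log, so varnishlog (or any log processor) can aggregate them; the log prefixes below are made up for the example:

import std;

sub vcl_backend_response {
    if (beresp.status == 404) {
        # Record the junk URL, and its referrer when there is one, so the
        # culprit can be traced back from the logs.
        std.log("junk-url: " + bereq.url);
        if (bereq.http.Referer) {
            std.log("junk-referer: " + bereq.http.Referer);
        }
    }
}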

Still not enough

Storage backends can prevent legitimate content from being penalized and evicted in favor of junk. However, when a client has a way to force a cache miss, it also becomes a way to force traffic onto the backend. The attack surface is not completely gone, but you can mitigate this last bit with good old throttling.
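
Varnish core doesn't ship a throttling facility, but as a sketch, assuming the open-source vsthrottle vmod from the varnish-modules collection, rate limiting misbehaving clients could look roughly like this (the key, limit and window are illustrative):

import vsthrottle;

sub vcl_recv {
    # Reject clients that issue more than 50 requests in any 10-second window.
    if (vsthrottle.is_denied(client.identity, 50, 10s)) {
        return (synth(429, "Too Many Requests"));
    }
}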

Once you collect data on cache poisoning, you can measure it and identify attackers, for instance. Again, if you want to monitor it in real time, you can use the VCS and make whatever decision you see fit. This is, after all, your cache policy: Varnish gives you many power tools to implement it and has little opinion on what it should do. Varnish doesn't try to be smart, and it doesn't need to: the VCL fills this gap.

Image is (c) 2010 Dave Parker.