September 24, 2024
6 min read time

Varnish and Observability (part 3: More metrics, more logs)

Hello there! This article is the third part of a series, you may want to start at the beginning (The Basics), or at least check the previous entry (Grafana and Friends) you haven't done so already.

Let’s dive right back into the heart of the topic with the question we left hanging last time: Can you add more metrics to Varnish, and log even more than what’s coming out of the box? I’m not going to try and entertain the suspense: of course you can!

std.log() goes brrrrrrrr

Let’s start with adding log lines to varnishlog, how and why. How is very straightforward with vmod-std, you just import it and sprinkle std.log() in your VCL, like this:

import std; sub vcl_recv { # the + sign acts as a string concatenator std.log("entering vcl_recv with req.url: " + req.url); if (req.url == "/admin") { std.log("this is probably uncacheable"); return(pass); } }

Then you can look at your varnishlog output and find:

- VCL_call RECV - VCL_Log entering vcl_recv with req.url: /admin - VCL_Log this is probably uncacheable - VCL_return pass

It’s obviously pretty cool since it can express some choices that the VCL made, without having to read the code directly. This is for example leveraged in this toolbox VCL that explains every decision of the built-in VCL. We also use it to more finely log hits, misses and other handlings to get a clearer picture of what is cached, what isn’t and why.

As that last link showcases, you can also see logging as “tagging”. Since std.log() will create a new line in the transition log, it becomes something you can filter on. As a silly example, you can create this VCL:

sub vcl_recv { if (req.url == "/favicon.ico") { std.log("type:ico"); } else { std.log("type:other"); } }

And then you can use the -q filter to find all requests tagged with “type:ico”:

varnishlog -q 'VCL_Log eq "type:ico"'

This can be a great way to identify requests at the VCL level and avoid complex filtering queries in varnishlog, loki or elasticsearch when you are looking for a needle in the haystack data lake.

Note: std.syslog() also exists, but instead of printing a line in the VSL, it drops it into the system log. It may be useful sometimes, but you will most likely lose the context of the request, so be sure to log all the details you may need. Also, be careful to not saturate your syslog implementation as std.syslog() is blocking and can slow down your traffic.

Counters, and creating namespaces

Counters are great because they capture the global state of your Varnish instance in an instance. In a blink you’ll know how many requests have been served in the past 5 minutes, or how many times a backend failed a request.

But that may not be enough…

Maybe you want counters for a single website, or for a specific user, or a particular country? The global pool is not going to cut it since all the data has already been flattened, and there’s no untangling it. What we need here is vmod-accounting.

As boring as its name is, vmod-accounting is an awesome tool to add to your belt if you want more granular counters. Most of the time, our users want to get per-website data, so let’s use that as an example (shamelessly lifted from the official documentation):

import accounting; sub vcl_init { accounting.create_namespace("tenant1"); accounting.create_namespace("tenant2"); accounting.create_namespace("generic"); } sub vcl_recv { if (req.http.host ~ "tenant1.com$") { accounting.set_namespace("tenant1"); return(vcl(tenant1)); } else if (req.http.host ~ "tenant2.com$") { accounting.set_namespace("tenant2"); return(vcl(tenant2)); } else { accounting.set_namespace("generic"); } }

Then, as you run varnishstat, you’ll see a bunch of new counters (around 50 per namespace):

ACCG.tenant1.total.client_req_count ACCG.tenant2.total.client_req_count ACCG.tenant3.total.client_req_count ACCG.tenant1.total.client_req_hdrbytes ACCG.tenant2.total.client_req_hdrbytes ACCG.tenant3.total.client_req_hdrbytes ACCG.tenant1.total.client_hit_count ACCG.tenant2.total.client_hit_count ACCG.tenant3.total.client_hit_count ...

And if you don’t find this granular enough, you can also add keys to further subdivide the space. Here, since I don’t have any key, there’s only the default one: “total”.

More counters, more!

I have one more thing in my bag if you are not satisfied yet. The only thing that you could be missing at this point is the ability to just create arbitrary counters. They would be counters you would name, and that you could set or atomically increment.

If that’s the kind of thing you are looking for, you can look at vmod-xcounter, which is open-source, or you can grab the Varnish Enterprise solution via subscription or on cloud marketplaces, vmod-kvstore:

vcl 4.0; import kvstore; sub vcl_init { new vsc = kvstore.init(); } sub vcl_deliver { // Count all media responses and log it in varnishstat if (resp.http.Content-Type ~ "audio|image|video") { vsc.counter("media_responses", 1, varnishstat = true); } }

This gives you total control about how everything is counted and exposed through varnishstat. This should hopefully cover all the bases!

Leaving no metric unturned

Throughout this series, we’ve been pushing the idea that knowledge is power. If your software tells you more, you can prepare better, you can correlate data more efficiently and solve problems faster. Varnish is built with this idea front and center, down to the very concept that it can expose metrics you define, rather than merely selecting whatever may be available out-of-the-box.

This flexibility opens great possibilities, the only cost is a few lines of VCL here and there, a pretty sweet deal if you ask me.