April 25, 2024
6 min read time

Varnish and JSON Logging


I usually have a weakness for long-winded introductions, but not today. Today I’m just too excited to delay the news: we have a new logger for Varnish: varnishlog-json! And now I realize I sound like a weirdo; nobody gets excited about loggers. But stay with me for a few paragraphs and I’ll share the reasons for my enthusiasm.

Varnish and memory logs

The Varnish logging system is one of the best on the planet, there, I said it. I don’t want to go into too much detail because there’s already plenty of documentation on it, but the important bits are:

  • Varnish doesn’t write logs into files, but into memory, to be both fast and non-blocking
  • Loggers can then “connect” to this memory space, extract the logging information, and do whatever they want with it

It’s a radical departure from how logging is usually handled, but the benefits are huge. The sheer speed allows Varnish to push an enormous amount of information, from modified headers to detailed timelines of how and when objects were retrieved from the cache and sent to the client. A single HTTP transaction will routinely produce 50 to 200 lines, or “records”.

And all that information can be rendered in a multitude of ways.

Verbatim, by varnishlog for debug:

*   << Request  >> 2
-   Begin          req 1 rxreq
-   Timestamp      Start: 1713036661.666094 0.000000 0.000000
-   Timestamp      Req: 1713036661.666094 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       ::1 57272 a0
-   ReqMethod      GET
-   ReqURL         /otters.html
-   ReqProtocol    HTTP/1.1

Condensed, by varnishncsa for accounting:

::1 - - [13/Apr/2024:12:31:02 -0700] "GET http://my.website.com/otters.html HTTP/1.1" 200 764 "-" "curl/8.7.1"

Tabulated, by varnishtop for statistical purposes (most requested URLs):

   612.00 ReqURL /otters.html
   509.00 ReqURL /puppies.php
   507.00 ReqURL /kitties.html
   404.00 ReqURL /calves.html
     5.00 ReqURL /baby_pigeons.php
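
For reference, those three views come from commands along these lines (exact flags depend on your Varnish version and setup):

# full per-request detail, grouped by request
varnishlog -g request

# one NCSA-style line per request
varnishncsa

# live ranking of a given record tag, here the most requested URLs
varnishtop -i ReqURL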

And the awesome thing about this is that each logger can filter logs exactly the way it wants. A very common use case is dual-logging:

  • varnishncsa logs everything, shipping logs over the network to a central aggregator for billing purposes
  • varnishlog only logs problematic requests (errors, responses that took too long, etc.) to disk, in excruciating detail, with a rate limit to avoid saturating the disks (see the sketch below)
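
A rough sketch of that setup could look like this; your-log-shipper is a stand-in for whatever transport you use, and flags like -q, -R and -w may vary slightly between Varnish versions:

# everything goes to the aggregator: pipe varnishncsa into your shipper of choice
varnishncsa | your-log-shipper

# only the painful transactions go to disk, in full detail:
# server errors, or responses that took more than a second, capped at 10 per second
varnishlog -g request \
    -q 'RespStatus >= 500 or Timestamp:Resp[2] > 1.0' \
    -R 10/s -w /var/log/varnish/problems.vsl

The file written with -w is in Varnish’s native format and can be replayed later with varnishlog -r for post-mortem digging.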

But, still, we are missing something here.


The rise of JSON

Nowadays, we don’t just want log lines, we want structured information, because with it we can write more expressive search queries and more easily extract interesting statistics from the raw material. Of course, there are multiple generations of sysadmins experienced in sifting through files with awk, grep, CSV parsers and the like, and I’m not judging; I’m proudly part of that crowd. But that’s not the point here.

However, in practice, when a tool requires you to choose a logging format, it is essentially asking which data it can omit and which information it can flatten. And sometimes that comes back to bite you, because you may think you know what you’ll need, but you may not. So many times have I crafted a log format that included all the headers I knew were important, only to realize, when the platform broke, that the developers hadn’t warned me about that very new, very important header. “Oh, yeah, you should have logged that one!”, they said, afterwards.
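
varnishncsa’s format strings illustrate the problem nicely: every header has to be named up front, and anything you did not list simply never makes it to disk. Something like this (a made-up format string, not a recommendation):

# only the named headers survive; the header you forgot is gone for good
varnishncsa -F '%h %r %s %b %{User-Agent}i %{X-Very-Important-Header}i'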

And so, we need logs that keep most of the information, as structured as possible, without being unwieldy. Something that nicely blends the depth of varnishlog with the extractability of varnishncsa. And this is where the new tool comes in.

varnishlog-json proposes a predictable structure to describe the HTTP transactions flowing through Varnish, while keeping as much of the information intact as possible: you still get all the headers, the full timeline, and the user-generated log records. It also automatically extracts information such as whether the transaction errored out, and whether the response was a hit, a miss, a pass, or something else.

And of course, as the name implies, the output is pure JSON.


Compatibility is paramount

JSON is a terrible syntax for a serialization language (it’s my blog post, I’m allowed to speak my mind!), but it has one superpower that eclipses all of its flaws: it’s ubiquitous. Virtually every programming language on Earth supports it (yes, even COBOL), making it the lingua franca for passing structured data around.

If you are using a centralized log platform, be it Loki, Splunk, or good old Elasticsearch, it will support JSON, because JSON lets the platform not care too much about what your data looks like; it can just figure it out on the go. And if the platform happens to impose a format on you, you are basically one jq query away from reformatting it.
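
For instance, assuming hypothetical field names like .req.method or .resp.status (the actual schema is documented by the project; these names are only for illustration), a reshaping query can be as small as:

# field names here are placeholders, not the tool's actual schema;
# reshape each transaction into whatever flat record your platform expects
jq -c '{method: .req.method, url: .req.url, status: .resp.status}' transactions.json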


The road forward

varnishlog-json is still young, and is therefore not yet part of the Varnish Cache project, but as the motto says: “release early, release often”. We’ve made it easy to compile, with few dependencies, and you can build a Docker container just to test it if you want!
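
If you want to kick the tires, the test drive is roughly this, assuming a Dockerfile at the repository root (the README has the authoritative steps and the actual repository URL):

# placeholder URL: use the one from the project's GitHub page
git clone <varnishlog-json repository URL>
cd varnishlog-json
docker build -t varnishlog-json .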

As it’s a pre-1.0 release, you can imagine we are very interested in your feedback. Bugs, ergonomics, ideas, opinions: we’ll take everything! Just head over to the GitHub page and let us know!