June 25, 2024
11 min read time

Varnish and Observability (Part 1: The Basics)


Is Varnish fast? Of course it is! I know that, you know that, but I want my friends, and my friends’ friends to also know that. But of course, they aren’t just going to believe me, because, as one wise man said: “bragging rights aren’t given, they have to be demonstrably earned”.

This is where monitoring and observability come into play. We need them to collect data about how Varnish is doing, and send that information somewhere™ it can be aggregated, crunched, and correlated with various events on the platform. In short: we need to make it useful. How do we do that?

Well, I’m an engineer, so of course, my go-to answer will be: “it depends”. But because I really want you to brag about Varnish performing very well, I have something for you today.


Metrics, all of them

If you know Varnish at all, you know that we take great pride in exposing as much information as possible (and in the most efficient way possible), to the point that the only word that comes to (my) mind is “brutal”.

Let’s start with metrics: our tool of choice here is varnishstat. This deceptively simple tool will happily dump all the internal counters that Varnish constantly updates. And I literally mean ALL of them; we hide nothing, no sir!

For example, here’s what I get on my laptop, with a fairly vanilla configuration:

$ varnishstat -1
MGT.uptime                    261075         1.00 Management process uptime
MGT.child_start                    1         0.00 Child process started
MGT.child_exit                     0         0.00 Child process normal exit
MGT.child_stop                     0         0.00 Child process unexpected exit
MGT.child_died                     0         0.00 Child process died (signal)
MGT.child_dump                     0         0.00 Child process core dumped
MGT.child_panic                    0         0.00 Child process panic
MAIN.summs                        88         0.00 stat summ operations
MAIN.uptime                   261077         1.00 Child process uptime
MAIN.sess_conn                    10         0.00 Sessions accepted
MAIN.sess_fail                    96         0.00 Session accept failures
MAIN.sess_fail_econnaborted        0         0.00 Session accept failures: connection aborted
MAIN.sess_fail_eintr              96         0.00 Session accept failures: interrupted system call
MAIN.sess_fail_emfile              0         0.00 Session accept failures: too many open files
MAIN.sess_fail_ebadf               0         0.00 Session accept failures: bad file descriptor
MAIN.sess_fail_enomem              0         0.00 Session accept failures: not enough memory
MAIN.sess_fail_other               0         0.00 Session accept failures: other
MAIN.client_req_400                0         0.00 Client requests received, subject to 400 errors
MAIN.client_req_417                0         0.00 Client requests received, subject to 417 errors
MAIN.client_req                   10         0.00 Good client requests received
MAIN.esi_req                      20         0.00 ESI subrequests
MAIN.cache_hit                     0         0.00 Cache hits
MAIN.cache_hit_grace               0         0.00 Cache grace hits
MAIN.cache_hitpass                 0         0.00 Cache hits for pass.
MAIN.cache_hitmiss                18         0.00 Cache hits for miss.
MAIN.cache_miss                   30         0.00 Cache misses
<snipped>

The total output contains 356 counters (again, on my vanilla laptop setup), ranging from currently open connections and cache usage down to the number of times a specific mutex had to wait to get locked. It’s a treasure trove of information, and while you will probably never need ALL of it, what you do need is almost certainly in there.

varnishstat makes it easy to sift through a wealth of information. You can query individual counters, or glob entire categories. All of that is just one filter away on the command line. We’ve got you covered in terms of visualization options, too.
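
To make that concrete, here’s a quick sketch of the filtering option: -f takes a glob pattern and can be repeated, so you can pick individual counters or whole categories in one go:

$ # a single counter
$ varnishstat -1 -f MAIN.cache_hit
$ # every session-related counter in the MAIN category
$ varnishstat -1 -f 'MAIN.sess_*'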

We’ve already seen the -1 argument, which dumps the stats once and exits, but there’s also the default interactive view you get by running varnishstat with no arguments:

[screenshot: varnishstat’s interactive view]

But if you want to export the data, you’ll probably want varnishstat -j for some JSON goodness:

$ varnishstat -j
{
  "version": 1,
  "timestamp": "2024-05-17T16:54:58",
  "counters": {
    "MGT.uptime": {
      "description": "Management process uptime",
      "flag": "c",
      "format": "d",
      "value": 262315
    },
    "MGT.child_start": {
      "description": "Child process started",
      "flag": "c",
      "format": "i",
      "value": 1
    },
    "MGT.child_exit": {
      "description": "Child process normal exit",
      "flag": "c",
      "format": "i",
      "value": 0
    },
    <snipped>

Note: this is the output that is internally used by the great Varnish Prometheus Exporter.
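
And since everything nests under the counters object, the JSON is easy to slice with something like jq. A small sketch, assuming jq is installed and your instance has actually seen some traffic (otherwise the ratio below divides by zero):

$ # grab one value
$ varnishstat -j | jq '.counters."MAIN.cache_hit".value'
$ # hit ratio: hits / (hits + misses)
$ varnishstat -j | jq '.counters as $c |
    $c."MAIN.cache_hit".value / ($c."MAIN.cache_hit".value + $c."MAIN.cache_miss".value)'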

You’re feeling old school and want some XML? We have that too, and I promise I won’t judge you:

$ varnishstat -x
<?xml version="1.0"?>
<varnishstat timestamp="2024-05-17T16:56:28">
  <stat>
    <name>MGT.uptime</name>
    <value>262405</value>
    <flag>c</flag>
    <format>d</format>
    <description>Management process uptime</description>
  </stat>
  <stat>
    <name>MGT.child_start</name>
    <value>1</value>
    <flag>c</flag>
    <format>i</format>
    <description>Child process started</description>
  </stat>
  <snipped>
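
If you do go the XML route, the usual XML tooling applies. For example, a sketch using xmllint’s --xpath option (assuming libxml2’s xmllint is installed; the “-” tells it to read from stdin):

$ # extract a single counter value from the XML output
$ varnishstat -x | xmllint --xpath '//stat[name="MGT.uptime"]/value/text()' -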


What about logs?

Oh boy, I thought you’d never ask! Logs in Varnish are handled a tiny bit differently than in other applications, in the sense that Varnish itself doesn’t write log files. Instead, it just dumps the maximal set of data into memory and lets external loggers read that memory and format it the way YOU want and need it. In essence, Varnish is constantly logging everything in memory, and you can access that memory through a couple of tools.
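
One practical consequence of this design: the in-memory log is a circular buffer, so if you want to keep the raw data around, you ask a tool to write it out and replay it later. A minimal sketch using varnishlog (which we’ll meet properly in a second) with its -w and -r options:

$ # capture the raw log to a file (binary VSL format)
$ varnishlog -w /tmp/varnish.vsl
$ # later, replay it and format it exactly like live traffic
$ varnishlog -r /tmp/varnish.vsl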

First up is my favorite debugging tool of all time: varnishlog. If varnishstat was brutal, varnishlog is just bonkers. Have a look:

*   << Request  >> 2
-   Begin          req 1 rxreq
-   Timestamp      Start: 1556292791.607415 0.000000 0.000000
-   Timestamp      Req: 1556292791.607415 0.000000 0.000000
-   ReqStart       127.0.0.1 45212
-   ReqMethod      GET
-   ReqURL         /
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: 0:8080
-   ReqHeader      User-Agent: curl/7.64.1
-   ReqHeader      Accept: */*
-   ReqHeader      X-Forwarded-For: 127.0.0.1
-   VCL_call       RECV
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 3 fetch
-   Timestamp      Fetch: 1556292791.612406 0.004991 0.004991
-   RespProtocol   HTTP/1.0
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Server: SimpleHTTP/0.6 Python/3.7.3
-   RespHeader     Date: Fri, 26 Apr 2019 15:33:11 GMT
-   RespHeader     Content-type: text/html; charset=utf-8
-   RespHeader     Content-Length: 12771
-   RespProtocol   HTTP/1.1
-   RespHeader     X-Varnish: 2
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish (Varnish/5.2)
-   VCL_call       DELIVER
-   VCL_return     deliver
-   Timestamp      Process: 1556292791.612452 0.005036 0.000046
-   RespHeader     Accept-Ranges: bytes
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1556292791.612531 0.005116 0.000079
-   ReqAcct        70 0 70 256 12771 13027
-   End

**  << BeReq    >> 3
--  Begin          bereq 2 fetch
--  Timestamp      Start: 1556292791.607599 0.000000 0.000000
--  BereqMethod    GET
--  BereqURL       /
--  BereqProtocol  HTTP/1.1
--  BereqHeader    Host: 0:8080
--  BereqHeader    User-Agent: curl/7.64.1
--  BereqHeader    Accept: */*
--  BereqHeader    X-Forwarded-For: 127.0.0.1
--  BereqHeader    Accept-Encoding: gzip
--  BereqHeader    X-Varnish: 3
--  VCL_call       BACKEND_FETCH
--  VCL_return     fetch
--  BackendOpen    33 boot.default 127.0.0.1 9999 127.0.0.1 33424
--  BackendStart   127.0.0.1 9999
--  Timestamp      Bereq: 1556292791.607716 0.000118 0.000118
--  Timestamp      Beresp: 1556292791.611821 0.004223 0.004105
--  BerespProtocol HTTP/1.0
--  BerespStatus   200
--  BerespReason   OK
--  BerespHeader   Server: SimpleHTTP/0.6 Python/3.7.3
--  BerespHeader   Date: Fri, 26 Apr 2019 15:33:11 GMT
--  BerespHeader   Content-type: text/html; charset=utf-8
--  BerespHeader   Content-Length: 12771
--  TTL            RFC 120 10 0 1556292792 1556292792 1556292791 0 0
--  VCL_call       BACKEND_RESPONSE
--  TTL            HFP 5 0 0 1556292792
--  VCL_return     pass
--  Storage        malloc Transient
--  ObjProtocol    HTTP/1.0
--  ObjStatus      200
--  ObjReason      OK
--  ObjHeader      Server: SimpleHTTP/0.6 Python/3.7.3
--  ObjHeader      Date: Fri, 26 Apr 2019 15:33:11 GMT
--  ObjHeader      Content-type: text/html; charset=utf-8
--  ObjHeader      Content-Length: 12771
--  Fetch_Body     3 length stream
--  BackendClose   33 boot.default
--  Timestamp      BerespBody: 1556292791.612237 0.004639 0.000416
--  Length         12771
--  BereqAcct      135 0 135 156 12771 12927
--  End

That’s for one (ONE!) client request that then goes to the origin. You get timing information for each step of the way, you get caching decisions, connection data, header manipulation, YOU.GET.EVERYTHING.
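
If the full firehose is too much, you can restrict the output to specific record tags; for example, just the URLs and the timing lines (-i takes a comma-separated list of tags):

$ varnishlog -i ReqURL,Timestamp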

You also get a powerful query system to filter slow requests, failed responses, transactions from a certain IP address, etc. Seriously, it’s insane, but it’s absolutely necessary when you need to find that one problematic request and debug it.
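
To give you a taste, here are a few sketches of that query language (see the vsl-query(7) man page for the full grammar):

$ # transactions that got a 5xx response
$ varnishlog -g request -q 'RespStatus >= 500'
$ # requests that took more than a second to deliver
$ varnishlog -q 'Timestamp:Resp[2] > 1.0'
$ # everything coming from one address (field 1 of ReqStart is the client IP)
$ varnishlog -q 'ReqStart[1] eq "127.0.0.1"'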

All that data miiiight be a bit too much if you just need access logs, though. Access logs should be short and sweet, with just the information you need for reports, billing, and other light but necessary post-processing. For this, we have varnishncsa, which will output lines like:

127.0.0.1 - - [26/Apr/2019:08:33:11 -0700] "GET http://0:8080/ HTTP/1.1" 200 12771 "-" "curl/7.64.1"

As the name suggests, it supports the NCSA log format and lets you tweak what goes into each line. Oh, and it supports the same excellent query system as varnishlog, in case you want to filter requests per domain name, for example. It’s not uncommon to run multiple varnishncsa daemons logging to different files, to cover multiple properties/sites independently.
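
As a sketch, here’s roughly what one of those per-site loggers could look like, combining a query with a custom format string (the hostname and log path are placeholders; %{Varnish:handling}x records hit/miss/pass, and varnishncsa(1) documents the rest):

$ varnishncsa -q 'ReqHeader:Host eq "www.example.com"' \
    -F '%h %l %u %t "%r" %s %b %{Varnish:handling}x %{Varnish:time_firstbyte}x' \
    -D -w /var/log/varnish/www.example.com.log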

Last but not least, there’s also JSON logging, which we announced a few weeks ago. It offers a middle ground between the two previous tools: you get structured data, but you don’t need to handpick all the headers you want to know about:

$ varnishlog-json -p
{
  "req": {
    "headers": {
      "host": "localhost:6081",
      "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
      "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
      "accept-language": "en-US,en;q=0.5",
<snipped>

And because it’s JSON, it’s highly processable: the format is ubiquitous and every language can parse it. And at the risk of repeating myself, this one too supports query filters (all these tools are built atop the same library, so those benefits are shared).
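
Assuming the structure shown above (request headers nested under req.headers, one JSON object per transaction), pulling a field out is a one-liner with something like jq; a hedged sketch:

$ # print the User-Agent of each logged transaction
$ varnishlog-json | jq -r '.req.headers."user-agent"'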


What to do with all that data?

This, my friends, is a question for another blog post; the next one, actually. Varnish follows the UNIX philosophy: it makes sure the data is easily accessible and exportable, and leaves the transport and use of that data to more specialized tools.

In part 2, we’ll explore integrations and see how we can plug all of this into a larger ecosystem in just a few steps. Stay tuned, and see you in a week or two!