Varnish Software Blog

Monitoring Varnish with Oodle AI: Standard Dashboards & Metrics

Written by Brian Stewart | 11/3/25 8:00 AM

TL;DR
Varnish tracks everything; the challenge is knowing what to look at. This post walks through the standard Varnish dashboard on Oodle AI, built around five core views that reveal what’s really happening at the edge. Oodle AI is a modern observability platform that is easy to run and cost-efficient.

Why We Built a Standard Dashboard

If you’ve ever worked with Varnish, you know how deep its metrics go. Out of the box, varnishstat gives you hundreds of performance counters. Add vmod_accounting, and you can define hundreds more—customized for each domain, route, or class of request. With all that data in play, it’s easy to see how quickly things can start to pile up.

With so much visibility, the real challenge becomes: what should I focus on? Fortunately, our customers use Varnish for everything from video streaming and web performance optimization to accelerating CI/CD pipelines as an artifact cache, so we’ve seen a wide range of observability needs.

To tackle this question ourselves, we built a standard observability dashboard, naturally using our OpenTelemetry implementation, varnish-otel. As my colleague Guillaume has mentioned, OpenTelemetry has truly become the de facto standard for SRE and observability platforms. At this point, everybody speaks OTel, which gives you freedom of choice and ease of integration for whichever metrics, logs, or tracing endpoint you want.

That brings me to our new friends at Oodle AI. They offer AI-native, enterprise-grade observability at open-source cost, as a fully managed drop-in replacement for Grafana, Prometheus, and Elastic Logs. In my own experience, Oodle is fast, easy to use, and simple to get started with, making it a great match for this kind of setup.

Setting up Varnish and Oodle AI

To point varnish-otel at Oodle, all you need is a simple systemd drop-in for the varnish-otel service that sets the correct Oodle endpoint. Then, just restart the service and you're good to go. The full set-up can be found here, so I won’t repeat every step, but in short, it’s a quick and painless configuration.

mkdir -p /etc/systemd/system/varnish-otel.service.d
cat << EOF > /etc/systemd/system/varnish-otel.service.d/oodle.conf
[Service]
Environment=OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
Environment=OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4317
Environment=OTEL_EXPORTER_OTLP_HEADERS=x-api-key=<X-API-KEY>,X-OODLE-IGNORE-SCOPE-NAME=true
Environment=OTEL_SERVICE_NAME=varnish-otel-exporter
EOF
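Once the drop-in is in place, systemd still needs to pick it up before the restart mentioned above takes effect. A minimal sketch of the remaining steps, assuming the service is named varnish-otel as in the drop-in path:

```shell
# Reload unit files so systemd sees the new drop-in
sudo systemctl daemon-reload

# Restart the exporter to apply the new environment
sudo systemctl restart varnish-otel

# Optional: confirm the drop-in environment was applied
systemctl show varnish-otel --property=Environment
```

The last command prints the merged Environment= settings, which is a quick way to verify the drop-in was read before checking for metrics on the Oodle side.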

Worth mentioning is the X-OODLE-IGNORE-SCOPE-NAME=true header. Without it, Oodle combines the OpenTelemetry scope name and the metric name, which can result in duplicate prefixes like varnish_varnish_main_client_requests, since Varnish metric names already start with varnish_. This header disables that behavior, giving you cleaner metric names like varnish_main_client_requests. Finally, OTEL_SERVICE_NAME is completely optional and simply names the service. The full list of varnish-otel variables can be found here.

Crafting our Dashboard

As I mentioned earlier, Oodle is easy to use, especially if you're familiar with Grafana. Once we had metrics flowing into the platform, we could start building our dashboard. Now for the important question: what should we pay attention to?

If you clicked on the docs link above, you may have noticed an Important Metrics section. We’ve documented the metrics we think you should be paying attention to and broken them down into five sections that we believe are universally valuable for any Varnish deployment: Overview, Traffic, Saturation, Errors, and Latency. For brevity, we’ll focus on the larger sections here; if you want more detail, please reach out or check out the documentation link above.

Overview

The Overview section covers general performance: requests per second, bytes served per second, throughput offload (bytes served from Varnish that did not need to be fetched from the origin), backend health, and more. The general idea here was to show performance at a glance and yell if something is off, e.g. sick backends, MSE4 stores going offline, or panics.
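As a rough sketch, panels like these boil down to PromQL-style queries over the exported counters. The metric varnish_main_client_requests comes straight from the naming example earlier; the cache hit and miss counter names below are assumptions based on varnishstat's MAIN.cache_hit and MAIN.cache_miss, and may differ in your deployment:

```
# Requests per second, per Varnish instance (5-minute rate)
sum by (instance) (rate(varnish_main_client_requests[5m]))

# Cache hit ratio: hits over hits plus misses (counter names assumed)
sum(rate(varnish_main_cache_hit[5m]))
  / (sum(rate(varnish_main_cache_hit[5m])) + sum(rate(varnish_main_cache_miss[5m])))
```

Since Oodle is a drop-in replacement for Prometheus-style backends, queries like these should carry over largely unchanged from an existing Grafana setup.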

Traffic

You might notice some overlap between the Traffic and Overview sections. Metrics like requests per second or data transfer are definitely traffic related, but we chose to keep the most critical ones in the Overview so they’re visible at a glance and all at the top of the dashboard. The Traffic section dives deeper, covering things like ESI subrequests, cache invalidations, and connection rates, giving you a fuller picture of what’s flowing through Varnish.

Errors

This section zeroes in on everything that can (and sometimes does) go wrong. It includes backend errors, MSE4-specific errors, ESI and compression issues, and more. We also visualize which Varnish instances can or cannot reach specific backends, helping pinpoint where problems are occurring in real time.

Interested?

To get your hands on a copy of the dashboard, you can view it on GitHub here or reach out to your account manager at Varnish, who will be more than happy to help you and your team get set up. We also highly recommend reaching out to the team at Oodle, especially if you use Grafana Enterprise. The transition is incredibly smooth, and the cost savings are hard to ignore.

What's Next?

We've covered metrics and dashboards in this guide, but there's more to the story. Varnish's logging provides incredible granularity for debugging and analysis. The varnishlog tool (or its JSON sibling, varnishlog-json) captures every header, timestamp, and request for both client and backend interactions. In the next part, we'll show you how to:

  • Send Varnish logs to Oodle via OpenTelemetry
  • Correlate logs with the metrics we've set up here
  • Build log-based alerts that complement your metric dashboards

Have ideas for improvements or features you’d like to see? Let us know! We always welcome feedback, whether it’s a new feature request or a unique use case you’d like support with. We especially enjoy helping customers build creative solutions—for example, using vmod_accounting to track spikes in PUT requests by tagging them with request method keys. Solving these kinds of custom observability challenges is just another thing we love to do.