April 3, 2024
5 min read time

Low-latency streaming with Varnish

Over the last decade, video streaming has been firmly migrating to an Over-The-Top (OTT) model, using video streaming protocols on top of our beloved HTTP for transport. You’ve probably heard of them: HLS, initiated by Apple, DASH, standardized by MPEG, and maybe also CMAF, which unifies both formats. The benefit is obvious: if you have internet access, you can get your video, and infrastructure-wise, while HTTP isn’t as efficient as cable or satellite, we can rely on CDNs and caching to make things better.

But “better” is a spectrum, and with all the innovations around “low-latency”, not all video caching solutions are created equal. So today, we answer the big question: “Does Varnish support low-latency streaming?” In short: yes. In a few more words: it doesn’t specifically need to. For the full story, read on!

Streaming and chunking

A long-standing Varnish feature is "streaming", but, confusingly, for a long time it had nothing to do with video. Bear with me, I promise I can clarify this!

When Varnish streams, it means that it can start sending data from the origin to the HTTP clients before it has received the full response. This is extremely useful when serving large files: there's often little reason to add latency to the client response if you already have some data you could be serving.

For small files with a fast origin, the benefit is usually minimal. However, when you consider live encoding, it becomes very, very profitable: a 2-second segment will take the origin 2 seconds to produce, but if you can start sending data once you’ve encoded, say, 15 frames, your delay is only 0.25s, (theoretically) reducing latency by 87.5%. That's the idea behind chunked CMAF, or partial segments: the origin can start sending chunks before the whole segment is complete.
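To sanity-check the arithmetic, here's the calculation spelled out. Note that this assumes a 60 fps encode, which is what makes 15 frames come out to 0.25s (the frame rate isn't stated above, so it's an assumption):

```python
# Latency math for chunked delivery of a live segment, assuming 60 fps.
fps = 60
segment_s = 2.0                                # full segment duration
frames_before_send = 15                        # frames encoded before first chunk
chunk_delay_s = frames_before_send / fps       # 15 / 60 = 0.25 s
reduction = 1 - chunk_delay_s / segment_s      # 1 - 0.125 = 0.875, i.e. 87.5%
```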

Those segments are sent over HTTP using "chunked transfer encoding", meaning a segment can be sent without knowing its size in advance (i.e. no `content-length` header). Of course, Varnish supports chunked encoding, and coupled with the streaming feature, this allows it to distribute files without adding latency to the chain.
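For illustration, here is roughly what chunked framing looks like on the wire: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end of the body. A minimal sketch:

```python
# Minimal sketch of HTTP/1.1 chunked transfer encoding framing.
def chunked(parts):
    out = b""
    for p in parts:
        # each chunk: hex size, CRLF, payload, CRLF
        out += f"{len(p):x}\r\n".encode() + p + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk terminates the body
    return out
```

The point is that the sender can emit each chunk as soon as it exists, without ever knowing the final size, which is exactly what a live encoder needs.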

Push and (long) poll

Reducing latency when delivering the segments you know about is nice, but there's another major issue that live streaming needs to tackle: how can we tell players that there's a new segment available, like, right now?

Traditionally, the video origin finishes the new segment, updates the manifest with its URL and...waits. The players refreshing the manifest after that point will see the new segment and will download it. It's simple, but it has two major drawbacks:

  • Players need to request the manifest frequently enough that they don't miss updates for too long
  • In turn, this means a lot more "fruitless" requests to the server, just checking for an update

The latter point is largely mitigated by Varnish handling conditional requests out-of-the-box. This allows it to both accept (from the origin) and send (to the client) body-less responses to speed things up while saving bandwidth.
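As a rough sketch of the mechanism (the function and its shape are simplified for illustration, not Varnish's actual implementation): a conditional request carries a validator such as `If-None-Match`, and when it still matches the current ETag, the server can answer with a body-less 304 instead of resending the manifest:

```python
# Simplified sketch of conditional-request handling for a manifest poll.
def respond(request_headers, etag, body):
    if request_headers.get("if-none-match") == etag:
        return 304, b""    # manifest unchanged: no body, tiny response
    return 200, body       # manifest changed: send the full body
```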

However, we can go further, but first, a digression!

HTTP/2 push oopsie

HLS is a fairly basic format. Don't misinterpret me though, that's not an attack, and it has to be put to Apple's credit: forking the M3U playlist format to build a ubiquitous video streaming description format, without making it insanely complex, is an impressive feat.

That being said, there was one quite large misstep in the history of HLS: the infamous HTTP/2 push.

Since our problem is that we need to push new data to clients, why not use the HTTP/2 feature designed to push new data to clients? That would seem logical, right? It's exactly what Apple did: HLS adopted HTTP/2 push so that the origin could announce new segments.

BUT! They made a mistake: they made HTTP/2 mandatory, and the rest of the industry basically responded "lol, no!". I won't bore you with the details too much, but the HTTP/2 requirement got yanked, and Apple started looking at a solution that would work with HTTP/1.

Please wait here

With push out of the way, we have to rely on the next best thing: long polling. Essentially, clients can request content that may not exist yet, and if it doesn't, the server will simply wait before replying, hoping the content eventually shows up.

This is a massive improvement compared to the original situation that forced the clients to hammer the server, begging for an update. Here, all the players send one request each; they'll initially block, but everybody is unblocked as soon as the data is available.

In practice, this is done using query string directives that essentially say "I don't need to see the manifest until it includes a new segment", and using preload hints, which is just the server announcing soon-to-be-created-but-not-just-yet segments.
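In LL-HLS, for instance, those directives are the `_HLS_msn` and `_HLS_part` query parameters of the "blocking playlist reload" mechanism. Server-side, the waiting can be sketched like this (the `Playlist` class and its method names are purely illustrative, not a real API):

```python
import threading

# Illustrative long-poll sketch: manifest requests are parked until the
# requested media sequence number (msn) has been published.
class Playlist:
    def __init__(self):
        self.latest_msn = 0
        self.cond = threading.Condition()

    def publish(self, msn):
        with self.cond:
            self.latest_msn = msn
            self.cond.notify_all()   # unblock every waiting player at once

    def get(self, wanted_msn, timeout=3.0):
        # Block until the wanted segment exists (or the timeout expires),
        # then report the latest available sequence number.
        with self.cond:
            self.cond.wait_for(lambda: self.latest_msn >= wanted_msn,
                               timeout=timeout)
            return self.latest_msn
```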

Logically, it all makes good sense, but trouble is lurking! (<plays ominous music>)

Request coalescing

By having all our players waiting for an update, and unlocking them all at once, we've created a thundering herd, which video origins are usually not well equipped to deal with.

That's where Varnish comes in with its not-at-all secret weapon: request coalescing! When multiple requests are made to the same uncached resource, Varnish "coalesces" them into a single origin request, and serves all the clients from that one request.

This probably makes you raise an eyebrow though: haven't we just moved the thundering herd from the video origin to Varnish? Yep! Yep, we have indeed! But it's okay, highly scalable, massively performant HTTP delivery is Varnish's bread and butter, and it can take it (reaching 1.3Tbps video traffic on a single server is a recent highlight), freeing the video origin to do what it does best.
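A toy version of the idea (illustrative only, not how Varnish's cache core is actually written): the first request for an uncached key becomes the "leader" that fetches from the origin, and every concurrent request for the same key simply waits for that one result:

```python
import threading

# Illustrative sketch of request coalescing: concurrent requests for the
# same uncached key share a single origin fetch. (Error handling omitted.)
class Coalescer:
    def __init__(self, fetch):
        self.fetch = fetch            # function performing the origin request
        self.lock = threading.Lock()
        self.inflight = {}            # key -> (done event, result holder)

    def get(self, key):
        with self.lock:
            entry = self.inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self.inflight[key] = entry
        event, result = entry
        if leader:
            result["value"] = self.fetch(key)   # the one origin request
            with self.lock:
                del self.inflight[key]
            event.set()                          # wake all waiters
        else:
            event.wait()
        return result["value"]
```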


In short, Varnish “just” does HTTP, but it does it very, very well, leading to compliance AND performance with all HTTP media streaming platforms and protocols. And the best part is that it can do so out-of-the-box, with no particular tuning, which is a boon if you want a versatile solution that can handle your video delivery, AND your website caching, AND your API throttling AND your artifact protection.

We love talking about this sort of thing, and we’ll be at NAB Show this year to do just that. If you’re in town, why not stop by the Varnish booth and talk tech with the team?