June 13, 2023
3 min read time

Reducing Latency in Microservices

A microservice is an autonomous subset of a software system, often responsible for managing some of the data within that system. Together, the microservices of a system form a microservice architecture: a dynamic collection of semi-independent, loosely coupled computing processes. Microservices are a key part of many modern system infrastructures, enabling the independent deployment and scaling of complex applications and workflows in ways that simply aren't achievable with traditional monolithic architectures.

The problem is that as you scale up to an ever-increasing number of distributed, interdependent microservices, keeping communication between them fast becomes more and more difficult, introducing lag into the request paths that run through your services. This harms the user experience and can end up slowing down your whole network. The solution is better caching.

Cache inefficiencies

A big source of caching inefficiency is unintended or unnecessary redundancy. A certain amount of redundancy in cached content is important to prevent a single point of failure from compromising your system, but if you're repeatedly fetching duplicate data that you already have cached, you're not caching at peak efficiency. On the flip side, if your caching solution is serving stale content instead of retrieving the most up-to-date version, that's even worse. Both of these problems arise from a lack of, or improper use of, cache invalidation.

The importance of cache invalidation

Cache invalidation is the process of declaring cached content invalid, purging it from the cache and preventing that version from being served when requested. Without cache invalidation, cached content grows stale and the cache becomes overloaded with junk data. However, if your cache invalidation solution is too aggressive, blindly purging content left and right, you defeat the point of having a cache in the first place, forcing requests to travel all the way to the origin every time someone asks for content that should have remained in the cache.
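
To make the mechanics concrete, here is a minimal, illustrative sketch in Python (not Varnish code) of a cache that combines TTL-based expiry with explicit invalidation; the class and method names are invented for illustration:

```python
import time

class SimpleCache:
    """A toy cache with TTL expiry and explicit invalidation.

    Illustrative only; a real HTTP cache such as Varnish adds much
    more (request coalescing, grace, tag-based purging, and so on).
    """

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds):
        # Each entry carries its own expiry deadline.
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: the caller must fetch from the origin
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Stale: invalidate lazily and treat it as a miss.
            del self._store[key]
            return None
        return value

    def invalidate(self, key):
        # Explicit invalidation: purge the entry so the next request
        # is forced to fetch a fresh copy from the origin.
        self._store.pop(key, None)
```

Invalidating only what actually changed, rather than purging everything, is what keeps the hit rate high.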


Invalidation through Varnish

Varnish has a full suite of solutions dedicated to optimizing every aspect of cache invalidation, keeping the cache fresh and updated without risking the loss of vital content.

Ykey is a Varnish Enterprise module that lets you tag cached objects, allowing fast purging of all objects matching a given tag. Tens of thousands of cached objects can be purged at once in this way. The purge operation can be hard or soft: a hard purge immediately removes the matched objects from the cache, while a soft purge expires the objects but keeps them available for their configured grace period.
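
Conceptually, tag-based purging amounts to maintaining a secondary index from tags to cached objects. The sketch below is plain Python with invented names, loosely modeled on Ykey's behavior rather than its actual implementation, but it shows the idea, including the hard/soft distinction:

```python
import time
from collections import defaultdict

class TaggedCache:
    """Toy tag-based invalidation, loosely modeled on Ykey's behavior."""

    def __init__(self, grace_seconds=60.0):
        self.objects = {}                  # key -> {"value", "expired_at"}
        self.tag_index = defaultdict(set)  # tag -> set of keys
        self.grace_seconds = grace_seconds

    def insert(self, key, value, tags):
        self.objects[key] = {"value": value, "expired_at": None}
        for tag in tags:
            self.tag_index[tag].add(key)

    def purge(self, tag, soft=False):
        """Purge every object carrying the tag; returns how many matched.

        A hard purge removes objects immediately; a soft purge marks
        them expired but keeps them servable through the grace period.
        """
        count = 0
        for key in self.tag_index.pop(tag, set()):
            if key not in self.objects:
                continue  # already hard-purged via another tag
            if soft:
                self.objects[key]["expired_at"] = time.monotonic()
            else:
                del self.objects[key]
            count += 1
        return count

    def get(self, key):
        obj = self.objects.get(key)
        if obj is None:
            return None
        expired_at = obj["expired_at"]
        if expired_at is not None:
            # Soft-purged: still servable until the grace period ends.
            if time.monotonic() > expired_at + self.grace_seconds:
                del self.objects[key]
                return None
        return obj["value"]
```

Because a purge walks an index rather than scanning the whole cache, purging tens of thousands of objects by tag stays fast.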


Another tool, Varnish Broadcaster, replicates requests to multiple Varnish caches from a single entry point, facilitating fast purging across multiple Varnish instances. 
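
The underlying pattern is a simple fan-out: one purge request is replicated to every cache node. Below is a hedged Python sketch of that pattern, not Broadcaster's actual API; the hostnames are hypothetical, and it assumes each instance's VCL is configured to accept an HTTP PURGE method:

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

# Hypothetical node addresses; the real Broadcaster manages its own
# list of Varnish instances for you.
CACHE_NODES = ["http://cache-1:6081", "http://cache-2:6081"]

def purge_everywhere(path):
    """Fan a single PURGE request out to every cache node."""
    def purge_one(node):
        req = urllib.request.Request(node + path, method="PURGE")
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                return node, resp.status
        except OSError as exc:
            # One unreachable node should not block the others.
            return node, exc

    with ThreadPoolExecutor(max_workers=len(CACHE_NODES)) as pool:
        return list(pool.map(purge_one, CACHE_NODES))

# Example: purge_everywhere("/products/42")
```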



Grace Mode

Also unique to Varnish is the Grace Mode feature. Cached content generally has a TTL (Time To Live) that serves as a countdown to expiration; once the TTL runs out, the data is purged automatically. This creates a potential problem during an unexpected system failure, when there may no longer be a path from origin to cache to re-cache expired content. Grace Mode overcomes this TTL barrier, allowing you to serve expired content from the cache in the specific event where fresher content is temporarily inaccessible. This allows you to deliver more consistent cache hits and to provide the user with a version of the content that can then be updated in real time once the more recent version becomes available.
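
The logic boils down to each object having two clocks: a TTL, after which it is stale, and a grace window, during which the stale copy may still be served if a fresh one cannot be fetched. The sketch below is illustrative Python rather than Varnish internals, and it simplifies one detail: Varnish can serve the stale copy immediately while refreshing in the background, whereas this version falls back synchronously:

```python
import time

class GraceEntry:
    """A cached object with both a TTL and a grace window."""

    def __init__(self, value, ttl, grace):
        now = time.monotonic()
        self.value = value
        self.fresh_until = now + ttl           # normal expiry (TTL)
        self.usable_until = now + ttl + grace  # end of the grace window

def lookup(entry, fetch_fresh):
    """Serve from cache, falling back to stale-within-grace on failure."""
    now = time.monotonic()
    if now < entry.fresh_until:
        return entry.value  # fresh hit
    if now < entry.usable_until:
        try:
            return fetch_fresh()  # try to revalidate with the origin
        except OSError:
            # Origin unreachable: serve the stale copy rather than fail.
            return entry.value
    return fetch_fresh()  # past grace: must fetch, or error out
```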


Bypass issues of scalability and cut down on latency

Efficient caching enables your microservice architecture to bypass issues of scalability and dramatically cut down on overall latency. You can achieve this by implementing Varnish cache invalidation with Ykey tag-based purging and the Broadcaster. Together, these features form a state-of-the-art, modern approach to cache invalidation, allowing you to purge stale and redundant content in real time, no matter the scale.

