Magento is a well-known e-commerce platform that, just like us, has its roots in open source. And just like us at Varnish Software, Magento also has a commercial product.
The commercial product is called “Adobe Commerce”, a name that reflects the fact that Magento was acquired by Adobe.
Magento is known for its flexibility and its vibrant community that offers a lot of integrations and plugins, making it an attractive framework for e-commerce.
The flipside of its flexibility and popularity is that Magento depends heavily on caching solutions to perform and scale. Oftentimes the term “bloated” is used to describe Magento’s architecture.
And despite the obvious performance concerns, Magento is still the framework of choice for many. According to BuiltWith, more than 120,000 online stores are currently powered by Magento.
Varnish is a first-class citizen in the Magento world
While most popular CMSs and frameworks support Varnish one way or another, in Magento things are slightly different: Varnish is considered a first-class citizen.
Magento has many different types of caching, as you can see in the image below. Most of these caches are used internally by the Magento framework, but the page cache is arguably the most important caching type.
It’s the page cache that stores the output of Magento and serves it to requesting users.
As a matter of fact, Varnish is considered the recommended caching application for full page caching, as illustrated in the image below.
Although the built-in page cache is quite decent, it’s not equipped for scale: requests hitting the built-in page cache still have to go through the web server, through the PHP runtime, and through parts of the Magento framework code.
Varnish is the perfect match, because it’s a separate runtime that is built for performance and scalability. It’s not sensitive to high-pressure traffic situations, it can handle massive concurrency, and it’s natively supported by Magento.
If you look at the image below, you’ll notice that the Magento admin panel offers Varnish-specific configuration options that can be turned into a full-blown VCL file, downloadable at the click of a button.
Spoiler alert: we’re not huge fans of the VCL template that is offered by Magento, and we have our own recommended VCL file.
In the end, it will come as no surprise that thousands of Magento stores rely on Varnish for decent performance, expecting page load times of mere milliseconds, compared to seconds when not using Varnish.
There are still performance challenges, despite the use of Varnish
Plot twist: despite the use of Varnish, there are still some performance challenges that Magento store owners are struggling with.
Most of these challenges are related to how Magento approaches cache invalidations. Storing pages, images, and other resources in the cache is one thing, but keeping the cache up to date is an entirely different story.
When a product, category or page is updated, the change in Magento needs to be signaled to Varnish and the other caches. Otherwise, stock changes, price changes, or changes in the description of the product would never be visible on the store front.
To invalidate the cache, Magento sends a so-called PURGE request to Varnish, indicating that content needs to be removed from the cache. Subsequent user requests will then trigger a cache miss, forcing Varnish to fetch the updated content from Magento prior to storing it in the cache. That’s how Magento keeps its caches up to date.
Internally, Magento uses a collection of tags to identify and categorize content. These tags also allow Magento to remove multiple objects from the cache at once, without needing to know the exact URL of each object.
Magento uses Varnish bans
The problem is that Varnish Cache, the open source version of Varnish, doesn’t support so-called tag-based cache invalidation. To circumvent this limitation, the Magento VCL template uses, or should I say abuses, Varnish’s ban functionality.
Magento stores these tags as comma-separated values in the “X-Magento-Tags” header, which ends up being stored in the cache. When a cache invalidation takes place, regular expression patterns are used to match individual tags against this header.
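In the Magento VCL template, that roughly translates to the following simplified sketch (the actual template also checks a purge ACL and supports additional invalidation headers):

```vcl
sub vcl_recv {
    if (req.method == "PURGE") {
        # Magento sends the tags to invalidate as a regular expression
        # in the X-Magento-Tags-Pattern request header.
        if (req.http.X-Magento-Tags-Pattern) {
            ban("obj.http.X-Magento-Tags ~ " + req.http.X-Magento-Tags-Pattern);
        }
        return (synth(200, "Purged"));
    }
}
```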
While this feature works OK from a functional perspective, it doesn’t scale well. When a “ban” gets executed in Varnish, the ban expression ends up on the ban list, which gets processed by a separate thread. This thread matches each ban list item against all the objects in the cache.
The more objects and the more items on the ban list, the more server resources it takes to execute. And when the bans are successfully processed, they immediately remove the affected objects from the cache, causing cache misses for subsequent requests.
Blast radius
Certain updates in the store’s catalog, either manual or through imports, can have quite a big “blast radius”. This means that a lot of objects could potentially be deleted, causing a lot of cache misses.
The side effects of cache invalidation are rather big in Magento, and at a certain scale they sometimes defeat the purpose of having a cache altogether. It is one of the primary complaints of Magento store owners when it comes to caching and performance.
Developers often look for ways to “reheat” the cache directly after an import, to ensure the cache invalidations don’t cause too much latency for visitors.
Are we saying that Varnish Cache is bad? No!
Reading all this, you might think we’re creating the perception that Varnish Cache is a bad piece of software. That’s definitely not the case.
Remember: Varnish Enterprise is built on top of the core of Varnish Cache.
The thing is that Magento abuses a specific feature in Varnish, and the resulting performance impact is a consequence of that misuse, not of Varnish itself. What Magento needs is a way to soft purge the content, marking it as expired while serving stale data to users during the revalidation process.
Spoiler alert: we’ll talk about soft purging later.
And there are other limitations in Varnish Cache, like the lack of native TLS support, the lack of dynamic DNS resolution for backend hostnames, and no mechanisms to control the memory consumption of the Varnish runtime.
Varnish Enterprise can address these challenges
When engaging with the broader Magento community, we learned about their challenges scaling Magento. We listened, and we noticed that our Enterprise solution offers functionality that addresses these challenges.
- Varnish Enterprise supports native TLS, even on the backend side
- Varnish Enterprise offers native tag-based cache invalidation that is way more efficient and that has a soft purge option
- Varnish Enterprise’s “Memory Governor” feature keeps the memory usage of the Varnish process constant
- Varnish Enterprise offers dynamic backends
And that’s what this blog post is about. I’d like to showcase these 4 features and approach them from a Magento perspective, providing some functional insight as well as the business value.
The technical implementation of these features is outside the scope of this blog post; however, I would encourage you to sign up for our Magento webinar, where the functional, business, and technical aspects of these features are covered.
Feature 1: end-to-end TLS support in Varnish Enterprise
End-to-end TLS support is not necessarily a Magento thing; every application that uses Varnish can benefit from it.
And as mentioned, in the open source version of Varnish, you need a TLS proxy to offload the TLS encryption. The diagram below illustrates this:
A variety of technologies is used by Varnish Cache users to offload TLS: Nginx, HAProxy, and even Hitch, an open source TLS proxy that we maintain ourselves.
While HTTP headers like “X-Forwarded-Proto” and “X-Forwarded-For” can be used to communicate the offloaded protocol and client IP address, this requires extra configuration. It is also an extra component to manage, and another thing that can fail.
If your TLS proxy runs on a separate machine, you have to invest in separate hardware for a feature that is natively included in Varnish Enterprise.
And even if your workflow for offloading TLS in Varnish Cache is manageable, the connection between Varnish and Magento on the backend side is still happening over plain HTTP.
And that’s where Varnish Enterprise can really shine: thanks to its native end-to-end TLS implementation, Varnish can communicate with clients and backends over HTTP and HTTPS.
The diagram below illustrates this.
Varnish Enterprise allows you to configure listening endpoints for HTTPS, allows you to manage certificates, and offers ways to configure supported protocols, ciphers, and cipher suites.
And for backend TLS, it’s simply a matter of setting the backend port to “443” and enabling the optional “.ssl” parameter, as you can see in the example below:
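Here’s a minimal sketch, reusing the hypothetical “magento.example.com” endpoint (additional TLS verification options exist, but are omitted here):

```vcl
backend magento {
    .host = "magento.example.com";
    .port = "443";
    .ssl = 1;  # encrypt the connection between Varnish and Magento
}
```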
Thanks to native TLS in Varnish Enterprise, Magento users are able to:
- Simplify their architecture and reduce operational complexity
- Increase their security thanks to end-to-end TLS (even on the backend)
- Potentially decrease their infrastructure requirements by eliminating the need for a TLS proxy
Feature 2: the Memory Governor
Another useful Varnish Enterprise feature for organizations operating at a certain scale is the Memory Governor.
This feature ensures a constant memory footprint by resizing the cache according to runtime memory requirements.
While any version of Varnish, both open source and commercial, has the ability to limit the size of the cache, people often forget that Varnish also consumes memory to handle incoming requests and requests to the backend.
While memory consumption per thread can be limited, traffic spikes and object storage overhead require very thoughtful planning and memory dimensioning.
Varnish even has a secondary cache storage called “transient” that is used to temporarily store short-lived or uncacheable content. Remember: even though certain content is uncacheable, it still needs to be stored temporarily during the transmission from Magento through Varnish to the requesting client.
This transient storage is unbounded. Even if you carefully planned the amount of memory you’re going to assign to your Varnish servers, the transient storage can be an unpredictable wildcard.
To prevent Varnish servers from running out of memory, Varnish Enterprise’s Memory Governor feature will limit the total footprint of the Varnish process, and will resize the cache according to runtime memory requirements.
If a sudden traffic surge takes place that runs memory-intensive VCL logic, handling the memory requirements of those threads will take priority over keeping the cache at the same size.
By default the Memory Governor assigns 80% of the available memory on the system to the Varnish runtime. By dynamically resizing the cache, that memory target is enforced. Of course a “memory_target” runtime parameter can be configured to set the target to another percentage, or an absolute value.
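As a sketch, assuming the Memory Governor has been enabled through the MSE configuration, the target could be adjusted at runtime like this (the parameter can also be set in the varnishd startup options):

```
varnishadm param.set memory_target 75%
```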
Memory Governor is not a standalone feature; it is part of Varnish Enterprise’s Massive Storage Engine (MSE). MSE is our proprietary storage engine that combines memory and disk storage, and performs exceptionally well at scale. Instead of going into detail about MSE, I’d rather point you to other blog posts that cover this topic.
Feature 3: tag-based cache invalidation with a soft purge option
The third feature I’d like to cover in this blog post is arguably the most important one. I already spent some time explaining how Magento (ab)uses Varnish bans to perform a kind of tag-based cache invalidation.
The challenges should be clear by now:
- Poorly performing invalidations at scale because of the limitations of the ban functionality
- A lot of cache misses due to the lack of soft purging in Varnish Cache
Our engagement with the Magento community made it abundantly clear that soft purging is a big feature for Magento users. As a matter of fact, we were at Meet Magento Poland in Poznan last month, where I was a speaker. A lot of people came to visit our sponsor booth after my presentation, and enquired about soft purging.
In Varnish Enterprise, we do soft purging using the Ykey VMOD. It is flexible and has an extensive API that can be configured to inspect the “X-Magento-Tags” header.
By configuring a tag separator (which could be a comma, a space, or any other character), Ykey registers all the tags from the header and treats them as native tags that it can later use for invalidation.
Ykey can also register individual keys, allowing VCL writers to set tags based on certain traffic patterns. For example, one could assign an explicit “image” tag to all responses whose “Content-Type” response header starts with “image/”, such as “image/jpeg” or “image/png”.
By supporting native tags, the invalidation of cached objects no longer requires regular expressions to match tags embedded in a header. If you want to remove all objects from the cache that have the “cat_p_1” tag, Ykey will just match that tag and efficiently remove the matching objects from the cache.
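Here’s a rough sketch of what that can look like in VCL. The exact Ykey function signatures may differ between versions, and the “X-Invalidation-Tags” request header is purely hypothetical, so treat this as an illustration rather than a drop-in configuration:

```vcl
vcl 4.1;

import ykey;

sub vcl_backend_response {
    # Register every comma-separated value in X-Magento-Tags as a Ykey key
    # (assumed signature: add_header(HEADER, separator)).
    ykey.add_header(beresp.http.X-Magento-Tags, ",");

    # Keys can also be assigned based on traffic patterns,
    # e.g. an explicit "image" tag for all image responses.
    if (beresp.http.Content-Type ~ "^image/") {
        ykey.add_key("image");
    }
}

sub vcl_recv {
    if (req.method == "PURGE") {
        # Invalidate every object registered with the given tag(s).
        # Ykey also offers a soft variant that marks objects as expired
        # instead of removing them, which is covered below.
        set req.http.purged-count = ykey.purge(req.http.X-Invalidation-Tags);
        return (synth(200, "Invalidated " + req.http.purged-count + " objects"));
    }
}
```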
From an observability point of view, the metrics and logs that Varnish Enterprise offers will clearly show which tags are registered and which ones are invalidated.
Soft purging
OK, time to finally talk about soft purging. A regular purge, or a “hard purge” as we call it, will invalidate objects by removing them from the cache. As a result, the next visitor is met with a cache miss for that object, and the wait depends on the performance of the backend application: the slower the backend, the longer the wait.
Soft purging will not remove the object from the cache, but will instead mark it as expired. This means the next request for that object still hits the cache, but Varnish needs to revalidate the object because it has expired.
This massively improves the end-user quality of experience and reduces the pressure on Magento after cache invalidations take place. I referred to a big “blast radius” earlier; soft purges mitigate the impact of those invalidations.
Soft purging relies on grace
Thanks to Varnish’s built-in “grace” functionality, stale content can be returned while an asynchronous revalidation takes place.
Grace is extra lifetime on top of the standard Time To Live (TTL) that is assigned to an object. If an object can be cached for an hour, and you configure 15 minutes of grace, the stale object will be returned for up to 15 minutes past the TTL while Varnish asynchronously updates it.
This means that the visitors of the Magento store don’t experience any latency, but the tradeoff is that they may see slightly outdated content.
This is the formula, but keep in mind that the TTL is a moving target that can go below zero:
Total object lifetime = TTL + grace + keep
- If the TTL value is positive, the object will be served from the cache
- If the TTL value is zero or below, but the remaining grace time keeps the object lifetime above zero, stale content will be returned while an asynchronous fetch to Magento happens
- If the TTL value is zero or below and the grace time has run out, but there is some keep time left, the object will be kept around, but a synchronous revalidation takes place
Keep time has a default value of zero seconds; in Magento the standard TTL is set to 24 hours, and the grace time is typically set to 3 days. This means that an object lives in the cache for about 4 days before naturally being removed.
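In VCL terms, those defaults boil down to something like this minimal sketch (the actual values come from Magento’s configuration and its VCL template):

```vcl
sub vcl_backend_response {
    set beresp.ttl = 24h;   # serve from cache for 24 hours
    set beresp.grace = 3d;  # then serve stale content for up to 3 days
                            # while revalidating asynchronously
    set beresp.keep = 0s;   # no extra keep time (the default)
}
```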
If an expired object with remaining grace time is requested by a user, asynchronous revalidation will kick in. And that’s pretty much the effect of performing soft purges.
Feature 4: dynamic backends
The 4th and last feature I’d like to highlight is the result of yet another limitation in Varnish Cache. But again: we’re not trying to make the open source project look bad. It is only perceived as a limitation under certain circumstances.
Here’s a typical Varnish backend definition, optimized for Magento:
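A sketch along the lines of Magento’s default template, using the hypothetical “magento.example.com” endpoint:

```vcl
backend magento {
    .host = "magento.example.com";
    .port = "8080";
    .first_byte_timeout = 600s;
    .probe = {
        .url = "/health_check.php";
        .timeout = 2s;
        .interval = 5s;
        .window = 10;
        .threshold = 5;
    }
}
```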
The “.host” and “.port” properties point to the endpoint where Magento can be reached. There are also other parameters in the backend definition that define timeouts and health checking probes.
The Varnish Cache limitation we’re referring to is the fact that the “magento.example.com” hostname is only resolved to its corresponding IP address once: at VCL compile time.
This means that when the VCL file is compiled to machine code for execution, the backend hostnames are resolved as well. Compilation happens when Varnish starts and loads its default VCL file, or when a new VCL file is dynamically loaded by an administrator using the “varnishadm vcl.load” command.
If the value of the DNS record changes while Varnish is running, the new IP address will go unnoticed by Varnish. In a situation where an outage takes place and a new Magento server gets assigned to the hypothetical “magento.example.com” hostname, Varnish will not be able to spot the change.
In modern Cloud architectures where (auto)scaling takes place, multiple IP addresses can be returned for a single hostname. This can be used to perform DNS round-robin distribution. Unfortunately, this is not supported by Varnish Cache either.
Varnish Enterprise addresses these limitations using the ActiveDNS & Unified Director Object (UDO) modules.
ActiveDNS is used to resolve DNS records at runtime. It has configuration syntax in place to choose the frequency of the DNS resolutions, and which DNS TTL rules to abide by.
ActiveDNS also exposes logic to suggest which port should be used. UDO is responsible for monitoring the resolved IP addresses and associated ports that ActiveDNS exposes, and it checks for changes every time a backend fetch is made. In the end, the UDO module turns these endpoints into backends.
If multiple IP addresses are returned by the ActiveDNS module, UDO will turn them into separate backends and will perform DNS-based load balancing. You can choose which distribution algorithm to use: random distribution, hash-based distribution, or fallback.
DNS changes that occur at runtime will be observed by ActiveDNS & UDO, and will result in a dynamic regeneration of the Varnish backends. Backend failures will also be captured, and can result in a retry on another backend. Health probe and backend templates can be set to provision additional parameters.
In terms of functionality, these 2 modules offer dynamic backends, which adds tremendous value in dynamic environments. There is no need to apply manual changes to VCL or restart Varnish: simply update the DNS record, and enjoy the flexibility and resilience that this feature offers.
Dynamic environments also tend to scale out and scale in. In the Cloud or in Cloud-native architectures, more than one instance of a service is available, and autoscaling is often required. The DNS-based load balancing feature allows Varnish to autoscale its backends accordingly.
A final feature of dynamic backends that I’d like to cover is service discovery. The DNS resolution of hostnames typically leverages A records to return an IP address. ActiveDNS also supports AAAA records for IPv6 addresses.
But SRV records are also supported, and even prioritized, by ActiveDNS. This record type is ideal for service discovery and exposes more information than just an IP address.
The “_magento._tcp.example.com” hostname has a conventional format that exposes a service name (magento) and the protocol (tcp). When you resolve the SRV record for that hostname, you’ll get multiple values, each containing the following information:
- Priority
- Weight
- Port
- Hostname
ActiveDNS and UDO will use that information to create multiple backends, resolve the hostnames, assign port numbers, and determine the order of execution.
Here’s an example:
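A hypothetical pair of SRV records for that hostname could look like this (priority, weight, port, and target hostname; a lower priority value means the endpoint is preferred):

```
_magento._tcp.example.com. 300 IN SRV 10 100 443 magento1.example.com.
_magento._tcp.example.com. 300 IN SRV 20 100 443 magento2.example.com.
```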
In this case “_magento._tcp.example.com” returns 2 instances of the Magento service:
The first service endpoint is “magento1.example.com”, which exposes itself over the conventional HTTPS port (443). It also has a higher priority than “magento2.example.com”, and will be preferred.
The second service endpoint is “magento2.example.com”, which also exposes itself over port 443. This one has a lower priority, and will be the fallback in case the first endpoint is unavailable.
All of this information is fetched dynamically by Varnish Enterprise, and will result in a dynamic reconfiguration of the backends. If a third endpoint is required, it’s simply a matter of adding a value to the SRV record.
This type of flexibility significantly improves the scalability and resilience of your Magento setup. Not only can you benefit from the performance that Varnish Enterprise offers as a cache, but the dynamic capabilities offer peace of mind to organizations.
Here’s the VCL code for dynamic backends and DNS-based load balancing in Varnish Enterprise:
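What follows is a sketch of its general shape; the exact constructor and subscription calls may vary between Varnish Enterprise versions, so check the vmod_activedns and vmod_udo documentation before relying on it:

```vcl
vcl 4.1;

import activedns;
import udo;

sub vcl_init {
    # DNS group that is resolved at runtime instead of at VCL compile time
    new group = activedns.dns_group();
    group.set_host("magento.example.com");
    group.set_port("443");  # needed for A/AAAA records, not for SRV records

    # UDO director that subscribes to the DNS group and turns the resolved
    # endpoints into backends (get_tag() is assumed to be the subscription
    # handle exposed by the DNS group).
    new magento_director = udo.director();
    magento_director.subscribe(group.get_tag());
}

sub vcl_backend_fetch {
    # Let the director pick a backend for every backend fetch; the
    # distribution method (random, hash, or fallback) can be configured
    # on the director as well.
    set bereq.backend = magento_director.backend();
}
```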
The main parameter that requires customization is group.set_host("magento.example.com");, which points to “magento.example.com” as the Magento endpoint. For A records, you also need to specify a port, which is done through group.set_port("443");.
But when SRV records are used for service discovery, setting the port is not required.
Want to learn more? Watch our webinar.
While I did add some VCL code when explaining the dynamic backends feature, I kept the implementation details to a minimum throughout this blog post.
If you want to learn more, see more practical examples of these 4 features, and hear more about the value proposition, please sign up for our webinar.