February 7, 2024

Caching “Uncacheable” Artifactory Resources to Boost Cache Efficiency

Varnish is designed to speed up HTTP applications by serving content from cache rather than from the origin. In previous posts, we explored how Varnish enhances the performance of artifact management tools like Artifactory. Here we focus on one particular aspect: how Varnish handles authorization when it sits between Artifactory and its users. We will look at Varnish's ability to cache content that is typically labeled "uncacheable" because of access restrictions. By caching private artifacts while adhering to Artifactory’s authorization rules, Varnish reduces the load on the backend and raises cache efficiency, which in turn improves developer productivity.

 

Why Cache Developer Artifacts?

To set the scene, let’s briefly talk about why enterprises would want to cache developer artifacts at all. DevOps tools like JFrog Artifactory see heavy usage by developer teams, and as traffic scales, higher latency and slow artifact retrieval hold developers up and reduce productivity. To overcome these challenges, many enterprises deploy caching software to accelerate their DevOps pipelines.

 

Increased Artifactory Efficiency with Varnish

After integrating with Artifactory, Varnish caches artifacts to:

  • Improve performance at scale
  • Assure resilience
  • Reduce latency and increase efficiency
  • Reduce cost

While “basic” artifact caching unlocks significant improvements, we want to extract every possible performance gain by caching as many artifacts as possible. Enter Varnish Configuration Language (VCL). VCL is a unique feature of Varnish, both open source and enterprise: a domain-specific language that is transpiled into C code and compiled into machine code when it is loaded, for super-fast execution. VCL can be used to extend cache behavior, inspect and modify requests and responses, reroute requests, select backends and control every aspect of caching logic. Varnish Enterprise also offers a collection of proprietary modules, called VMODs, which add functionality to Varnish and expose it in the form of an extended VCL syntax.

Artifactory benefits from a variety of enterprise VMODs, but we’re interested in one particular module when it comes to authorization. vmod_http makes it possible to send preflight authorization requests to Artifactory, enabling you to cache private artifacts while respecting Artifactory’s internal authorization policies. The results of these authorization checks can then be cached in Varnish for faster access.

 

Authorized, Authenticated Access to Artifacts Can Also Be Cached with Varnish

Specific VCL code - call it “Artifactory VCL” - introduces universal functionality for authorizing clients before serving cached content. Resources that are usually considered uncacheable because access is restricted can be cached in a safe and efficient way. This additional optimization reduces time-to-first-byte by saving a roundtrip to Artifactory.

Authorized resources are otherwise uncacheable because most caches, by default, use only the URL and the Host header to create the hash that identifies an object in cache storage, and bypass the cache entirely when they spot an Authorization header in the request. This default makes sense: the purpose of a cache is to serve each cached resource to as many requesting clients as possible and offload the origin application.
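For reference, the default behavior described above looks roughly like this in Varnish itself. This is a simplified excerpt of the built-in VCL as found in Varnish 6.x, shown only for illustration; it is already part of Varnish and does not need to be added to your configuration.

```vcl
# Simplified excerpt of Varnish's built-in VCL (illustration only).

sub vcl_recv {
    if (req.http.Authorization || req.http.Cookie) {
        # Requests carrying credentials bypass the cache by default.
        return (pass);
    }
}

sub vcl_hash {
    # Only the URL and the Host header (or the server IP when no Host
    # header is present) identify the object in cache.
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

The Artifactory VCL works around both defaults: it moves the Authorization header out of the way before the built-in logic runs, and it decides explicitly when the header value should become part of the hash.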

Adding stateful information, such as an Authorization header or a session cookie, to identify a user as part of the lookup hash without any particular reason will only result in more variations of a resource and heavily reduce the hit rate of the cache.

However, in the case of Artifactory, we deliberately add the Authorization header to the hash only for the per-user preflight authorization requests. The artifacts themselves are cached only once for maximum efficiency.

To enable authorization for every client, include the Artifactory VCL at the top of your VCL; no additional configuration is required. This ensures every client is authorized for every resource before being served content from cache, and authorization can also be enabled or disabled for specific resources. Essentially, vmod_http adds a level of cache personalization by checking with Artifactory whether a request is authorized.

Here’s the basic Artifactory VCL code. You can further customize the code to make it fit your exact needs:

vcl 4.1;

import http;

sub vcl_recv {
    # Drop internal headers so that clients cannot set them themselves.
    unset req.http.X-Authorization;
    unset req.http.X-Client-Authorized;
    unset req.http.X-Method;

    # Preflight: send a HEAD copy of the request to check authorization.
    if (req.http.Authorization && req.method == "GET") {
        http.init(0);
        http.req_set_url(0, http.varnish_url(req.url));
        http.req_copy_headers(0);
        http.req_set_method(0, "HEAD");
        http.req_send(0);
        http.resp_wait(0);
        if (http.resp_get_status(0) != 200) {
            return (synth(403));
        }

        set req.http.X-Client-Authorized = "true";
    }
    # Remember HEAD so it can be restored on the backend request.
    if (req.method == "HEAD") {
        set req.http.X-Method = "HEAD";
    }
    # Move Authorization aside so its presence does not bypass the cache.
    if (req.http.Authorization) {
        set req.http.X-Authorization = req.http.Authorization;
        unset req.http.Authorization;
    }
}

sub vcl_hash {
    # Requests that did not pass the preflight (such as the HEAD checks
    # themselves) are hashed per user; authorized GETs share one object.
    if (req.http.X-Client-Authorized != "true") {
        hash_data(req.http.X-Authorization);
    }
    if (req.http.X-Method) {
        hash_data(req.http.X-Method);
    }
}

sub vcl_backend_fetch {
    # Restore the original Authorization header and method for Artifactory.
    if (bereq.http.X-Authorization) {
        set bereq.http.Authorization = bereq.http.X-Authorization;
        unset bereq.http.X-Authorization;
    }
    if (bereq.http.X-Method == "HEAD") {
        set bereq.method = "HEAD";
        unset bereq.http.X-Method;
    }
}
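Including the Artifactory VCL at the top of your own VCL, as described earlier, could look like the following minimal sketch. It assumes the code above has been saved as `artifactory.vcl` (a hypothetical file name) somewhere in Varnish's VCL path, and that Artifactory is reachable at the placeholder host and port shown; adjust both to your environment.

```vcl
vcl 4.1;

# Hypothetical file name: the Artifactory VCL from this article.
include "artifactory.vcl";

# Placeholder backend definition; host and port are assumptions.
backend default {
    .host = "artifactory.example.internal";
    .port = "8081";
}

# Your own vcl_recv, vcl_backend_response, etc. follow here. Subroutines
# with the same name run in order of appearance, so the Artifactory logic
# runs before anything defined below.
```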

 

Artifactory API Calls and Throughput Are Significantly Reduced

An obvious question here: how do you reduce the number of requests to Artifactory if Varnish is sending preflight authorization checks on every single request? The answer is that Varnish doesn’t rely on Artifactory APIs for the authorization, at least not directly. Instead, when Varnish receives a client request, the Artifactory VCL spawns a separate HEAD request with a copy of the URL and request headers. This request is routed back through Varnish itself (the http.varnish_url call in the VCL), so its result can be cached, and it is forwarded to Artifactory only on a cache miss. If the response is a "200 OK", the client is authorized to access the resource and we are clear to serve the object from cache.

The Artifactory VCL also caches the result of each authorization by default, so a client accessing a resource several times does not trigger multiple authorization requests to Artifactory. The hash of these cached authorization objects includes the Authorization header value, so each user's result is stored and looked up separately.

While the authorization requests are cached on a per-user basis, the artifacts themselves are only stored once, which increases the hit rate of the cache and further improves efficiency.
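How long a cached authorization stays usable depends on the TTL of the cached HEAD response. As a minimal sketch, assuming you want to cap that lifetime at five minutes (a hypothetical policy, not something the Artifactory VCL prescribes), a fragment like this could be added to your own VCL after the Artifactory VCL:

```vcl
sub vcl_backend_response {
    # With the Artifactory VCL, authorization checks reach the backend
    # as HEAD requests, so this only affects cached authorizations.
    if (bereq.method == "HEAD" && beresp.ttl > 5m) {
        # Assumed policy: cap the TTL of a cached authorization at 5 minutes.
        set beresp.ttl = 5m;
    }
}
```

A shorter cap means permission changes in Artifactory take effect sooner, at the cost of more preflight requests reaching the backend.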

Even though Varnish caches authorizations without connecting to Artifactory each time, it’s still straightforward to revoke access. A cached authorization can be revoked with the normal cache invalidation tools that Varnish Enterprise provides: purges, bans and even Ykey invalidations. It’s also possible to disable caching of authorizations completely, should that be preferable.
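As an illustration of the ban-based approach, here is a minimal sketch. The `REVOKE` method, the `revokers` ACL and the `X-Auth-Check` marker header are all hypothetical names introduced for this example, and the fragment assumes it is placed after the Artifactory VCL in your configuration. It tags cached authorization results when they are fetched, then bans all tagged objects under a given URL prefix while leaving the cached artifacts themselves alone.

```vcl
# Assumption: only local tooling may revoke cached authorizations.
acl revokers {
    "localhost";
}

sub vcl_recv {
    # Hypothetical custom method, for example:
    #   curl -X REVOKE http://localhost/artifactory/private-repo/
    if (req.method == "REVOKE") {
        if (client.ip !~ revokers) {
            return (synth(405, "Not allowed"));
        }
        # Invalidate cached authorization results whose URL starts
        # with the URL of this request.
        ban("obj.http.X-Auth-Check == true && req.url ~ ^" + req.url);
        return (synth(200, "Authorizations revoked"));
    }
}

sub vcl_backend_response {
    # With the Artifactory VCL, authorization checks reach the backend
    # as HEAD requests; mark their responses so bans can target them.
    if (bereq.method == "HEAD") {
        set beresp.http.X-Auth-Check = "true";
    }
}
```

After such a ban, the next request from each user under that prefix triggers a fresh preflight check against Artifactory.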

 

Quantifiable Results of Adding Varnish as a Caching Layer in Front of JFrog Artifactory

The benefits of adding Varnish have been measured by organizations in various industries and include:

  • 20% reduction in transfer costs
  • 50% reduction in licensing fees
  • 85% lower latency, expediting installs from 5 mins to 1 min
  • 41% faster dependency resolution, saving weeks of developer time

 

Although the benefits have been significant, results do vary based on workflows, usage patterns and user locations. It is important to note that the inefficiencies in API access are not because JFrog Artifactory is an inferior product. In fact, the opposite is true: it is a great product, but any platform used over HTTP at scale will eventually run into these issues. This is the exact reason why Varnish was initially developed.

A key benefit of Varnish is its ability to accelerate all kinds of HTTP applications, so this authorized caching workflow is not limited to Artifactory and works in the same way for other applications that leverage authorization. For organizations grappling with the challenges of scaling artifact management tools though, integrating with Varnish Enterprise offers a path to improved performance, efficiency, and cost-effectiveness. For more information and insights, visit our dedicated artifact caching page at Varnish Software.