In this blog post, I will go over how to configure Varnish Cache to act as a secured gateway for your Amazon Web Services (AWS) S3 content. There are many reasons to use Varnish Cache as an AWS S3 gateway: caching, more efficient bandwidth utilization, centralized access with logging and security, or maybe just composing a virtual site across many different backend pieces, S3 included.
Before we begin, we need 4 things:
- an AWS S3 bucket
- an AWS IAM access ID with read/write permissions to the bucket
- the AWS IAM secret key for the above access ID
- the Varnish Cache digest VMOD
The first three items can be taken care of in the AWS Management Console. Please see Amazon AWS S3 for more information. The Varnish Cache digest VMOD can be found here. This VMOD will be used to create the required AWS S3 HMAC authorization signature.
AWS S3 as a Backend
Using AWS S3 as a backend is relatively simple. We just need to compose the proper AWS S3 Authorization header before sending any request, regardless of whether it's a read (GET, HEAD) or write (POST, PUT, DELETE) request.
The VCL code is as follows:
vcl 4.0;

import digest;

# Our AWS S3 bucket is the default (and only) backend.
backend default
{
    .host = "%BUCKET%.s3.amazonaws.com";
    .port = "80";
}

sub vcl_backend_fetch
{
    set bereq.http.Host = "%BUCKET%.s3.amazonaws.com";
    set bereq.http.Date = now;

    # Temporary header holding a single newline character, used to
    # build the multi-line string we sign.
    set bereq.http.NL = {"
"};

    # The Authorization header is "AWS %ACCESS_ID%:signature", where the
    # signature is the base64-encoded HMAC-SHA1 of the string-to-sign.
    set bereq.http.Authorization = "AWS " + "%ACCESS_ID%" + ":" +
        digest.base64_hex(digest.hmac_sha1("%SECRET_KEY%",
            bereq.method + bereq.http.NL +
            bereq.http.Content-MD5 + bereq.http.NL +
            bereq.http.Content-Type + bereq.http.NL +
            bereq.http.Date + bereq.http.NL +
            "/" + "%BUCKET%" + bereq.url)
        );

    # Don't forward our helper header to S3.
    unset bereq.http.NL;
}
Let’s go ahead and analyze the VCL code. In the above example, %BUCKET%, %ACCESS_ID%, and %SECRET_KEY% are used in place of the AWS S3 bucket name, the AWS IAM access ID, and the AWS IAM secret key for the access ID, respectively.
First, we need to import the digest VMOD, which is done via "import digest;". Next, we need to define a backend that points to our AWS S3 bucket. In this example, we make this our default and only backend.
In sub vcl_backend_fetch, we compose our AWS S3 request. This logic covers all AWS S3 requests. First, we add the proper Host and Date headers. Next, we create a temporary header, NL, which contains a single newline character. This helps us craft the multi-line HMAC signature string, which we put into the Authorization header. The Authorization string is composed of the %ACCESS_ID%, followed by the HMAC-signed request method (GET, HEAD, PUT, etc.), content MD5, content type, date, and then the %BUCKET% and S3 URL. Note that the content MD5 and content type are optional and can be blank, but they are recommended when writing new content since they help maintain content integrity. Also, any extra AWS S3 command headers need to be added after the date line, as sketched below. For more information, please see the AWS S3 reference for signing requests. After we compose the Authorization header, we unset the NL header since we don't want to forward it to S3.
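For example, if we wanted every object we write to be stored with a public-read canned ACL (a hypothetical choice, purely for illustration), the x-amz-acl header would be sent along and signed right after the date line:

sub vcl_backend_fetch
{
    set bereq.http.Host = "%BUCKET%.s3.amazonaws.com";
    set bereq.http.Date = now;
    set bereq.http.NL = {"
"};

    # Hypothetical AWS command header: store written objects as public-read.
    set bereq.http.x-amz-acl = "public-read";

    # Each x-amz-* command header is signed after the date line, in
    # lowercase, terminated by a newline.
    set bereq.http.Authorization = "AWS " + "%ACCESS_ID%" + ":" +
        digest.base64_hex(digest.hmac_sha1("%SECRET_KEY%",
            bereq.method + bereq.http.NL +
            bereq.http.Content-MD5 + bereq.http.NL +
            bereq.http.Content-Type + bereq.http.NL +
            bereq.http.Date + bereq.http.NL +
            "x-amz-acl:" + bereq.http.x-amz-acl + bereq.http.NL +
            "/" + "%BUCKET%" + bereq.url)
        );

    unset bereq.http.NL;
}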
That’s it, we now have Varnish Cache acting as our secure AWS S3 gateway!
Extending the Example
Since our Varnish Cache S3 gateway allows both reading and writing, we should restrict writes to requests we trust. This can be done using an IP-based ACL:
acl s3_write
{
    "127.0.0.1";
}

sub vcl_recv
{
    if (req.method != "GET" && req.method != "HEAD" &&
        client.ip !~ s3_write) {
        return(synth(403, "Access denied"));
    }
}
First, we define an ACL which includes all of the IPs we trust to perform write commands. Here is more information on ACLs. Then, in sub vcl_recv, we check for non-read requests coming from IPs outside our s3_write ACL. When that happens, we immediately return a 403 response.
If you do not want to use IP-based ACLs, you can instead leverage the digest VMOD's ability to perform HMAC operations and validate a shared secret between the client and Varnish Cache, similar to how AWS secures its own API.
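As a sketch of that approach (the X-Auth header name and %CLIENT_KEY% placeholder are hypothetical, and it reuses the digest import from earlier), trusted clients would send an HMAC of the request method and URL, computed with the shared key. Note that clients must reproduce the exact encoding the VMOD's HMAC functions emit (hexadecimal), so check the digest VMOD documentation when wiring up the client side:

sub vcl_recv
{
    # %CLIENT_KEY% is a shared secret handed out to trusted clients.
    # Clients send X-Auth = HMAC-SHA256(%CLIENT_KEY%, method + URL),
    # hex-encoded to match the digest VMOD's output.
    if (req.method != "GET" && req.method != "HEAD" &&
        req.http.X-Auth != digest.hmac_sha256("%CLIENT_KEY%",
            req.method + req.url)) {
        return(synth(403, "Access denied"));
    }

    # Don't forward the client token to S3.
    unset req.http.X-Auth;
}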
The full AWS S3 gateway VCL can be found here.
Finally, given the flexibility of VCL, we shouldn't feel restricted to interacting with only a single AWS S3 bucket. We can access multiple AWS S3 buckets and weave those requests in with a traditional backend infrastructure, as sketched below. To your clients, this looks like a single unified site, but in reality their requests are hitting cache, AWS S3 buckets, and traditional backend infrastructure!
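Here is a sketch of that idea (the application backend's hostname and the /static/ path are hypothetical): everything under /static/ is served from our S3 bucket, and everything else goes to a traditional application backend.

# Hypothetical application backend living alongside the S3 bucket.
backend app
{
    .host = "app.internal.example.com";
    .port = "8080";
}

sub vcl_backend_fetch
{
    if (bereq.url ~ "^/static/") {
        # Static assets come from S3; the signing logic shown earlier
        # would move inside this branch.
        set bereq.backend = default;
    } else {
        set bereq.backend = app;
    }
}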
Photo (c) 2012 Tanaka Juuyoh used under Creative Commons license.