December 5, 2024
11 min read time

S3 Shield: Fronting Storj with Varnish on DigitalOcean

A few weeks ago, we talked about making your object storage faster and reducing costs, staying fairly high-level. We also have technical documentation on the developer portal, and of course you can always look at the actual code on GitHub for the nitty-gritty details.

I feel like we need an actual implementation, grounded in reality, to showcase how easy it can be to set up a global object storage system and to add a local shield to it.

The Plan

When we think about S3, AWS is usually not far behind, but today we are not going to touch Amazon Web Services. Instead, we will highlight an innovative alternative approach.

First is Storj, which offers globally distributed, S3-compatible object storage. Globally distributed should sound pretty cool if you’ve ever had to choose which region to use in a distributed AWS environment, or if you’ve ever needed to actually move data from one bucket to another.

Storj, on top of being affordable, maintains the Storj open-source project, making their code available if you want to have a look at it, contribute to it or even run it.

Second is DigitalOcean. They’ve been around for a while, so I would expect everybody here to know about them, but I feel like they deserve a shout-out: the service is good, the interface is slick, and the machines boot up quickly. All in all, deploying servers there is hassle-free, which is perfect for the short and sweet demo I have in mind here.

And third, we have Varnish Enterprise with the S3 shield. We cache HTTP and make online services fly, but I’m sure you already knew that.

Now that the team is assembled, we just need to:

  • Create a Storj bucket, push data to it and create access keys
  • Spin up a triplet of DigitalOcean servers
  • Deploy the S3 shield on said servers using ansible
  • Verify I can access the Storj data through the Varnish gateways on DigitalOcean

Ready? Let’s go!

Step 1: Storj

First, head over to Storj and register to start a free trial; you only need an email address. Once logged in, click on “Browse” in the side bar and create a new bucket.

Browse Buckets

New Bucket

(that’s a very clever and creative name!)

I’m going to push my favorite Pikachu picture to it, but you can of course upload whatever you’d like.

Browse Files

Now that we have data in the bank, we want to let other applications use it. We do this by going to “Access Keys” and creating a new read-only key:

Access Key

Access Key 1-4-1

Now that our keys are created, we are just going to leave that page alone for a bit. Worry not, we’ll be back soon enough, but for now, don’t click on the “Close” button and don’t navigate away from the page. If you do, you’ll lose the credentials and you’ll have to create new ones.
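If you want to make sure the new key actually works before going further, you can list the bucket through Storj’s S3-compatible gateway with the AWS CLI. This is just a sketch: it assumes you have the AWS CLI installed, and the bucket name and credentials below are placeholders to swap for your own.

```shell
# Sanity-check the new read-only key against Storj's S3-compatible gateway.
# The bucket name and credentials are placeholders: use your own values.
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
aws s3 ls s3://gq-test/ --endpoint-url https://gateway.storjshare.io
```

If the key is valid, you should see the objects you uploaded earlier listed in the output.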

New access 5

Step 2: DigitalOcean

Our tour of DigitalOcean is going to be super short, we’re just here to deploy a handful of servers, or “Droplets” as they are called here. The top-right corner of the home page has the option we want:

Digital Ocean create

I’ll let you pick the parameters that make sense for you; I personally chose:

  • A San Francisco data center, since I’m in the Bay Area and I want my cached data to be close to me

Create droplets

  • Ubuntu Noble for the OS: it’s recent, it’s solid, and I tend to prefer Debian-based systems for production (BTW I use Arch on my own machines)

    Choose an image-1

  • The smallest of the small instances, because it’s just a demo and I’m very cheap, but also because the CPU is rarely a bottleneck for Varnish; I can just pick based on the network options and the RAM/disk size, depending on what I want to cache

Choose a size

  • Three droplets, because I want to play with cache sharding a bit
    Finalize Details

Confirm, wait less than a minute, and your instances should be up with their own IP addresses.
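If clicking through the UI isn’t your thing, the same three droplets can also be created from the command line with doctl, DigitalOcean’s CLI. This is only a sketch: the region, image, and size slugs, as well as the SSH key ID, are assumptions you would adapt to your own account.

```shell
# Hypothetical CLI equivalent of the UI clicks above, using doctl.
# The slugs and SSH key ID are placeholders: check `doctl compute image list`
# and `doctl compute size list` for the real values available to your account.
doctl compute droplet create shield-1 shield-2 shield-3 \
    --region sfo3 \
    --image ubuntu-24-04-x64 \
    --size s-1vcpu-512mb-10gb \
    --ssh-keys YOUR_SSH_KEY_ID \
    --wait
```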

Droplets 3

Lastly, it’s a bit of a dirty trick, but since I’ll be running ansible from my laptop, I want it to recognize the droplets:

$ for i in 209.38.148.167 64.23.248.135 209.38.132.117; do ssh root@$i -o "StrictHostKeyChecking no" true; done
Warning: Permanently added '209.38.148.167' (ED25519) to the list of known hosts.
Warning: Permanently added '64.23.248.135' (ED25519) to the list of known hosts.
Warning: Permanently added '209.38.132.117' (ED25519) to the list of known hosts.

Do NOT try this at work, obviously, but I’m a loose cannon and I can’t fight my nature.
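If you’d rather not disable host key checking, a slightly less reckless option is to pre-populate known_hosts with ssh-keyscan. It’s still a trust-on-first-use shortcut, but at least it doesn’t weaken ssh’s own configuration:

```shell
# Fetch and hash the droplets' host keys ahead of time
# instead of blindly accepting them at connection time.
for i in 209.38.148.167 64.23.248.135 209.38.132.117; do
  ssh-keyscan -H "$i" >> ~/.ssh/known_hosts
done
```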

Step 3: Deployment

We are almost done! Let’s get the S3 shield deployment code, and hop into the ansible directory:

$ git clone https://github.com/varnish/toolbox.git
Cloning into 'toolbox'...
remote: Enumerating objects: 624, done.
remote: Counting objects: 100% (252/252), done.
remote: Compressing objects: 100% (128/128), done.
remote: Total 624 (delta 157), reused 163 (delta 121), pack-reused 372 (from 1)
Receiving objects: 100% (624/624), 155.58 KiB | 427.00 KiB/s, done.
Resolving deltas: 100% (261/261), done.
$ cd toolbox/s3-shield/ansible/
$

We also need to point our shield towards Storj. Remember that page we left open a few minutes ago? Let’s go back to it, click on the “Show” buttons to reveal the secrets, and use them to fill in the s3.conf file:

New Access - show

$ cat << EOF > s3.conf
s3_bucket_address = gateway.storjshare.io:443
s3_ttl = 100s
aws_access_key_id = jxmwftmfa2eazuhivtx73m4dh66q
aws_secret_access_key = jyb45zb2qgqtwxt6ociijk43dgtazjqwr7msfijp2lzl352qsgn2q
region = none
EOF

Two notes here:

  • The region is an S3 staple, but Storj doesn’t rely on it, so we just tell Varnish to ignore it.
  • I’m sharing my secret keys with you because I love you, and because I’ve already invalidated them. Do not share your secret keys with strangers, and keep them secure at all times!

We also need to tell ansible where to deploy the code, i.e. on our droplets:

$ cat << EOF > inventory.yaml
s3_shields:
  vars:
    repository_token: ff5528d77434df7441f7b98813580813566b7549754c69f3
    remote_user: root
  hosts:
    209.38.148.167:
    64.23.248.135:
    209.38.132.117:
EOF

The token is your Varnish Enterprise token, and of course the one posted here has already been invalidated, just in case you were wondering ;-)

The hosts, I’m sure you recognized them, are the three droplets we created earlier.
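If you want some reassurance before letting ansible loose on the droplets, you can ask it to validate the playbook and do a dry run first. Note that --check is best-effort: it can trip on tasks whose later steps depend on changes an earlier step would have made.

```shell
# Validate the playbook syntax without touching any host
ansible-playbook -i inventory.yaml playbook.yaml --syntax-check

# Dry run: connect to the droplets and report what would change,
# without actually changing anything
ansible-playbook -i inventory.yaml playbook.yaml --check
```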

It’s time to deploy all this:

ansible-playbook -i inventory.yaml playbook.yaml

Let it cook for a minute and it should end with a reassuring message:

PLAY RECAP *********************************************************************
209.38.132.117             : ok=9    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
209.38.148.167             : ok=9    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
64.23.248.135              : ok=9    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Success? Maybe?

Step 4: Test

I previously mentioned I was a loose cannon, but I’m not a savage! We are not going to declare victory just because ansible said all things went well. Thankfully, good ol’ curl will confirm that we are good:

$ curl 209.38.132.117/gq-test/250px-0025Pikachu.png -vs -o /dev/null
*   Trying 209.38.132.117:80...
* Connected to 209.38.132.117 (209.38.132.117) port 80
> GET /gq-test/250px-0025Pikachu.png HTTP/1.1
> Host: 209.38.132.117
> User-Agent: curl/8.9.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Length: 42625
< Content-Security-Policy: block-all-mixed-content
< Content-Type: application/octet-stream
< ETag: "ab92645a6580ff0fbbcf5212e921a151"
< Last-Modified: Fri, 23 Aug 2024 21:27:35 GMT
< Server: Storj
< Vary: Origin
< X-Amz-Request-Id: 17EE7DD705A009DB
< X-Request-Id: A1n7pgrsSZd
< X-Xss-Protection: 1; mode=block
< Date: Fri, 23 Aug 2024 23:01:10 GMT
< X-Varnish: 2
< Via: 1.1 varnish (Varnish/6.0)
< X-Varnish: 5 3
< Age: 7
< Via: 1.1 varnish (Varnish/6.0)
< Accept-Ranges: bytes
< Connection: keep-alive
<
{ [3814 bytes data]
* Connection #0 to host 209.38.132.117 left intact

You’ll notice the Server header indicates “Storj”, unsurprisingly, and also that we have not one but two X-Varnish headers; this betrays the sharding aspect I mentioned earlier. When you deploy with ansible, the shield instances are automatically configured to share their cache, maximizing cache capacity and minimizing backend traffic. We can see here that it’s working!
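To see the sharding from another angle, you can request the same object from each droplet and look at the caching headers. After the first fetch, the Age header should keep climbing no matter which node you hit, since the nodes serve it from the shared cache instead of going back to Storj every time:

```shell
# Request the same object through each shield node and print the
# response headers relevant to caching (Age, X-Varnish, Via).
for i in 209.38.148.167 64.23.248.135 209.38.132.117; do
  echo "== $i =="
  curl -s -o /dev/null -D - "http://$i/gq-test/250px-0025Pikachu.png" \
    | grep -i -E '^(age|x-varnish|via):'
done
```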

And if you don’t like curl, we can open the browser directly for a more graphical confirmation:

Pikachu

Step 5: Conclusion

All in all, and assuming you’re already logged in on each platform, the full process should take you less than 5 minutes to do manually. Of course, speedrunning a manual deployment is a pretty vain endeavor, and a pretty clear signal that you should just automate the whole setup, but I think it speaks to the ease of the process in general.

Before we part ways I would like to thank our friends at Storj for their help with this article and creating such a robust platform.

To get a live demo of Storj and Varnish, click here. You can also watch a quick, 5-min overview below 👇