June 14, 2021

Don’t let your CDN be a single point of failure

The Fastly CDN outage on June 8th took down a major part of the internet, including popular websites and services such as Reddit, Spotify, Twitch, Stack Overflow, GitHub, gov.uk, Hulu, HBO Max, Quora, PayPal, Vimeo, Shopify and Stripe, and news outlets including CNN, The Guardian, The New York Times, the BBC and the Financial Times. It taught everyone four important things:

 

  • No one is immune to the occasional catastrophic internet outage: not a major CDN, not the sites and services relying on it, and not the end users who depend on those services to get information, do their work, shop, pay, be entertained, and so on. We are all affected.
  • With the internet becoming more centralized and being routed through fewer channels all the time, we have barely seen the beginning of these disruptive failures. They are likely to become more common and more expensive, as everything from input errors to ransomware attacks creates the potential for large-scale outages.
  • This kind of thing could happen to anyone. As the infrastructure of the internet itself grows more complex, configuration errors, bugs, and other problems are inevitable. Errors happen to the best of us, which is why risk mitigation is your friend.
  • Don’t let your CDN be your single point of failure. Always build in redundancy; it is a basic rule of thumb in engineering.

[Figure: Single point of failure diagram]

 

The June 8th internet outage 

On June 8th, websites around the world experienced a lengthy outage (49 minutes), which was eventually attributed to a software bug. The outage took down vital sites and services, and even though the Fastly team identified the disruption within one minute, exceptional financial damage, and potentially reputational damage, had already been done.

 

CDN risk mitigation and eliminating the single point of failure

The backbone of engineering is to build redundancy and resilience into your platforms and architecture from the outset. In the case of the June 8th internet outage, mitigation might have been possible in two ways. The first, a frontline mitigation strategy, is to use origin shield technology. An origin shield can be deployed quickly to ensure resilience, whether you're protecting a single CDN or a multi-CDN setup. It's inexpensive, fits in with existing architecture, and is both an easily deployable 'plan B' in outage situations and a flexible solution for offloading traffic as part of your daily workload. You simply add Varnish as an origin shield and point your domains to the origin shield servers, which can scale up to serve your traffic directly from Varnish when your CDN goes down.
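To make the 'plan B' idea concrete, here is a minimal sketch of the failover decision, assuming you run your own health checks against the CDN. The hostnames, the `FailoverState` class, the `next_target` function and the threshold of three failed checks are all hypothetical names chosen for illustration; they are not part of any Varnish or CDN API, and actually repointing the domain (for example via your DNS provider) is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Tracks consecutive failed health checks against the CDN."""
    consecutive_failures: int = 0
    using_shield: bool = False


def next_target(state, cdn_healthy,
                cdn_host="cdn.example.com",        # hypothetical CDN hostname
                shield_host="shield.example.com",  # hypothetical origin shield hostname
                failure_threshold=3):              # assumed value, tune for your setup
    """Return the host your domain should point at after one health check.

    Traffic moves to the origin shield only after `failure_threshold`
    consecutive failed checks, so a single blip does not trigger failover.
    It moves back as soon as one check succeeds, which is a simplification.
    """
    if cdn_healthy:
        state.consecutive_failures = 0
        state.using_shield = False
    else:
        state.consecutive_failures += 1
        if state.consecutive_failures >= failure_threshold:
            state.using_shield = True
    return shield_host if state.using_shield else cdn_host
```

In practice you would call `next_target` once per health check interval and update DNS or your traffic manager only when the returned host changes.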

The second mitigation path is slightly more involved but might be a viable long-term solution when you have heavy, mission-critical traffic demands and want greater control over content delivery. This strategy is to run your commercial CDN in parallel with a private CDN as a backup. A hybrid or private CDN doesn't add a lot in terms of additional cost or time, but it offers a failsafe against the worst-case scenario and, more importantly, if you're making the investment, the possibility of gaining from the unique benefits of having a private CDN.
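Running two CDNs in parallel implies some way of deciding which one serves a given request. The sketch below shows one possible approach, a stable weighted split that skips any CDN currently marked unhealthy. The `pick_cdn` function, the CDN names and the 80/20 weights are assumptions made for illustration, not a description of how any particular product does it.

```python
import hashlib


def pick_cdn(request_key, weights, healthy):
    """Pick a CDN for a request by weighted hashing over healthy CDNs only.

    request_key: a stable string such as the URL path (hypothetical choice).
    weights:     dict of CDN name -> positive integer weight.
    healthy:     set of CDN names that currently pass health checks.
    """
    candidates = [(name, w) for name, w in sorted(weights.items())
                  if w > 0 and name in healthy]
    if not candidates:
        raise RuntimeError("no healthy CDN available")

    total = sum(w for _, w in candidates)
    # Map the request key to a stable point in [0, total).
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total

    # Walk the weight ranges until the point falls inside one.
    for name, w in candidates:
        if point < w:
            return name
        point -= w
    raise AssertionError("unreachable: point is always below total")


# Hypothetical split: 80% commercial CDN, 20% private CDN.
weights = {"commercial": 80, "private": 20}
# If the commercial CDN fails its health checks, all traffic goes private.
fallback = pick_cdn("/index.html", weights, healthy={"private"})
```

Because the private CDN always carries some share of live traffic in this arrangement, it stays warm and tested rather than sitting idle until an emergency.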

Redundancy is less expensive than the financial damage of a massive-scale outage and, for companies that rely on CDN services, a significant hit on SLAs. Beyond cost, it’s in everyone’s interest to distribute the risk and eliminate single points of failure, particularly when managing mission-critical internet infrastructure and content delivery.
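To put the SLA point in numbers, the following rough calculation assumes a site was unreachable for the full 49 minutes and measures availability over a 30-day month. The 99.9% target is an illustrative figure, not one taken from any specific contract.

```python
def availability(downtime_minutes, period_days=30):
    """Fraction of the period during which the service was up."""
    total_minutes = period_days * 24 * 60
    return 1 - downtime_minutes / total_minutes


def allowed_downtime_minutes(sla, period_days=30):
    """Downtime budget, in minutes, for a given SLA over the period."""
    return (1 - sla) * period_days * 24 * 60


month_availability = availability(49)       # about 0.99887, i.e. roughly 99.887%
budget = allowed_downtime_minutes(0.999)    # about 43.2 minutes per 30 days
sla_breached = 49 > budget                  # True under these assumptions
```

Under these assumptions, a single 49-minute outage uses up more than the whole monthly downtime budget of a 99.9% availability target.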

 

Defend against downtime with Varnish Private CDN and origin shield

Varnish Private CDN and origin shield technology are relied on globally by content providers, news organizations and streaming services that prize the additional resilience, flexibility and cost control they provide.

 
