I quite like Varnish Custom Statistics (VCS). The idea behind it is super simple (aggregate data about classes of requests), and yet its use cases are extremely diverse: people use it to monitor the most requested URLs, to watch for brewing backend issues, to do A/B testing, or to create image walls showing the most read articles (here's the article about how it's done).
And today I'd like to show you yet another use case: visualizing on a world map where your users are, hence the click-bait-y title.
But before we start, I have to confess something: I don't like web development - at all. I find the HTML/CSS/JS triplet a pain to work with, and the differences between browser implementations only make it worse. As a C developer, I find JavaScript alien and weird; CSS is convoluted; and HTML is, after all, a close cousin of XML and, as such, should probably die in a fire.
However, I have to admit that I see the appeal: you can easily prototype, handle both logic and presentation, rely on billions of online examples and modules/plugins/libraries and of course iterate crazy fast. So yeah, I don't like the technology, but it really lowers the entry barrier for developers, and that's a good thing.
And so, this project has been coded in JavaScript, and it was actually pretty painless, even for a hater like me. I guess I'm becoming more mature as I grow old (up?), or maybe I just had big misconceptions about webdev, who knows?
The idea is to create a web page showing us a map of the world, painting countries according to the number of requests that came from them (the more requests, the darker the shade). Something like this:
For this, we are going to use:
And that's about it, let's set things up!
Hold your breath, don't blink, this is going to be super quick.
VCS actually consists of two software components: the server and the probe. The VCS probe is run on each Varnish server, reads the shmlog and pushes data to the VCS server, which most of the time is on a separate machine.
The server is started with:
vstatd
Yes, the binary is still called "vstatd", the old name of the product, but that's not important. We'll use the default ports, time window sizes and numbers.
And we have to start the probe(s), telling it where the VCS server is (let's say 192.168.0.200):
vstatdprobe 192.168.0.200
And that's it, you can stop holding your breath now. Which requests are collected, and how, is driven entirely by VCL, which explains the lack of configuration here.
It won't actually be much more complicated. First you need to:
Then, we just have to add a few lines to our VCL:
import std;
import geoip;

sub vcl_recv {
    std.log("vcs-key: FROM-" + geoip.country_code(client.ip));
}
Done! For each request, we are going to log the client's country code, prefixed with "vcs-key: FROM-", where "vcs-key:" is a marker announcing to VCS that the string should be used to tag the request, and "FROM-" is just a string of our own to help with filtering.
To check that it works, let's run:
varnishlog -i VCL_Log -g raw
"-g raw" removes all grouping, and "-i VCL_Log" filters only VCL_Log lines, in other words, messages coming from std.log(). The result should look a bit like:
1977337 VCL_Log c vcs-key: FROM-US
2002175 VCL_Log c vcs-key: FROM-CN
1977340 VCL_Log c vcs-key: FROM-CN
2002178 VCL_Log c vcs-key: FROM-US
1977343 VCL_Log c vcs-key: FROM-KR
2002181 VCL_Log c vcs-key: FROM-US
1977346 VCL_Log c vcs-key: FROM-CA
2002184 VCL_Log c vcs-key: FROM-US
1977349 VCL_Log c vcs-key: FROM-CA
2002187 VCL_Log c vcs-key: FROM-CN
1977352 VCL_Log c vcs-key: FROM-US
2002190 VCL_Log c vcs-key: FROM-Unknown
1977355 VCL_Log c vcs-key: FROM-Unknown
2002193 VCL_Log c vcs-key: FROM-IR
1977358 VCL_Log c vcs-key: FROM-FR
Which is not too surprising; that's what we asked for. There are a few unknown IPs, but after all, we are using a free, lower quality database, so that's normal.
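If you want a quick per-country tally of that output before VCS even enters the picture, a few lines of JavaScript are enough. This is only a sketch: tallyCountries is a hypothetical helper (not part of Varnish or VCS), and it assumes the lines look exactly like the sample above.

```javascript
// Hypothetical helper: count requests per country from varnishlog lines
// such as "1977337 VCL_Log c vcs-key: FROM-US".
function tallyCountries(lines) {
    var counts = {};
    var pattern = /VCL_Log\s+c\s+vcs-key:\s+FROM-(\S+)/;
    lines.forEach(function (line) {
        var m = pattern.exec(line);
        if (!m) {
            return; // not a FROM- key line, skip it
        }
        counts[m[1]] = (counts[m[1]] || 0) + 1;
    });
    return counts;
}

// A few lines copied from the sample output above.
var sample = [
    "1977337 VCL_Log c vcs-key: FROM-US",
    "2002175 VCL_Log c vcs-key: FROM-CN",
    "1977340 VCL_Log c vcs-key: FROM-CN",
    "2002190 VCL_Log c vcs-key: FROM-Unknown"
];
console.log(tallyCountries(sample));
```

This is roughly the counting VCS does for us, minus the time windows and all the other metrics.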
Data is starting to pour into our VCS server; let's see what's available.
With the endpoint /all/, we can retrieve all the vcs-keys seen and currently in memory:
curl $VCSIP:$VCSPORT/all/
{
    "keys": [
        "FROM-Unknown",
        "FROM-A2",
        "FROM-AD",
        "FROM-AE"
    ]
}
To get info about one key (i.e., all the requests flagged using this tag), /key/STRING is used:
curl $VCSIP:$VCSPORT/key/FROM-RU
{
    "FROM-RU": [
        {
            "timestamp": "2016-08-02T18:06:30",
            "n_req": 42,
            "n_req_uniq": "NaN",
            "n_miss": 42,
            "avg_restarts": 0.000000,
            "n_bodybytes": 11928,
            "reqbytes": 3874,
            "respbytes": 22050,
            "berespbytes": 0,
            "bereqbytes": 0,
            "ttfb_miss": 0.000166,
            "ttfb_hit": "NaN",
            "resp_1xx": 0,
            "resp_2xx": 0,
            "resp_3xx": 0,
            "resp_4xx": 0,
            "resp_5xx": 42
        },
        {
            "timestamp": "2016-08-02T18:06:00",
            "n_req": 43,
            "n_req_uniq": "NaN",
            "n_miss": 43,
            "avg_restarts": 0.000000,
            "n_bodybytes": 12212,
            "reqbytes": 3968,
            "respbytes": 22575,
            "berespbytes": 0,
            "bereqbytes": 0,
            "ttfb_miss": 0.000180,
            "ttfb_hit": "NaN",
            "resp_1xx": 0,
            "resp_2xx": 0,
            "resp_3xx": 0,
            "resp_4xx": 0,
            "resp_5xx": 43
        },
        {
            "timestamp": "2016-08-02T18:05:30",
            "n_req": 44,
            "n_req_uniq": "NaN",
            "n_miss": 44,
            "avg_restarts": 0.000000,
            "n_bodybytes": 12496,
            "reqbytes": 4042,
            "respbytes": 23100,
            "berespbytes": 0,
            "bereqbytes": 0,
            ...
As you can see, data is aggregated in windows of 30 seconds by default (look at the timestamps), giving you almost real-time feedback on how your content is consumed. Here we can tell that we get around 40 requests from Russia every 30 seconds, generating roughly 22 kB of traffic to the clients. And we can also tell that I should fix my backend, since all the requests received 5XX responses (truth is, I got lazy and didn't start the backend).
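Reading those numbers off the JSON by eye works, but the same summary can be computed. Here is a minimal sketch, assuming the response has the shape shown above; summarizeBuckets is a hypothetical name of mine, and only the fields n_req, respbytes and resp_5xx are used.

```javascript
// Hypothetical helper: summarize the windows returned by /key/STRING.
// `buckets` is the array stored under the key name, e.g. response["FROM-RU"].
function summarizeBuckets(buckets) {
    var totalReq = 0;
    var totalRespBytes = 0;
    var total5xx = 0;
    buckets.forEach(function (b) {
        totalReq += b.n_req;
        totalRespBytes += b.respbytes;
        total5xx += b.resp_5xx;
    });
    var n = buckets.length;
    return {
        buckets: n,
        avgReqPerBucket: n ? totalReq / n : 0,
        avgRespBytesPerBucket: n ? totalRespBytes / n : 0,
        ratio5xx: totalReq ? total5xx / totalReq : 0
    };
}

// The first two windows from the output above, trimmed to the fields we use.
var response = {
    "FROM-RU": [
        { "n_req": 42, "respbytes": 22050, "resp_5xx": 42 },
        { "n_req": 43, "respbytes": 22575, "resp_5xx": 43 }
    ]
};
console.log(summarizeBuckets(response["FROM-RU"]));
```

A ratio5xx of 1 is the programmatic version of "I should fix my backend".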
Let's finish on a more complex request, which is actually the one we are going to use:
curl $VCSIP:$VCSPORT/match/FROM-/top/300?b=10
This asks VCS:
The result should look like this:
{
    "FROM-US": 14120,
    "FROM-Unknown": 5778,
    "FROM-CN": 2962,
    "FROM-JP": 1817,
    "FROM-GB": 1100,
    "FROM-DE": 1060,
    "FROM-KR": 1023,
    "FROM-BR": 756,
    "FROM-FR": 741,
    "FROM-CA": 698,
    "FROM-IT": 474,
    "FROM-NL": 444,
    "FROM-AU": 437,
    "FROM-RU": 421,
    "FROM-IN": 361,
    "FROM-TW": 299,
    ...
And this is what we are going to use in our JavaScript, which we are now ready to write.
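Since the same query will be issued from the page, it can help to build the URL from its parts rather than hardcode it. The sketch below only reproduces the URL shape of the curl command above. buildTopUrl and its parameter names are hypothetical, and reading 300 as a maximum number of keys and b as a number of time windows is my assumption, so check the VCS documentation before relying on it.

```javascript
// Hypothetical helper: assemble a /match/<pattern>/top/<count>?b=<buckets> URL.
// Assumption: `count` caps the number of keys returned and `buckets` is the
// number of time windows taken into account.
function buildTopUrl(base, pattern, count, buckets) {
    return base +
        "/match/" + encodeURIComponent(pattern) +
        "/top/" + count +
        "?b=" + buckets;
}

// Same query as the curl command, against the address used later in the page.
var url = buildTopUrl("http://127.0.0.1:8888", "FROM-", 300, 10);
console.log(url); // http://127.0.0.1:8888/match/FROM-/top/300?b=10
```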
Before we start, let me state that again: this is not my turf, and I did what most new coders do: I stole code, specifically from the jqvmap README, but in my defense, the example given was doing pretty much what I needed.
HTML code is:
<script type="text/javascript" src="http://code.jquery.com/jquery-1.11.3.min.js"></script>
<script type="text/javascript" src="https://rawgit.com/manifestinteractive/jqvmap/master/dist/jquery.vmap.js"></script>
<script type="text/javascript" src="https://rawgit.com/manifestinteractive/jqvmap/master/dist/maps/jquery.vmap.world.js" charset="utf-8"></script>
Note that for the last two, I just used rawgit so I could avoid hosting the code while still executing it.
And I'll also create an empty div for jqvmap to populate:
<div id="vmap" style="width: 100%; height: 90%;"></div>
Ok, everything is in place. Now we just have to create the map and update it every 20 seconds. Here's the map creation, which happens once the page is loaded:
// Latest per-country request counts, shared with labelShow().
var g_req = {};

// Ask VCS for the top FROM- keys (JSONP) and hand the result to parseAndShow().
function mapUpdate() {
    $.getJSON("http://127.0.0.1:8888/match/FROM-/top/300?b=20&callback=?", parseAndShow);
}

// Text of the label shown when hovering over a country.
function labelShow(event, label, code) {
    label.text(g_req[code] +
        " requests originated from " +
        JQVMap.maps['world_en'].paths[code].name);
}

// Cancel the default action for clicks on a region.
function regionClick(event, label, code) {
    event.preventDefault();
}

jQuery(document).ready(function() {
    jQuery('#vmap').vectorMap({
        map: 'world_en',
        hoverColor: '#005aff',
        scaleColors: ['#d8f8ff', '#005ace'],
        onRegionClick: regionClick,
        onLabelShow: labelShow,
        normalizeFunction: 'polynomial'
    });
    mapUpdate();
});
Some explanation about the vectorMap() arguments:
But vectorMap() only creates a blank map, so we need to color it. Thankfully, jqvmap will do most of the job for us, and we only have to give it a dictionary looking like:
{ "us": 34, "ru": 54, "fr": 23, ...}
i.e., using the lowercase country codes as keys and the number of requests as values. Coloring then happens automagically using scaleColors and normalizeFunction.
This happens in two steps:
function parseAndShow(data) {
var countries = [
'af', 'ax', 'al', 'dz', 'as', 'ad', 'ao', 'ai', 'aq', 'ag', 'ar', 'am',
'aw', 'au', 'at', 'az', 'bs', 'bh', 'bd', 'bb', 'by', 'be', 'bz', 'bj',
'bm', 'bt', 'bo', 'ba', 'bw', 'bv', 'br', 'io', 'bn', 'bg', 'bf', 'bi',
'kh', 'cm', 'ca', 'cv', 'ky', 'cf', 'td', 'cl', 'cn', 'cx', 'cc', 'co',
'km', 'cg', 'cd', 'ck', 'cr', 'ci', 'hr', 'cu', 'cy', 'cz', 'dk', 'dj',
'dm', 'do', 'ec', 'eg', 'sv', 'gq', 'er', 'ee', 'et', 'fk', 'fo', 'fj',
'fi', 'fr', 'gf', 'pf', 'tf', 'ga', 'gm', 'ge', 'de', 'gh', 'gi', 'gr',
'gl', 'gd', 'gp', 'gu', 'gt', 'gg', 'gn', 'gw', 'gy', 'ht', 'hm', 'va',
'hn', 'hk', 'hu', 'is', 'in', 'id', 'ir', 'iq', 'ie', 'im', 'il', 'it',
'jm', 'jp', 'je', 'jo', 'kz', 'ke', 'ki', 'kr', 'kw', 'kg', 'la', 'lv',
'lb', 'ls', 'lr', 'ly', 'li', 'lt', 'lu', 'mo', 'mk', 'mg', 'mw', 'my',
'mv', 'ml', 'mt', 'mh', 'mq', 'mr', 'mu', 'yt', 'mx', 'fm', 'md', 'mc',
'mn', 'me', 'ms', 'ma', 'mz', 'mm', 'na', 'nr', 'np', 'nl', 'an', 'nc',
'nz', 'ni', 'ne', 'ng', 'nu', 'nf', 'mp', 'no', 'om', 'pk', 'pw', 'ps',
'pa', 'pg', 'py', 'pe', 'ph', 'pn', 'pl', 'pt', 'pr', 'qa', 're', 'ro',
'ru', 'rw', 'bl', 'sh', 'kn', 'lc', 'mf', 'pm', 'vc', 'ws', 'sm', 'st',
'sa', 'sn', 'rs', 'sc', 'sl', 'sg', 'sk', 'si', 'sb', 'so', 'za', 'gs',
'es', 'lk', 'sd', 'sr', 'sj', 'sz', 'se', 'ch', 'sy', 'tw', 'tj', 'tz',
'th', 'tl', 'tg', 'tk', 'to', 'tt', 'tn', 'tr', 'tm', 'tc', 'tv', 'ug',
'ua', 'ae', 'gb', 'us', 'um', 'uy', 'uz', 'vu', 've', 'vn', 'vg', 'vi',
'wf', 'eh', 'ye', 'zm', 'zw', 'kp'
];
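    // Build the { "us": 34, ... } dictionary jqvmap expects; countries
    // missing from the VCS response are reset to 0.
    var reqs = {};
    $.each(countries, function(idx, key) {
        var ckey = "FROM-" + key.toUpperCase();
        if (data[ckey]) {
            reqs[key] = data[ckey];
        } else {
            reqs[key] = 0;
        }
    });
    // Keep the counts around for labelShow(), recolor the map,
    // then schedule the next refresh.
    g_req = reqs;
    jQuery('#vmap').vectorMap('set', 'values', reqs);
    setTimeout(mapUpdate, 20000);
}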
At the end, I set a timer to rerun mapUpdate 20 seconds later, and I update g_req so that labelShow can use it when I hover over a country.
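One consequence of rearming the timer inside parseAndShow is that a request that never reaches the success callback also ends the refresh loop. Below is a sketch of a variant that reschedules whether or not the fetch worked. makePoller, fetchData and schedule are hypothetical names, and the fetch and the timer are passed in as parameters so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: poll forever, rescheduling even after a failed fetch.
//   fetchData(done): calls done(err, data) when the request finishes
//   onData(data):    called only on success
//   schedule(fn, ms): setTimeout-like function
function makePoller(fetchData, onData, schedule, delayMs) {
    function tick() {
        fetchData(function (err, data) {
            if (!err) {
                onData(data);
            }
            schedule(tick, delayMs); // rearm no matter what happened
        });
    }
    return tick;
}

// Try it with fakes: the first fetch fails, the second one succeeds.
var pending = [];
var seen = [];
var attempts = 0;

function fakeFetch(done) {
    attempts += 1;
    if (attempts === 1) {
        done(new Error("VCS unreachable"));
    } else {
        done(null, { "FROM-US": attempts });
    }
}

function fakeSchedule(fn, delayMs) {
    pending.push({ fn: fn, delayMs: delayMs });
}

var poll = makePoller(fakeFetch, function (data) { seen.push(data); },
    fakeSchedule, 20000);
poll();               // fails, but a retry is still queued
pending.shift().fn(); // run the queued retry, which succeeds
console.log(seen, pending.length);
```

In the page, schedule would be setTimeout and fetchData a thin wrapper around $.getJSON, in which case parseAndShow would no longer need its own setTimeout call.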
And we are done! The maps in the page are static to avoid running VCS ad vitam aeternam just for a blog post, but if you wish to see the full "actual" code, it's here.
Of course, you can make the maps even sexier by adding tooltips, fancier colors and cool effects, but as you can guess, I leave this as an exercise to you, the reader.
BUT, there's one last thing I wanted to show you before we part ways, a quick change for a deep addition. Let's say we add one line to our VCL:
import std;
import geoip;

sub vcl_recv {
    std.log("vcs-key: FROM-" + geoip.country_code(client.ip));
    std.log("vcs-key: TO-" + geoip.country_code(server.ip));
}
We are now recording not only the client's country but also the destination's. True, you need more than one point of presence for this to be interesting, but the point is that with very few changes to your JavaScript, you can go from the original map, showing the clients' activity, to this one, showing the servers' activity:
Here you can see at a glance that your Japanese servers are getting more than their share of requests.
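To give an idea of how small the JavaScript change can be, here is a sketch that turns the key prefix into a parameter of the conversion step. valuesForPrefix is a hypothetical helper, not the code behind the maps in this post, and the country list is cut down to three entries for brevity.

```javascript
// Hypothetical helper: turn a VCS "top" response into the dictionary jqvmap
// expects, for any key prefix ("FROM-" for clients, "TO-" for servers).
function valuesForPrefix(data, prefix, countries) {
    var values = {};
    countries.forEach(function (code) {
        var key = prefix + code.toUpperCase();
        values[code] = data[key] || 0; // countries without data get 0
    });
    return values;
}

// Made-up numbers, only there to exercise the function.
var data = { "FROM-US": 14120, "TO-JP": 900, "TO-US": 300 };
var countries = ["us", "jp", "fr"];

console.log(valuesForPrefix(data, "FROM-", countries));
console.log(valuesForPrefix(data, "TO-", countries));
```

The URL passed to $.getJSON would need the same treatment, matching on TO- instead of FROM-.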
VCS is a generic tool, offering great versatility and super easy integration, notably with JavaScript that bundles HTTP+JSON directly into the language as we have seen here. But this is only a very specific example, made to kickstart your creativity and make you think about how it can be useful for YOU and your Varnish usage.
Data analysis is already a crucial part of running a website, and is not limited to just bandwidth and requests per second. Combined with Varnish, VCS can be the tool to give you the necessary insight on who your public is and how your content is consumed to create a better, more efficient service.
Ready to learn more about VCS? Join us for our live webinar, How to identify issues in Varnish and track web-traffic stats in real-time: Getting the most out of Varnish Custom Statistics on September 8th.
Photo (c) 2005 Michael Coté, used under Creative Commons license.