If I were to ask you what is so great about Varnish, you'd probably answer: "the VCL, duh!". And you would be right, but maybe not for the same reason I'm loving it: the Varnish Configuration Language shifts configuration from the traditional declarative mindset to an imperative one.

It gives you great control, allowing you to actually write your policies, but beyond this, it means that plugins (or VMODs) are super easy to write. Because the VCL is imperative, plugins don't have to register themselves, care about hooks, or worry about execution order, which makes them plain libraries that you can write in a matter of minutes.

And that's what we are going to explain here, step by step. A moderate knowledge of C and usual development tools (git, autotools, etc.) is expected, but nothing crazy, don't worry.

The Hello, World! is not enough

Let's try to make this VMOD interesting by having a real purpose. The vmod-example is nice and all, but it's not something you're ever going to use (reminder, it prepends "Hello, " to a string), so we're going to choose something similar but that is actually useful!

Our VMOD is going to be named "vmod-str" and will deal with a few simple string operations that are technically possible with pure VCL, but that can be cumbersome. Here's the list of functions our VMOD is going to implement:

  • count: count the number of characters in a string.
  • startswith/endswith/contains: check if a string starts with, ends with, or contains a substring.
  • take: extract a substring.
  • reverse: reverse a string.
  • split: split and extract the n-th "word" of a string.

We'll implement count(), startswith() and take() together; the others are left as an exercise to the reader because they would serve no VMOD-centered pedagogical purpose.

Find giants, sit on their shoulders

First, make sure you have the Varnish headers and libraries installed on your system.

On Debian-like systems:

sudo aptitude install varnish libvarnishapi-dev libvarnishapi1

On Red Hat-like systems:

sudo dnf install varnish varnish-devel

Next step is to clone the vmod-example, and make it our own:

git clone https://github.com/gquintard/libvmod-str.git vmod-str
cd vmod-str

Then switch to the right branch, run the renaming script, and prepare for compilation:

git checkout 4.0 # replace with 4.1 if on Varnish 4.1, skip if on v5
./rename-vmod-script str
./autogen.sh
./configure

And with this done, we are ready to code.

String theory 101

The first listed function was "count()", whose goal is to count the characters of a string. To begin, we'll declare it to the VMOD build system by putting this at the end of src/vmod_str.vcc:

$Function INT count(STRING S)

Returns the number of characters (bytes) in S, or -1 if S is NULL.

The first line is really the prototype of the function: it takes one STRING as argument and returns an INT. STRING and INT are part of the simple VMOD type system, which maps VCL types to C types: STRING is a const char * (meaning VMODs can't modify the strings they are given), and INT is a plain C integer (a long, in Varnish 4.x). You can get more info about that mapping here.

But what about the second line? Surely that's useless to the build system! Indeed it is, but not to the user. You see, the vcc file will be used to generate some boilerplate code (yay for automation), but it will also produce the man page for your module. We can check that directly by compiling and looking at the generated man page:

make
man src/vmod_str.3

You should be presented with a very professional-looking manual page, dandy!

We can also open src/vcc_str_if.h:

/*
 * NB:  This file is machine generated, DO NOT EDIT!
 *
 * Edit vmod.vcc and run make instead
 */

struct vmod_priv;

extern const struct vmod_data Vmod_str_Data;

#ifdef VCL_MET_MAX
vmod_event_f event_function;
#endif
VCL_STRING vmod_info(VRT_CTX);
VCL_STRING vmod_hello(VRT_CTX, VCL_STRING);
VCL_INT vmod_count(VRT_CTX, VCL_STRING);

See that vmod_count declaration? It has been created for us, based on the vcc file, and its prototype is pretty close to what we have declared. The only "weird" element is this VRT_CTX: it's a pointer to the current VCL context, and we'll gladly ignore it for now.

vmod_count has been declared, that's cool; now we just have to implement it. src/vmod_str.c is already known to the build system, so let's append this at its end:

VCL_INT
vmod_count(VRT_CTX, VCL_STRING s)
{
	/* an unset header arrives as NULL, signal it with -1 */
	if (s == NULL)
		return (-1);
	else
		return (strlen(s));
}

As you can see, that's some skilled C you're witnessing! But really, that's it, we're done!

Are we, though?

Of course not! We have to test it, and make sure we didn't make some stupid mistake. Our borrowed build system has support for vtc testing, just like in Varnish, so we'll use this and rewrite src/tests/test01.vtc:

varnishtest "count()"

server s1 {
	rxreq
	txresp
} -start

varnish v1 -vcl+backend {
	import ${vmod_str};

	sub vcl_deliver {
		set resp.http.count = str.count(req.http.string);
	}
} -start

client c1 {
	txreq
	rxresp
	expect resp.http.count == "-1"

	txreq -hdr "string: 012345"
	rxresp
	expect resp.http.count == "5"

	txreq -hdr "string: 012345 789"
	rxresp
	expect resp.http.count == "10"

	txreq -hdr "string: "
	rxresp
	expect resp.http.count == "0"
} -run

Note: we import the module using a macro to grab the .so file in .libs/. In a regular VCL that line just becomes "import str;". Also, count() is not called directly, but via the str namespace: str.count(). 

And then we run:

$: make check
FAIL: tests/test01.vtc
============================================================================
Testsuite summary for libvmod-str 0.1
============================================================================
# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See src/test-suite.log
============================================================================

Whaaaaaaaaat? It doesn't pass? That's upsetting. The reason for the failure is in src/tests/test01.log, at the line starting with '----':

** c1 0.4 === expect resp.http.count == "5"
---- c1 0.4 EXPECT resp.http.count (6) == "5" failed
* top 0.4 RESETTING after ./tests/test01.vtc

In other words: the "resp.http.count" header resolved to "6" instead of the expected 5, causing the test to abort. The guilty line is there:

	txreq -hdr "string: 012345"
	rxresp
	expect resp.http.count == "5"

But the real culprit is me (and you, for your bad code review!): the test has an off-by-one error. "012345" is indeed 6 characters long, not 5, so the code was right! Once the test is fixed:

$: make check
PASS: tests/test01.vtc
============================================================================
Testsuite summary for libvmod-str 0.1
============================================================================
# TOTAL: 1
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

Hooray!
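If you'd rather poke at the string logic without spinning up varnishd, the same expectations can be checked in plain C. This is only a harness, not VMOD code: the struct vrt_ctx, VRT_CTX, VCL_INT and VCL_STRING definitions below are stand-ins I'm assuming to mimic Varnish 4.x's vrt.h, so varnishtest remains the real test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for Varnish's vrt.h (assumption: modeled on Varnish 4.x,
 * where VCL_INT is a long and VCL_STRING a const char *). */
struct vrt_ctx;                          /* opaque, never dereferenced */
#define VRT_CTX const struct vrt_ctx *ctx
typedef long VCL_INT;
typedef const char *VCL_STRING;

/* Same logic as vmod_count() in src/vmod_str.c */
VCL_INT
vmod_count(VRT_CTX, VCL_STRING s)
{
	(void)ctx;                       /* unused, like in the VMOD */
	if (s == NULL)
		return (-1);
	return ((VCL_INT)strlen(s));
}
```

With it, each vtc expectation becomes a one-liner such as assert(vmod_count(NULL, "012345") == 6);, which would have exposed my off-by-one just as well.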

More functions, more tests!

Next in line is startswith(), which is easy enough that I encourage you to implement it yourself and check back here once you're done. Don't worry, I'll have this picture of an otter to keep me company:

[Photo: Otter in Southwold]

Back? Good!

The amount of new code is actually ridiculously small. Have a look at the addition to src/vmod_str.vcc:

$Function BOOL startswith(STRING S1, STRING S2)

Returns true if S1 starts with S2.

to src/vmod_str.c:

VCL_BOOL
vmod_startswith(VRT_CTX, VCL_STRING s1, VCL_STRING s2)
{
	if (s1 == NULL || s2 == NULL)
		return (0);
	/* walk the prefix; a shorter s1 hits its '\0' and mismatches */
	while (*s2) {
		if (*s1 != *s2)
			return (0);
		s1++;
		s2++;
	}
	return (1);
}

and at the newly created vtc file, src/tests/test02.vtc.

There are two little gotchas here. First, the BOOL type in VMODs is actually just an unsigned int: 0 is false, and anything else is true. This seems pretty natural, especially if you're accustomed to C, but it's really more of a hack than anything else.

Second, you have to run "./configure" again for test02.vtc to be taken into account by automake. The build system picks up any file matching the src/tests/*.vtc pattern, but only during the configure stage.
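To see the BOOL convention at work, here's the same startswith() logic compiled as plain C. VCL_BOOL as an unsigned and the VRT_CTX macro are stand-ins I'm assuming to mirror Varnish 4.x's vrt.h; this is a harness for the logic, not a drop-in replacement for the VMOD source.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Varnish's vrt.h (assumption, modeled on Varnish 4.x) */
struct vrt_ctx;
#define VRT_CTX const struct vrt_ctx *ctx
typedef unsigned VCL_BOOL;               /* 0 is false, anything else true */
typedef const char *VCL_STRING;

VCL_BOOL
vmod_startswith(VRT_CTX, VCL_STRING s1, VCL_STRING s2)
{
	(void)ctx;
	if (s1 == NULL || s2 == NULL)
		return (0);
	/* walk the prefix; a shorter s1 hits its '\0' and mismatches */
	while (*s2) {
		if (*s1 != *s2)
			return (0);
		s1++;
		s2++;
	}
	return (1);
}
```

Note that an empty prefix matches any non-NULL string, which is what you'd expect from a "starts with" test.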

endswith() is more of the same, so I'll leave it on the side and move to trickier stuff.
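For the curious, here's one possible shape for endswith(), declared in the vcc file just like startswith(). Rather than walking both strings, it compares lengths first and then checks the tail with memcmp(). As before, the Varnish types are stand-ins I'm assuming so that it compiles without the Varnish headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for Varnish's vrt.h (assumption, modeled on Varnish 4.x) */
struct vrt_ctx;
#define VRT_CTX const struct vrt_ctx *ctx
typedef unsigned VCL_BOOL;
typedef const char *VCL_STRING;

VCL_BOOL
vmod_endswith(VRT_CTX, VCL_STRING s1, VCL_STRING s2)
{
	size_t l1, l2;

	(void)ctx;
	if (s1 == NULL || s2 == NULL)
		return (0);
	l1 = strlen(s1);
	l2 = strlen(s2);
	if (l2 > l1)                     /* suffix longer than the string */
		return (0);
	/* compare the last l2 bytes of s1 with s2 */
	return (memcmp(s1 + l1 - l2, s2, l2) == 0);
}
```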

Freeing your mind, a lifetime preoccupation

The previous functions were pretty simple in that we didn't have to care about lifetimes. There was no allocation, so freeing memory was not an issue. But this changes now with the take() function.

In the vcc file, its block will look like:

$Function STRING take(STRING s, INT n)

Returns a string composed of the N first characters of S. If S is shorter than N
characters, the whole of S is returned. If S is NULL, NULL is returned.

The tricky part is that we return a STRING, and it must be allocated. And later, freed! That's going to be a bit of a problem because we can't really ask our users to call str.free_string() once they're done with their STRING. Heck, after decades of C programming, memory leaks are still an issue due to unfreed allocations!

Thankfully, Varnish uses a workspace model: each request has its own memory pool. There are four main advantages to this:

  • isolation and containment: one request can't go crazy and hog all the memory because each workspace is bounded.
  • better CPU cache efficiency: the workspace is only used by one request, hence one thread at a time, which helps keep its memory hot in the CPU cache.
  • no need to resort to the system allocator (malloc) after the workspace has been created.
  • easy memory management: if the request needs to allocate something, it just has to put the data in the workspace, and everything'll be freed at once at the end of the request, avoiding reference counting, and complicated garbage collection.

The last two points are the ones that interest us: instead of using a plain old malloc call, we'll grab some workspace storage, and it will be reclaimed for us at the end of the request!

Remember that "VRT_CTX" argument that's been added to all our function prototypes? It declares the VCL context pointer, "ctx", which contains a pointer to "ws", the workspace of the current task.

Manipulating a workspace is done in three steps:

  • reserve: warn that you are going to need some memory, either giving the amount, or asking for 0, meaning "gimme all you got". The workspace will return the amount of space actually reserved.
  • use: once reserved, ws->f points to usable memory, fill it up!
  • release: once you are done, release the workspace, telling it how much memory you used. This MUST be done, even if you specified the exact amount of memory during reservation.
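To make the three steps concrete, here is a toy workspace in plain C. It is NOT Varnish's implementation: the toy_ws struct and its functions are hypothetical, only loosely modeled on the start/front/reservation/end pointers of Varnish's struct ws, with no alignment rounding or overflow tracking. It just shows the bookkeeping: reserving pins down part of the free area, f is where you write, and releasing advances f by what you actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy workspace: a fixed buffer carved from the front */
struct toy_ws {
	char *s;        /* start of the buffer */
	char *f;        /* first free byte ("front") */
	char *r;        /* end of the current reservation, NULL if none */
	char *e;        /* end of the buffer */
};

void
toy_ws_init(struct toy_ws *ws, char *buf, size_t len)
{
	ws->s = ws->f = buf;
	ws->r = NULL;
	ws->e = buf + len;
}

/* Reserve `bytes` (0 means "everything left"). Returns the size actually
 * reserved: in this toy, asking for too much gets you whatever is left. */
size_t
toy_ws_reserve(struct toy_ws *ws, size_t bytes)
{
	size_t left = (size_t)(ws->e - ws->f);

	assert(ws->r == NULL);           /* one reservation at a time */
	if (bytes == 0 || bytes > left)
		bytes = left;
	ws->r = ws->f + bytes;
	return (bytes);
}

/* End the reservation, keeping only the first `used` bytes */
void
toy_ws_release(struct toy_ws *ws, size_t used)
{
	assert(ws->r != NULL);
	assert(ws->f + used <= ws->r);
	ws->f += used;
	ws->r = NULL;
}
```

Since a reservation can come back smaller than requested, the caller has to compare the returned size with what it needs, and still release on failure.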

And the code looks like this:

VCL_STRING
vmod_take(VRT_CTX, VCL_STRING input, VCL_INT length)
{
	char *output;
	size_t l;

	if (input == NULL || length < 0)
		return (NULL);

	/* if length is larger than input's length, reduce it */
	l = strlen(input);
	if (l < (size_t)length)
		length = l;

	/* reserve: we need length + 1 bytes, to fit the final '\0' */
	if (length >= WS_Reserve(ctx->ws, length + 1)) {
		/* not enough space: release the reservation anyway */
		WS_Release(ctx->ws, 0);
		return (NULL);
	}

	/* use: ws->f is the front of the reserved area */
	output = ctx->ws->f;
	memcpy(output, input, length);
	output[length] = '\0';

	/* release: update the workspace with what we've used */
	WS_Release(ctx->ws, length + 1);

	return (output);
}

As usual in C, most of the code is there to check that nothing stupid happens, but once that's done, the reserve/use/release dance is quite obvious. Note that WS_Reserve can come back with less memory than requested (0 if the workspace is exhausted), and that it can totally return more memory than what you asked for.

Truth be told, this code is overly verbose. Looking at Varnish's headers, we notice there's a WS_Copy() that does pretty much what we want, so our code can be simplified thusly:

VCL_STRING
vmod_take(VRT_CTX, VCL_STRING input, VCL_INT length)
{
	char *output;
	size_t l;

	if (input == NULL || length < 0)
		return (NULL);

	/* if length is larger than input's length, reduce it */
	l = strlen(input);
	if (l < (size_t)length)
		length = l;

	/* copy length + 1 bytes: reading input[length] is safe, at worst
	 * it's the final '\0'; the copy is terminated right after */
	output = WS_Copy(ctx->ws, input, length + 1);

	if (output != NULL)
		output[length] = '\0';

	return (output);
}

Note: WS_Alloc() also exists, returning either NULL or a pointer to the allocated memory. If you know the exact size of your allocation, you'll want to use it instead of the WS_Reserve()/WS_Release() combo. So our first example was bad from the get-go, but it was a good teaching opportunity.
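Following that advice, take() only needs one exact-size allocation of length + 1 bytes. Here's a standalone sketch of that shape: toy_ws_alloc() is a hypothetical bump allocator standing in for WS_Alloc(), with the contract from the note (NULL on failure, a pointer otherwise), so the logic can be compiled and tested without Varnish. In the VMOD itself, you'd call WS_Alloc(ctx->ws, ...) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a workspace: a buffer and a cursor */
struct toy_ws {
	char *f;        /* first free byte */
	char *e;        /* end of the buffer */
};

/* Stand-in for WS_Alloc(): NULL if there isn't enough room left */
static void *
toy_ws_alloc(struct toy_ws *ws, size_t bytes)
{
	char *p;

	if (bytes > (size_t)(ws->e - ws->f))
		return (NULL);
	p = ws->f;
	ws->f += bytes;
	return (p);
}

/* take() with a single exact-size allocation */
const char *
toy_take(struct toy_ws *ws, const char *input, long length)
{
	char *output;
	size_t l;

	if (input == NULL || length < 0)
		return (NULL);
	l = strlen(input);
	if ((size_t)length > l)          /* input shorter than requested */
		length = (long)l;

	output = toy_ws_alloc(ws, (size_t)length + 1);
	if (output == NULL)
		return (NULL);           /* workspace exhausted */
	memcpy(output, input, (size_t)length);
	output[length] = '\0';
	return (output);
}
```

With an 8-byte toy workspace, taking 3 characters consumes 4 bytes and taking "ab" consumes 3, so a further 4-byte request fails cleanly with NULL instead of overflowing.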

Results may be off(set)

But actually, we are not done yet with this function. Don't worry, I won't bore you with more tests! take() is fine and all, but it does feel incomplete, doesn't it? What if we want to return the 5 characters AFTER the third?

The obvious solution would be to update the vcc file to:

$Function STRING take(STRING s, INT n, INT offset)

But that means changing the interface our users know and love (even if, right now, there's no user per se, but you know what I mean). Plus, OFFSET is probably going to be 0 most of the time, so it would be a bit annoying to have to specify it systematically.

Fortunately, we have a solution for this too: optional arguments with default values, Python-style! The vcc line becomes:

$Function STRING take(STRING s, INT n, INT offset = 0)

And users will be able to write all these three variants:

set resp.http.a = str.take("abcde", 3);
set resp.http.b = str.take("abcde", 2, 2);
set resp.http.c = str.take("abcde", 1, offset = 4);

If no offset is given, the C function will receive 0, awesome!
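On the C side, vmod_take() simply grows a fourth VCL_INT argument. Here's a standalone sketch of how the offset could be handled. To keep it short, the hypothetical take_offset() helper writes into a caller-provided buffer instead of the workspace, and its edge-case choices (an offset past the end yields an empty string, a negative one yields NULL) are my assumptions, not necessarily what vmod-str does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n characters of s, starting at
 * offset, into buf (of size bufsz). Returns buf, or NULL on bad input
 * or if buf is too small. */
const char *
take_offset(char *buf, size_t bufsz, const char *s, long n, long offset)
{
	size_t l;

	if (s == NULL || n < 0 || offset < 0)
		return (NULL);
	l = strlen(s);
	if ((size_t)offset > l)          /* past the end: empty result */
		offset = (long)l;
	s += offset;
	l -= (size_t)offset;
	if ((size_t)n > l)               /* not enough characters left */
		n = (long)l;
	if ((size_t)n + 1 > bufsz)
		return (NULL);
	memcpy(buf, s, (size_t)n);
	buf[n] = '\0';
	return (buf);
}
```

Under these assumptions, the three VCL variants above would yield "abc", "cd" and "e" respectively.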

A good starting point

It's time to wrap up, and I hope you got a glimpse of how easy it is to write a Varnish module, even though we only covered the basics.

The VMOD world is big, and notably, we haven't touched objects, toyed with private structs to keep information across requests, or looked at VCL temperatures, but all of these are incremental steps gently extending what we have covered today. Building a VMOD should now be a breeze, and if you have questions, feel free to ping me on IRC (gquintard), on Twitter or even, God forbid, by mail; I'll be happy to help.

By the way, vmod-str is not just a toy VMOD to explain things in a blog article; I do expect it to be useful and used in real-life scenarios, so if you want to hack on it and submit changes, or file a feature request, or report a bug, please do so, it's all open source!

Photo (c) 2014 fdecomite used under Creative Commons license.