May 18, 2016

Configurable real-time stats with Varnish Administration Console

 

A couple of years ago we introduced the real-time statistics API with Varnish Administration Console (VAC). Varnish Cache provides a lot of statistics on website activity; however, this information must be pulled from each server manually, and it is not available in real time. VAC, by contrast, provides real-time analytics across multiple Varnish instances.

The real-time statistics API uses a round robin database (RRD) design that consolidates and stores all data in memory. RRD allows us to store large amounts of time series data. The data is stored in a circular buffer, so the database uses a constant amount of storage no matter how long the system runs. An RRD expects data at predefined, constant time intervals; if it does not get a new value for an interval, it stores an unknown value for that interval. VAC uses the memory backend of RRD4J [1] to store the statistical data it gathers. These data sets are consolidated against time and normalized when fetched. Because the memory backend does not write to files, it is fast, much like Varnish, and avoids input/output (I/O) bottlenecks altogether.
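The circular-buffer behaviour described above can be illustrated with a minimal Python sketch. This is not RRD4J code: the class name RingArchive and its methods are hypothetical, and the sketch assumes one value per step and strictly advancing timestamps.

```python
import math


class RingArchive:
    """Fixed-size circular buffer holding one value per time step (illustrative only)."""

    def __init__(self, step, rows):
        self.step = step                  # seconds covered by each slot
        self.rows = rows                  # number of slots; storage never grows
        self.values = [math.nan] * rows   # NaN stands for "unknown"
        self.last_slot = None             # absolute index of the newest slot written

    def update(self, timestamp, value):
        slot = timestamp // self.step
        if self.last_slot is not None:
            if slot <= self.last_slot:
                raise ValueError("timestamps must advance by at least one step")
            # Intervals that passed without a sample are recorded as unknown.
            first_gap = max(self.last_slot + 1, slot - self.rows + 1)
            for missed in range(first_gap, slot):
                self.values[missed % self.rows] = math.nan
        # The newest value overwrites the oldest slot once the buffer has wrapped.
        self.values[slot % self.rows] = value
        self.last_slot = slot

    def fetch(self):
        """Return (timestamp, value) pairs from oldest to newest."""
        if self.last_slot is None:
            return []
        first = self.last_slot - self.rows + 1
        return [(s * self.step, self.values[s % self.rows])
                for s in range(first, self.last_slot + 1)]
```

With step=10 and rows=3, updating at timestamps 0, 10 and 30 leaves the slot for timestamp 20 unknown and drops the value for timestamp 0, while the buffer stays at three slots.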

RRD supports consolidation functions and stores the consolidated values in Round Robin Archives (RRAs). Using archives guarantees that the data does not balloon over time. You can define several RRAs within a single RRD in its configuration file. When picking the best archive for a fetch, RRD prefers an archive that covers as much of the requested time frame as possible and that has the same or better resolution [2].
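As a rough illustration of that archive selection, the sketch below ranks archives first by how much of the requested time frame they cover and then by resolution. It is a simplification under stated assumptions, not the exact RRD algorithm: the Archive type, the pick_archive function and the ranking order are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    resolution: int  # seconds represented by one row
    rows: int        # number of rows kept

    @property
    def span(self):
        """Seconds of history the archive can hold."""
        return self.resolution * self.rows


def pick_archive(archives, start, end, now, wanted_resolution):
    """Pick the archive that best serves a fetch of [start, end] (assumed ranking)."""

    def coverage(archive):
        oldest = now - archive.span
        return max(0, end - max(start, oldest))

    def rank(archive):
        fine_enough = archive.resolution <= wanted_resolution
        # Most coverage first, then "same or better" resolution,
        # then the resolution closest to the one requested.
        closeness = archive.resolution if fine_enough else -archive.resolution
        return (coverage(archive), fine_enough, closeness)

    return max(archives, key=rank)
```

Given a one-hour archive at minute resolution and a one-day archive at hour resolution, a two-hour request falls to the day archive because only it covers the full range, while a thirty-minute request at 60-second resolution falls to the minute archive.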

RRD for VAC

The Varnish agent pushes varnishstat counters of interest to VAC, where they are aggregated in memory. When a cache registers with VAC, the current RRD configuration is stored for that cache in the MongoDB database. The configuration includes a data source defined as DS:variable_name:DST:heartbeat:min:max and an archive defined as RRA:CF:xff:step:rows. Here "step" is the number of primary data points consolidated with the consolidation function (CF), "rows" is the number of records stored, that is, the size of the archive, and "xff" is the fraction of the primary data points in a consolidation interval that may be unknown while the consolidated value still counts as known. For example, an archive with a time span of one day (86400 seconds) and a resolution of one hour has this format: RRA:AVERAGE:0.9:3600:24
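To make the archive definition concrete, here is a small Python sketch that parses an RRA string, computes its time span and applies an AVERAGE consolidation with the xff rule. The function names are hypothetical, and the base step of one second per primary data point is an assumption inferred from the example (3600 steps giving one hour).

```python
import math
from typing import NamedTuple


class RRA(NamedTuple):
    cf: str      # consolidation function, e.g. AVERAGE
    xff: float   # allowed ratio of unknown primary data points
    steps: int   # primary data points per consolidated row
    rows: int    # rows kept in the archive


def parse_rra(definition):
    """Parse a string of the form RRA:CF:xff:step:rows."""
    tag, cf, xff, steps, rows = definition.split(":")
    if tag != "RRA":
        raise ValueError("not an RRA definition: " + definition)
    return RRA(cf, float(xff), int(steps), int(rows))


def time_span(rra, base_step=1):
    """Seconds of history covered; base_step=1 second is an assumption."""
    return rra.steps * rra.rows * base_step


def consolidate_average(points, xff):
    """Average the known points, or return NaN if too many are unknown."""
    known = [p for p in points if not math.isnan(p)]
    unknown_ratio = (len(points) - len(known)) / len(points)
    if unknown_ratio > xff or not known:
        return math.nan
    return sum(known) / len(known)


rra = parse_rra("RRA:AVERAGE:0.9:3600:24")
one_day = time_span(rra)
```

With an xff of 0.9, a consolidated hour is still reported as known when up to 90 percent of its primary data points are missing.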

These data sources and their archives can be configured in an XML file, where users specify exactly how many steps and how much data to keep for a given counter. However, changes to the XML file take effect only after the caches are deleted from VAC and re-registered, so that they pick up the new RRD configuration. The end result is a time series for each key counter, collected in real time and available by minute, hour, day or month.

The real-time statistics API supports queries based on a time range in seconds since the Unix epoch, and queries for a number of samples counted back from the current time. Hence, the VAC API can serve as a data source for any existing monitoring tool. The following illustrates how to query group stats for the default graph over the epoch time range 1462978677 to 1462978695:

curl -u user:password http://vac/api/v1/group/{group_id}/stats/epoch/graph/avghitratio/1462978677/1462978695/3600/total
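For a monitoring tool that builds such requests itself, the URL can be assembled as in the Python sketch below. The path layout (group, graph name, start, end, resolution, and a trailing "total" segment) is inferred from the curl example and is an assumption; the helper name group_stats_url is hypothetical, and the trailing segment is passed through unchanged because its meaning is not described here.

```python
from urllib.parse import quote


def group_stats_url(base, group_id, graph, start, end, resolution, mode="total"):
    """Build an epoch-range group stats URL (path layout assumed from the curl example)."""
    parts = [
        "api", "v1", "group", group_id,
        "stats", "epoch", "graph", graph,
        str(start), str(end), str(resolution), mode,
    ]
    # Percent-encode each segment so unusual IDs or graph names stay valid.
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


# GROUP_ID is a placeholder; substitute the ID of a real group.
url = group_stats_url("http://vac", "GROUP_ID", "avghitratio",
                      1462978677, 1462978695, 3600)
```

The resulting URL can then be fetched with any HTTP client that supports basic authentication, just as curl does with the -u option.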

A graph groups counters under a name. The query asks for a resolution of 3600 seconds per sample, and we expect to get 24 entries for each node, based on the archive defined in the example above. An example response containing the counters of the avghitratio graph is as follows:

{
    "nodes": [
        {
            "id": "5620de940cf2836e1ebfafa3",
            "stats": {
                "avghitratio": [
                    {
                        "time": 1462978676,
                        "cache_miss": 0,
                        "client_req": 13,
                        "cache_hitpass": 0,
                        "cache_hit": 13
                    },
                    {
                        "time": 1462978696,
                        "cache_miss": 1,
                        "client_req": 4,
                        "cache_hitpass": 0,
                        "cache_hit": 3
                    }
                ]
            }
        }
    ]
}
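A consumer of this response could derive a hit ratio per sample, as in the Python sketch below. The formula cache_hit / (cache_hit + cache_miss) is an assumption about how the avghitratio graph is meant to be read, not something the API returns, and the function name hit_ratios is hypothetical.

```python
import json


def hit_ratios(response_text, graph="avghitratio"):
    """Map each node ID to a list of (time, hit ratio) pairs."""
    data = json.loads(response_text)
    result = {}
    for node in data["nodes"]:
        samples = []
        for sample in node["stats"].get(graph, []):
            lookups = sample["cache_hit"] + sample["cache_miss"]
            # No lookups in the interval means the ratio is undefined.
            ratio = sample["cache_hit"] / lookups if lookups else None
            samples.append((sample["time"], ratio))
        result[node["id"]] = samples
    return result
```

Applied to the response above, this gives one pair per sample for node 5620de940cf2836e1ebfafa3, with None for any interval that had neither hits nor misses.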

References:

[1] rrd4j, "A high performance data logging and graphing system for time series data", 20.07.2012, accessed 20.07.2012, available: http://code.google.com/p/rrd4j/

[2] https://oss.oetiker.ch/rrdtool/doc/rrdfetch.en.html#RESOLUTION_INTERVAL

Ready to learn more about VAC? Check out the Varnish Administration Console demo.



Photo (c) 2010 Hermann Kaser used under Creative Commons license.