The All Seeing Eye – Web metrics in the 21st century
You might have heard: the government are spying on you. Tracking every movement, every page viewed, every button clicked. They’ve been doing this ever since the launch of the new gov.uk site with the help of an army of new technologies geared toward watching all the things.
If this sounds like an Orwellian nightmare, try not to worry. In this case they’re doing it so that you can buy your car tax disc more easily, rather than to keep you talking in Newspeak*. They’re even being open with the data they collect, sharing it with anyone who cares to look.
The falling cost of memory, cloud computing and software means that we can crunch and visualise vast quantities of data on relatively modest hardware. With tools like logstash, we can bring log files to life and make them queryable in real time. You might be thinking that tools already exist to do all these things, which is true, but the real power of the new swathe of tools comes from being able to combine and query data from several sources in one interface. At the moment you might be downloading analytics data from one service and server performance data from another, then combining them in a time-consuming reporting process. The new tools do this on the fly, letting you know as soon as there’s a problem or an opportunity. For the unconvinced, Etsy have a great writeup on the whys and wherefores of measuring anything and everything.
The tools available to developers are growing by the day, with new services and projects appearing all the time. Companies like Heroku and GitHub depend on real-time data for their products and platform, an approach that’s beginning to gain traction under the banner of DevOps. Given that gathering data about your apps and users has never been easier, I thought we could do a “state of the nation” for people who want to find out more.
Log files were conceived as a breadcrumb trail to help programmers diagnose problems, with a line written for every action the system performs. To a non-technical person they probably read like a foreign language, but there’s useful data in there. Logstash is an open source tool for taking in log files and getting the useful data out into a consistent, searchable format. It can run on the same server as a site or on a centralised logging server that all your apps feed into, reading the logs in real time. Logstash suggest using an Elasticsearch backend by default.
Pros: Free, Simple, Good community support, multiple inputs/outputs
Cons: Java/JRuby based (a problem if you can’t use Java), still some gremlins in the documentation, Elasticsearch backend can be a headache to set up
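To make the idea of getting useful data out of a log file a little more concrete, here is a minimal Python sketch of turning one raw access-log line into a structured, searchable record. This is not how Logstash is implemented or configured; the regular expression, the `parse_line` helper and the sample line are all assumptions made for illustration, based on the Common Log Format that many web servers write.

```python
import re
from datetime import datetime

# Hypothetical pattern for the Common Log Format (an assumption for this
# sketch; real deployments often use extended or custom formats).
CLF_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Turn one access-log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert the string fields that are more useful as typed values.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


# Made-up sample line, not taken from any real server.
SAMPLE = '203.0.113.7 - - [03/Jun/2013:10:15:32 +0000] "GET /tax-disc HTTP/1.1" 200 5120'
event = parse_line(SAMPLE)
```

Once every line is a record with named fields like `status` and `path`, questions such as "how many 500 errors did we serve in the last hour?" become simple queries rather than an exercise in reading raw text.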
Kibana is an HTML5/AngularJS front end for Elasticsearch. Logstash can feed log data directly into Elasticsearch, then Kibana lets you chop it up to get to the information you need quickly. The whole thing is very slick and points toward what data mining is going to look like in years to come. A demo is worth a thousand words, so I suggest you browse their site to find out more, or check out the excellent live demo.
Pros: Very slick, easily extensible, free!
Cons: Requires Elasticsearch knowledge/infrastructure, not ideal for multi-tenant use
The ultimate program for stats nerds, Graphite will have your data scientists weeping soft tears of joy (if you feed in the right data). In the words of the original developers, “Graphite is an enterprise-scale monitoring tool that runs well on cheap hardware”. It doesn’t collect the data; it just stores it and renders graphs on demand. Logstash can provide data, or you can run one of several data collecting services that feed into it. There’s lots of flexibility here for people who know what they’re doing.
Pros: Free! Very flexible, Used by big companies (and gov.uk)
Cons: Quite a lot to master, maybe best for a dedicated devops team
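Because Graphite only stores and graphs what it is given, something has to send it data points. Its Carbon daemon accepts a simple plaintext protocol, one line per data point in the form `path value timestamp`, by default on TCP port 2003. The Python sketch below shows roughly what a sender looks like; the metric name, host and helper function names are hypothetical, and it assumes a Carbon listener is reachable on the default port.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point to a Carbon listener (host and port are assumptions)."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example of the line that would go over the wire (hypothetical metric name):
example_line = format_metric("shop.orders.placed", 3, 1370254532)
```

Calling `send_metric("shop.orders.placed", 3)` from your app each time an order goes through would be enough for Graphite to start drawing a graph of orders over time, although in practice many teams put an aggregating collector in between rather than opening a connection per event.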
A SaaS offering for people who don’t want to host their own infrastructure for gathering data. Comes preloaded with ways to gather information from all the common components of your web stack. Allows you to search and visualise that data easily. Also provides a way for teams to communicate about the issues that are raised.
Pros: Good integrations, free trial, communication features, good for multi-tenant use (many sites)
Cons: 3rd party, paid service (although free trial is available), integrations might not fit your business
Another option for those who would rather not worry about maintaining their own infrastructure. Used by some big players in the industry, Librato is a compelling offering in this space. It lets teams communicate through tools they already use, like Campfire, and the data annotation features look to be particularly powerful. There’s a free developer trial for you to find out more.
Pros: Full-featured, scalable, great UI, used by big players
Cons: 3rd party, paid service (although free trial is available), geared toward developers rather than product owners maybe?
New Relic has a high profile in the Ruby community, and there are few Rails devs who haven’t come across it before. Whilst it’s geared toward performance and uptime monitoring, it can still gather custom metrics and events (like orders placed) if you so desire. New Relic works slightly differently from other services in that you install their instrumentation right into your app, which gives very fine-grained detail about how it’s operating.
Pros: Great for performance monitoring, cross platform, well supported
Cons: Not as general purpose as other approaches
If this post has helped you get a handle on what’s out there in terms of metrics, let us know in the comments. Similarly, if you work in devops and there’s something we’ve missed or got wrong, we’ll happily take on suggestions and corrections. Go forth and analyse!
Other honourable mentions:
I didn’t have time to write these up fully, but they might be worth a look too. Let us know any we’ve missed in the comments.
- Fluentd – thanks to radekg for the tip!
*(There’s a great writeup on High Scalability here: http://highscalability.com/blog/2013/6/3/govuk-not-your-fathers-stack.html)