2008-04-29

ruby, graphviz and weblogs

For the past few days I've been following Tim's messing about with some Haskell weblog analysis code and I've been meaning to have a play myself.

Some time ago I knocked up a little ruby script to extract the search terms from my weblogs. It never did anything clever, and probably nothing anywhere near as clever as some of the tools I could download but, well, you know what it's like when you want to hack up something yourself.
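For anyone wondering what that sort of extraction involves, here's a minimal sketch rather than the actual script. It assumes the logs are in the common "combined" format, where the referrer is a quoted URL, and that the search engine passes the query in a `q` parameter. The method name `search_terms` is made up for illustration.

```ruby
require "uri"

# Return the words of the search query found in a log line's referrer,
# or nil if the line carries no recognisable search.
# Assumptions: the referrer is a double-quoted http(s) URL and the
# query lives in a "q" parameter (true for some engines, not all).
def search_terms(log_line)
  log_line.scan(/"(https?:\/\/[^"]+)"/).flatten.each do |url|
    begin
      query = URI.parse(url).query
      next unless query
      params = URI.decode_www_form(query).to_h
    rescue URI::InvalidURIError, ArgumentError
      next # skip referrers that don't parse cleanly
    end
    phrase = params["q"]
    return phrase.downcase.split if phrase && !phrase.strip.empty?
  end
  nil
end
```

Feeding it each line of a log file and throwing away the `nil` results gives a list of queries, each one an array of lower-cased words.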

So, last night, I got around to extending it so that it would emit different types of reports. One that I added was a type of graph output (similar to Tim's) and, once I had that, I kept thinking that doing something graphical with the data would be the next fun option.

A quick search on the net and I stumbled on GraphViz. After a quick read up on the DOT language I extended my utility so that it would emit the graph data in DOT format. It took a little bit of messing about but, finally, I had something that worked.

The result looks like this. That's a subset of my weblogs (if I throw too much at GraphViz's neato command I run out of memory!) and it only shows individual words that have appeared more than once. The number in brackets after each word is how many times it appears in the data. The lines show which words turned up together in the same search query string.
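To give a flavour of how a graph like that can be described in DOT, here's a rough sketch. It isn't the code from my utility. It assumes the queries have already been split into arrays of words, and the names `queries_to_dot` and `min_count` are invented for the example.

```ruby
# Build an undirected DOT graph from search queries.
# queries:   array of queries, each an array of words (assumed input shape)
# min_count: hypothetical threshold; 2 means "appeared more than once"
def queries_to_dot(queries, min_count: 2)
  # Count every occurrence of every word.
  counts = Hash.new(0)
  queries.each { |words| words.each { |w| counts[w] += 1 } }
  keep = counts.select { |_, n| n >= min_count }

  # Link each pair of kept words that share a query.
  edges = Hash.new(0)
  queries.each do |words|
    kept = words.uniq.select { |w| keep.key?(w) }.sort
    kept.combination(2) { |a, b| edges[[a, b]] += 1 }
  end

  # Escape double quotes so they can't break DOT's quoted strings.
  q = ->(s) { s.gsub('"') { '\"' } }

  lines = ["graph searches {"]
  keep.each do |word, n|
    lines << %(  "#{q[word]}" [label="#{q[word]} (#{n})"];)
  end
  edges.each_key do |(a, b)|
    lines << %(  "#{q[a]}" -- "#{q[b]}";)
  end
  lines << "}"
  lines.join("\n")
end
```

Write the returned string to a file, say `searches.dot`, and something like `neato -Tpng searches.dot -o searches.png` should render it. The sketch also counts how often each pair occurs but doesn't use the number; it could be emitted as an edge weight if you wanted thicker lines for stronger links.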

There's no surprise that the graph seems to cluster around Clipper and mutt.

I'm not sure that any of this tells me anything useful, but it is a different way of looking at what searches result in hits on my site.
