Django experiments

I’ve been playing about with Django and Last.fm data, with an eventual upgrade to my historical charts in mind. It’s fun .. I now know I play most music in March and November (university deadline time!), that my top three artists of 2010 so far are Animal Collective, Grizzly Bear and The Delgados, and that my favourite three discoveries are Starless & Bible Black, RM Hubbert and Soweto Kinch.

I also discovered that reading a foreign key on an object returned by a Django QuerySet (here, to use it as a dictionary key) makes Django look the related data up in the database. I had something like the following:

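from collections import defaultdict
from django.db import models

class WeekData(models.Model):
    # Artist is another model in the same app (table muncher_artist)
    artist = models.ForeignKey(Artist, on_delete=models.CASCADE)
    plays = models.PositiveIntegerField()
    # .. other fields


# Total plays per artist
tracking = defaultdict(int)
for week in WeekData.objects.all():
    tracking[week.artist] += week.plays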

The dictionary update was taking ages. Confused, I enabled MySQL’s logging and discovered 25,000 lines of the following..

SELECT `id`, `name` FROM `muncher_artist` WHERE `id` = 22
SELECT `id`, `name` FROM `muncher_artist` WHERE `id` = 23
SELECT `id`, `name` FROM `muncher_artist` WHERE `id` = 24
SELECT `id`, `name` FROM `muncher_artist` WHERE `id` = 25

At which point I realised that using week.artist as the key here looks up the artist every time, meaning 25,000 useless database queries and a really, really slow function. Perhaps I was being too hopeful in my expectation that Django would be clever.

Changing the last line to:

tracking[week.artist_id] += week.plays

sped the function up by a factor of ten and now lets me produce images like this in reasonable time:

It’s showing which artists occupy which chart positions as the weeks go by. The dark black line is for Fleet Foxes .. seems I went pretty mad for them :o
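Why `artist_id` helps, in miniature. This is a plain-Python toy model, not Django itself; `FakeArtistTable`, `FakeWeek`, `total_plays` and the query counter are hypothetical stand-ins. The idea it mimics: Django keeps the raw column value on each instance as `artist_id` and only runs a query the first time `artist` is read on that instance, caching the result there. Since every row from the QuerySet is a fresh instance, keying on `week.artist` costs one lookup per row, while `week.artist_id` costs none.

```python
from collections import defaultdict


class FakeArtistTable:
    """Hypothetical stand-in for the muncher_artist table that counts lookups."""

    def __init__(self, names):
        self.names = names  # artist id -> artist name
        self.queries = 0

    def get(self, artist_id):
        self.queries += 1  # stands in for one "SELECT ... WHERE id = ..." query
        return self.names[artist_id]


class FakeWeek:
    """Hypothetical stand-in for a WeekData row fetched from a QuerySet."""

    def __init__(self, table, artist_id, plays):
        self._table = table
        self.artist_id = artist_id  # raw foreign-key column value, always present
        self.plays = plays
        self._artist = None

    @property
    def artist(self):
        # Looked up on first access, then cached on this instance only,
        # so each separate row still costs one lookup.
        if self._artist is None:
            self._artist = self._table.get(self.artist_id)
        return self._artist


def total_plays(weeks, by_id):
    """Sum plays per artist, keyed either by artist_id or by the looked-up artist."""
    tracking = defaultdict(int)
    for week in weeks:
        key = week.artist_id if by_id else week.artist
        tracking[key] += week.plays
    return dict(tracking)
```

In Django proper, `WeekData.objects.select_related("artist")` should fetch the artists in the same query via a join when the objects themselves are wanted, and `WeekData.objects.values("artist").annotate(total=Sum("plays"))` (with `Sum` from `django.db.models`) pushes the whole per-artist total into the database.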

Onwards and Downwards

Spent an excellent week in Les Arcs, save for one hit to the head. Obligatory photos:

The End of the Day

Plus a couple more here.

Thought I’d see if eMusic was still value for money this afternoon .. astonished to find a year’s subscription is still £140 but they’ve reduced the number of songs allowed from 90 to 35. So I’ll be giving that a miss. How do they expect to compete with Spotify?

ISIHAC

A super line from Jack Dee to close this week’s I’m Sorry I Haven’t a Clue:

And so, ladies and gentlemen, as the gentle donkey of time is led into the tranquil Bethlehem of hope, before having his ears blown off by the Israeli army of eternity, I notice it’s the end of the show..

Gaddafi

There’s a great line in the Guardian piece ‘100 minutes in the life of Muammar Gaddafi’:

Gaddafi certainly knows how to woo a crowd, particularly at important junctures such as this. This was after all his big chance to cement Libya’s re-entry into the bosom of the international community after 20 years in the wilderness.

The technique he chose to do so – cunningly – was to blatantly insult his audience. The representatives of the 192 nations assembled in the assembly hall were no better, he told them, than orators at Hyde Park’s Speakers’ Corner. “You make your speech and then you disappear. That’s all you are right now.”

Ask Lisa

Noticed National Rail has a ‘virtual assistant’, sceptically wondered how bad it could be:


Rejected by a natural language processor! That’ll teach me for being sceptical.

Sunburn

Exams are over, the sun came and went, I got a first (hurrah!), a job for a couple of months and now it’s just .. working out what now? Completing Chrono Trigger is my first priority :D

Spent some time moving the Last.fm charts thing to Python, because Python is tasty, PHP is clumsy, and matplotlib is a superb piece of kit.

Here’s a histogram of the number of tracks I’ve listened to each week for the last few years:


(x is the number of tracks in a week.) I looked at a few friends’ histograms and there seemed to be quite a split between those roughly following a Gaussian distribution and those following something rather more exponential. I should experiment with different features of matplotlib, maybe narrow those bins a little. I’ve lots of little plans for other graphs and combinations.
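To see what narrowing the bins does, here is a small pure-Python binning sketch, assuming integer weekly track counts and an integer bin width; `histogram` is an illustrative name, not a matplotlib function. In matplotlib the plotting equivalent is roughly `plt.hist(counts, bins=...)`, where `bins` can be a number of bins or a sequence of bin edges.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count integer values into bins [start, start + bin_width).

    Returns {bin_start: count}, including empty bins between the lowest
    and highest occupied bins so the shape of the distribution is kept.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return {}
    counts = Counter((v // bin_width) * bin_width for v in values)
    lowest, highest = min(counts), max(counts)
    # Counter returns 0 for bins with no values
    return {start: counts[start] for start in range(lowest, highest + 1, bin_width)}
```

With made-up counts, `histogram([3, 7, 12, 14, 31], 10)` gives `{0: 2, 10: 2, 20: 0, 30: 1}`; halving `bin_width` to 5 spreads the same five weeks over seven sparser bins, which is the trade-off to watch when narrowing them.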


Finally, there’s a great little piece about a quirk of the number 1/89 here. Sum the Fibonacci numbers, shifting each one a further decimal place to the right, like this:

  .01
  .001
  .0002
  .00003
  .000005
  .0000008
  .00000013
  .000000021
  .0000000034
  .00000000055
  .000000000089
  .0000000000144
        .
    +   .
        .
----------------
  .01123595505...    = 1/89

The link has an outline proof about why it’s true.
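A quick exact check with Python’s `fractions` module, along with one standard argument (not necessarily the one in the linked piece): the Fibonacci generating function is F_1 x + F_2 x^2 + F_3 x^3 + ... = x / (1 - x - x^2). Putting x = 1/10 gives (1/10) / (89/100) = 10/89, and since each term in the column sits one extra decimal place to the right, the column sums to a tenth of that, 1/89. The series converges because the Fibonacci numbers grow roughly like 1.618^n, far slower than 10^n. The function name `fib_decimal_series` is illustrative.

```python
from fractions import Fraction


def fib_decimal_series(terms):
    """Exact partial sum of F_n / 10**(n + 1) for n = 1..terms (F_1 = F_2 = 1).

    Term n is the n-th Fibonacci number shifted n + 1 places right of the
    decimal point: .01, .001, .0002, .00003, ...
    """
    total = Fraction(0)
    current, following = 1, 1  # F_n and F_(n+1)
    for n in range(1, terms + 1):
        total += Fraction(current, 10 ** (n + 1))
        current, following = following, current + following
    return total
```

Comparing `Fraction(1, 89) - fib_decimal_series(k)` for growing `k` shows the gap staying positive and shrinking by a factor of about 10/1.618 (roughly 6) per extra term.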

Conjugate Gradients

Revising for an exam on machine learning and pattern recognition bits and pieces on Friday, I came across a paper titled ‘An Introduction to the Conjugate Gradient Method Without the Agonizing Pain’. Which is awesome.

The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Nevertheless, the Conjugate Gradient Method is a composite of simple, elegant ideas that almost anyone can understand. Of course, a reader as intelligent as yourself will learn them almost effortlessly.

Its keywords are conjugate gradient method, preconditioning, convergence analysis and agonizing pain, and it has a chapter called ‘Eigen do it if I try’ which (brilliant and awful title aside) is the first intuitive explanation of eigenvectors I’ve read, hurrah.
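The method itself is short. Below is a minimal sketch of plain (unpreconditioned) conjugate gradients in pure Python, assuming a symmetric positive-definite matrix stored as a dense list of rows; dense storage is only for readability, since the appeal of CG for sparse systems is that it needs nothing but matrix-vector products. The names `dot`, `matvec` and `conjugate_gradient`, and the absolute residual tolerance, are illustrative choices rather than anything from the paper. For real work, SciPy’s `scipy.sparse.linalg.cg` is the usual ready-made option.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (list of rows), starting from x = 0."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n  # n steps suffice in exact arithmetic; allow slack for rounding
    x = [0.0] * n
    r = [float(bi) for bi in b]  # residual b - A x, with x = 0
    d = r[:]                     # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:  # stop once the residual norm is small
            break
        Ad = matvec(A, d)
        alpha = rs_old / dot(d, Ad)  # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # makes the new d A-conjugate to the previous one
        d = [ri + beta * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[3, 2], [2, 6]], [2, -8])` converges in two steps to approximately `[2.0, -2.0]`, as you would expect for a 2×2 system.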