February 25, 2009

Music Visualisation Project Update

Filed under: Uncategorized — alsuren @ 11:56 am

Okay, so this is a *very* long-overdue post about my project. I have an hour before lectures, and I’m running a couple of computer jobs that will take a while (edit: about half an hour).

As some of you will know, my 4th year (MEng) project is all about music visualisation. The idea is to create a system that will take MP3 files, and turn them into thumbnail images. Songs which sound similar should also *look* similar. The idea is that it should act as a visual memory aid for DJs.

Right now, I have a “baseline” system, which produces images like this. Looking at the images from the baseline system, there don’t seem to be many similar-looking images (if you see any other than The Fox and Christopher Columbus, post a comment below).

So what’s going wrong?

The system has a lot of configurable parameters, so it might just be that it needs tuning. If you want to compare the performance of the system with a few parameter changes, try exploring the matrix found here. It might also be that I’m trying to pack too much information into each (very small) image. Currently I’m trying to squeeze 20 independent (scalar) pieces of information into each 20×20 image. What I need to try next is cutting down to the 3 to 10 pieces of information which are actually relevant, and making 50×50 images. I think I will also need to gather different pieces of information to include (extracted by hand at first, and then automatically).
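
To put rough numbers on the “too much information” worry: 20 values in a 20×20 image leaves 20 pixels per value, while 5 values in a 50×50 image leaves 500. Here is a minimal sketch of one way to lay a handful of scalars out as horizontal grey bands. The band layout, the function names and the assumption that every value is already scaled to the range 0 to 1 are all made up for illustration; this is not how the baseline system draws its images.

```python
def pixels_per_feature(n_features, size):
    """Average number of pixels available to each feature in a size x size image."""
    return size * size / n_features


def render_bands(features, size):
    """Return a size x size grid of grey levels (0-255), one horizontal band per feature.

    Assumes each feature has already been scaled to the range 0..1
    (values outside that range are clamped).
    """
    if not features:
        raise ValueError("need at least one feature")
    if size < len(features):
        raise ValueError("image is too small to give every feature a row")
    image = []
    for row in range(size):
        # Map the row index onto a feature index so the bands are as even as possible.
        index = row * len(features) // size
        value = min(max(features[index], 0.0), 1.0)
        grey = round(value * 255)
        image.append([grey] * size)
    return image


if __name__ == "__main__":
    print(pixels_per_feature(20, 20))  # the current setup
    print(pixels_per_feature(5, 50))   # fewer features, bigger image
    thumbnail = render_bands([0.1, 0.8, 0.4, 0.9, 0.2], 50)
    print(len(thumbnail), len(thumbnail[0]))
```

With bands this wide, each value should at least be big enough to see, which is the point of going from 20×20 to 50×50.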

Also, the human eye is not very good at comparing brightness. I will try adding fake colour to the images, and see whether a different colour map performs better.
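
A minimal sketch of what “fake colour” could look like: map each brightness value onto a hue, running from blue (low) to red (high), using Python’s standard `colorsys` module. The blue-to-red ramp is just one assumed choice of colour map, and `false_colour` and `colourise` are hypothetical helper names; which map actually performs best is exactly the thing that needs testing.

```python
import colorsys


def false_colour(value):
    """Map a scalar in 0..1 to an (r, g, b) tuple of ints in 0..255.

    Low values come out blue and high values come out red, so two values
    differ in hue rather than only in brightness.
    """
    value = min(max(value, 0.0), 1.0)   # clamp out-of-range input
    hue = (1.0 - value) * 2.0 / 3.0     # 2/3 is blue, 0 is red on the HSV wheel
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def colourise(image):
    """Convert a grid of grey levels (0-255) into a grid of RGB tuples."""
    return [[false_colour(grey / 255) for grey in row] for row in image]


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(v, false_colour(v))
```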

I’ll post something else this afternoon/evening. Have to run to lecture now.



  1. Good going. Looking at the baseline examples, I agree with your thoughts – too small an image with way too much info in!

    If I’d listened more in Vision Psych lectures I could probably suggest the optimal colour map – if you really care, do check a psychology textbook…

    Come to think of it, that could also probably tell you which auditory variables are the most important. If you let me know which you’re using currently, I might be able to help? (If you want, of course!)

    Comment by Stuart — February 26, 2009 @ 1:35 am

  2. You can use supervised learning approaches to select the best features (take a set of class labels (like the ones marked on the rows/columns) and select only the features which are useful for distinguishing between the classes). This is what I will talk about next.

    The thing that requires lots of manpower is classifying the data with lots of useful labels. Currently, I’m just filling in the template at: (copying the file, putting a label after the whitespace at the end of each line). A few of the classes have been used in (scroll down to the bottom to see the html pages). If you want to fill out some tags, feel free.

    Comment by alsuren — February 26, 2009 @ 10:18 am
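
One simple way to sketch the feature-selection idea from the comment above: score each feature by how far apart the class means are compared with the spread inside each class (a Fisher-style ratio), and keep the top few. The scoring rule, the function names and the toy data are assumptions for illustration, not necessarily what the project ends up using.

```python
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Score each feature by between-class spread divided by within-class spread.

    samples: list of equal-length feature vectors (lists of floats).
    labels:  one class label per sample.
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        overall = mean(s[j] for s in samples)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [s[j] for s, label in zip(samples, labels) if label == c]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfect separation, or a constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_features(samples, labels, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    samples = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [0.9, 1.1]]
    labels = ["a", "a", "b", "b"]
    print(fisher_scores(samples, labels))
    print(select_features(samples, labels, 1))
```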
