November 27, 2007


Filed under: Uncategorized — alsuren @ 12:09 am

I spent today drifting, and failing to do anything. I think this term is slowly killing me. Can’t wait for it to be over.

I have also been chatting with Alex about the “Microsoft Imagine Cup” (Herbert challenge), and I reckon it would be hilarious if it could be solved using a rainbow table. The outputs (equiv. of hashes) would be 625 bits long, and the inputs (equiv. of passwords) would be valid H programs.

Goes something like this:

while (still space on disk) and (length of program < 100 bytes):
    h_program = next program  # generate shortest/most likely programs first: this is a greedy algorithm
    bit[25][25] grid = {0,0,0,0...}  # 25x25 bit array. 1=travelled, 0=not travelled
    while (still interesting things happening):  # how to decide this?
        grid[x,y] = 1  # see note
        if grid not in sqldb, or sqldb[grid].length > h_program.length:
            sqldb.record(grid, h_program)  # be sure to record the length in bits as well
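The generator loop can be sketched in runnable Python. Note the assumptions: a plain dict stands in for the SQL database, and `run_program()` is a made-up stand-in for the Herbert interpreter that yields the set of travelled squares after each step — none of these names come from the real competition code.

```python
def grid_key(cells):
    """Pack a set of (x, y) travelled squares into one 625-bit integer."""
    key = 0
    for x, y in cells:
        key |= 1 << (y * 25 + x)
    return key

def build_table(programs, run_program, table=None):
    """Record the shortest program seen for each grid it produces.

    programs should be generated shortest-first (the greedy ordering);
    run_program(program) is an assumed interface that yields the set of
    travelled squares after each interpreter step.
    """
    if table is None:
        table = {}
    for program in programs:
        for cells in run_program(program):
            key = grid_key(cells)
            best = table.get(key)
            # Only overwrite if the new program is strictly shorter.
            if best is None or len(program) < len(best):
                table[key] = program
    return table
```

Because the grid is packed into a single immutable integer, it can be used directly as the dict (or database) key.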

Then, to get the best solution to a problem:

for grid in (paths which will solve this problem): # see note
    for program in (programs recorded against grid in sqldb):
        if program solves problem:
            return program
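The search side is then just key lookups against the table. A minimal sketch, again using a plain dict for the database and the same hypothetical 625-bit packing; enumerating `candidate_grids` (the paths that would solve the problem) is the hard part and is left abstract here:

```python
def grid_key(cells):
    """Pack a set of (x, y) travelled squares into one 625-bit integer."""
    key = 0
    for x, y in cells:
        key |= 1 << (y * 25 + x)
    return key

def find_solution(table, candidate_grids):
    """Return the shortest recorded program among the candidate grids.

    candidate_grids is an assumed iterable of sets of (x, y) squares,
    each of which would count as solving the problem.
    """
    best = None
    for cells in candidate_grids:
        program = table.get(grid_key(cells))
        if program is not None and (best is None or len(program) < len(best)):
            best = program
    return best
```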

What if we go over a grey square (which resets all the squares we’ve covered)? Should we:

- create a new blank grid on each iteration of the generator loop (the cost of this might be quite large),
- store the order in which we last travelled across squares (this may help to spot infinite loops),
- or just search for a few grids that might not be valid?

Is there a better way of representing the state of the grid?
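One candidate representation makes the grey-square reset cheap: pack the whole travelled-set into a single integer, so a blank grid is just 0 and a reset is one assignment rather than re-zeroing a 25x25 array. A sketch (the `mark`/`reset` names are invented here, not from the challenge):

```python
WIDTH = 25  # Herbert grids are 25x25

def mark(grid, x, y):
    """Return the grid with square (x, y) recorded as travelled."""
    return grid | (1 << (y * WIDTH + x))

def reset(grid):
    """Grey square hit: every travelled square is cleared at once."""
    return 0

grid = mark(mark(0, 3, 4), 3, 5)  # travel two squares
grid = reset(grid)                # stepped on a grey square: back to blank
```

Since the integer is immutable, each intermediate state is also directly usable as a table key, which sidesteps the “new blank grid each iteration” cost entirely.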

If someone wants to make an order-of-magnitude estimate of how long this would take, or how much memory (I don’t know how much memory a sparse rainbow table would take), or to suggest any optimisations, feel free. I suspect that the most costly thing will be the SQL transactions. I also suspect that the best algorithm will be the one with the nicest representation of the grid, and the best trade-off between time spent in the generator loop and time spent in the search loop.
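As a first-order sketch of the storage side (the 1 TB disk and per-entry layout are assumptions, not measurements): there are 2^625 possible grids, which is astronomically more than any disk can hold, so the table size is bounded by how many programs you can actually run, not by the grid space.

```python
# Back-of-envelope estimate. Assumed: ~1 TB of disk, entries stored as a
# packed 625-bit grid key plus a program of at most 100 bytes.
grids = 2 ** 625                  # all possible 25x25 bit patterns
entry_bytes = 625 // 8 + 1 + 100  # 79-byte key + <=100-byte program
disk_bytes = 10 ** 12             # roughly 1 TB
max_entries = disk_bytes // entry_bytes

print(f"possible grids: ~10^{len(str(grids)) - 1}")
print(f"bytes per entry: {entry_bytes}")
print(f"entries that fit on 1 TB: ~{max_entries:.2e}")
```

So the table will always be extremely sparse: a few billion entries at most, against ~10^188 possible grids.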

There are a bazillion ways of profiling and optimising to make it go fast (e.g. by recognising and discarding programs that will be rubbish, or doing things in memory before writing them to the database on disk). Writing something that can be distributed on the web (using BOINC or the engineering department’s gridengine) could also speed things up, but only if it looks like it won’t be able to find adequate solutions before the competition deadline. This is probably the kind of thing that you just have to leave running for a few days, generating the table and profiling itself; then check how well it’s doing, and see if you can think of things that will speed it up if it’s still only finding really shit solutions to easy problems.
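The in-memory-before-disk idea can be sketched with `sqlite3`: buffer new (grid, program) pairs in a dict and flush them in one transaction with `executemany`, instead of paying one SQL round-trip per grid. The schema, names, and hex-string keys are all assumptions here (a 625-bit grid overflows SQLite’s 64-bit INTEGER type, so the key is stored as text):

```python
import sqlite3

def open_table(path=":memory:"):
    """Create/open the table. Grid keys are hex strings, since a 625-bit
    integer does not fit SQLite's 64-bit INTEGER type."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (grid TEXT PRIMARY KEY, program TEXT)"
    )
    return conn

def flush(conn, buffer):
    """Write buffered {grid_hex: program} pairs in a single transaction,
    keeping only the shortest program per grid (upsert needs SQLite >= 3.24)."""
    conn.executemany(
        "INSERT INTO entries (grid, program) VALUES (?, ?) "
        "ON CONFLICT(grid) DO UPDATE SET program = excluded.program "
        "WHERE length(excluded.program) < length(program)",
        list(buffer.items()),
    )
    conn.commit()
    buffer.clear()
```

Batching like this turns thousands of tiny transactions into one, which is usually where the bulk of the SQL cost goes.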

