November 14, 2008

Versioned Home Directory, and other Ideas for Projects.

I have started using version control (bzr) on my home directory. I hope this will eventually solve a few problems:

1) Sharing settings with other people. This is something that I’ve been looking for a solution to for a while (there are standard ways to share apps and themes ( and pals) but not configs). If everyone keeps their configs versioned, then it should be possible to cherry-pick changes more easily.

2) Creating consistent settings across many different Linux machines (as discussed in my colourhash post). (Side note relating to colourhash: many graph plotting programs have ways of automatically assigning distinct colours. I will look at that at some point too.)

3) As a backup framework: if I have all of my settings under distributed version control on 4 machines, then when I accidentally delete large chunks of my home directory (like the other day, when cmake created a folder called ‘$HOME’ which I wanted to delete…) I don’t lose all of my RSS feeds, proxy settings (which I still haven’t managed to get working again, thanks to KDE’s incredibly fragile SOCKS [lack-of] implementation/configuration) and email settings (resulting in me not being alerted about emails for 2 days (>20 emails)). [/rant]

The progress so far is a ~/.bzrignore file as long as your arm.

Eventually, I plan to host it on launchpad, as soon as I’ve verified that it doesn’t contain any security-critical information. (I don’t think I have anything else that I would have a problem hosting on launchpad. Reading the content of my other posts might give you an idea of my views on privacy.)

==== Technobabble-filled braindump below this line ====

If anyone knows how to do nested repositories (eg. so I can get bzr to manage my ~/src as well, and so that I can have sensitive information like ~/.ssh and ~/.gpg versioned in some way that lets *me* merge them between computers, but doesn’t expose them on launchpad) give me a shout.

In other news, I taught myself a bit of Perl last night while trying to add sed-style text replacement to pidgin (by hacking apart an existing script whose interface was arbitrarily horrible, and which only allows output replacement). I’m currently fighting with pidgin’s settings management to get persistent rules. If anyone wants it, get in touch. Otherwise, it will be in launchpad under ~/.purple/ when I get my home directory on there :D.

If anyone wants to give me input on the interface, it would be muchly appreciated. Currently, we have:
/sed foo-to-bar s/foo/bar/g
(to add a replacement rule)
/sed foo-to-bar s///
(to replace the old rule with an empty one)

I’m thinking something more like:
/sed s/foo/bar/g
(to add a rule; a number will be assigned to each rule as an identifier)
/sed -l
(to list all rules, and associated numbers)
/sed -d #number
(to delete rules)
/sed -o s/foo/bar/g
(to only correct outgoing text)
/sed -i s/foo/bar/g
(to only correct incoming text)
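
A rough sketch of how the proposed interface could hang together, written in Python 3 rather than Perl purely for illustration (it is not the pidgin plugin). The names SedRules, command and apply are made up here. Rules are handed to Python's re module, which is not quite sed's regex dialect, and escaped slashes inside a rule are not handled.

```python
import re

# Matches s/pattern/replacement/ with an optional trailing g.
# Escaped slashes inside the rule are not supported in this sketch.
RULE_SYNTAX = re.compile(r"^s/([^/]+)/([^/]*)/(g?)$")


class SedRules:
    """Hypothetical rule store for the proposed /sed commands."""

    def __init__(self):
        self.rules = {}       # number -> (direction, pattern, replacement, count)
        self.next_number = 1

    def command(self, line):
        """Handle one '/sed ...' line typed by the user."""
        if not line.startswith("/sed"):
            raise ValueError("not a /sed command: %r" % line)
        option, rule = "", line[len("/sed"):].strip()
        if rule.startswith("-"):
            option, _, rule = rule.partition(" ")
            rule = rule.strip()
        if option == "-l":
            return sorted(self.rules.items())
        if option == "-d":
            return self.rules.pop(int(rule.lstrip("#")))
        if option not in ("", "-o", "-i"):
            raise ValueError("unknown option: %r" % option)
        match = RULE_SYNTAX.match(rule)
        if match is None:
            raise ValueError("bad rule: %r" % rule)
        pattern, replacement, flag = match.groups()
        direction = {"": "both", "-o": "out", "-i": "in"}[option]
        count = 0 if flag == "g" else 1    # for re.sub, count=0 means "replace all"
        number = self.next_number
        self.next_number += 1
        self.rules[number] = (direction, pattern, replacement, count)
        return number

    def apply(self, text, direction):
        """Run every rule that covers this direction ("in" or "out"), in numeric order."""
        for number in sorted(self.rules):
            rule_direction, pattern, replacement, count = self.rules[number]
            if rule_direction in ("both", direction):
                text = re.sub(pattern, replacement, text, count=count)
        return text


rules = SedRules()
first = rules.command("/sed s/teh/the/g")
second = rules.command("/sed -o s/foo/bar/")
outgoing = rules.apply("teh foo foo", "out")
incoming = rules.apply("teh foo foo", "in")
rules.command("/sed -d #2")
remaining = [number for number, rule in rules.command("/sed -l")]
```

Numbering the rules means nobody has to invent a name like foo-to-bar for each one, at the cost of having to run /sed -l before deleting anything.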

Unfortunately, perl is a *horrible* language (doesn’t even have a concept of named function arguments) so the resulting code is unlikely to be anything I’m proud of.

While I think about it, I also had a load of ideas for python-based projects:
A man-page parsing command-line completion handler for ipython (and possibly bash, but bash scripts take so much longer to debug, and I get the feeling I will soon be using ipython as my default shell anyway.)
Given that debian policy forces all commands to have a man page, this is a pretty reliable way to write a powerful tab-completer. Also, since you only ever read the man page when the tab completion doesn’t work, you might as well get the tab-completer to read the man page for you.
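
To give a feel for how little parsing this might need, here is a minimal Python 3 sketch that pulls option names out of already-rendered man page text and filters them by prefix. The function names and the sample text are invented for illustration. It assumes option definitions sit at the start of a line, which is common but by no means universal, and a real completer would still have to fetch the page and hook into the shell.

```python
import re

# An option is one or two dashes followed by a name, not glued onto a preceding word.
OPTION = re.compile(r"(?<![\w-])(--?[A-Za-z0-9][\w-]*)")


def options_from_man_text(man_text):
    """Collect option names from lines that start with an option definition."""
    options = set()
    for line in man_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-"):
            # Keep only the part before any run of spaces that separates
            # the option names from an inline description.
            head = re.split(r"\s{2,}", stripped)[0]
            options.update(OPTION.findall(head))
    return sorted(options)


def complete(prefix, man_text):
    """Return the known options that start with what the user has typed so far."""
    return [option for option in options_from_man_text(man_text)
            if option.startswith(prefix)]


# Made-up excerpt in the usual man page layout.
SAMPLE = """
OPTIONS
       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       --block-size=SIZE
              scale sizes by SIZE before printing them
"""

all_options = options_from_man_text(SAMPLE)
suggestions = complete("--a", SAMPLE)
```

Note that the "-l" mentioned in the description of --author is deliberately not picked up, since that line does not start with a dash.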

A callback/decorator library for creating command-line programs, with an interface along the lines of:

@clargs.handles("-f", "--filename")
def input_filename(filename):
    """The filename you want to read."""
    global input_file
    input_file = filename

@clargs.arguments("REPEATS", int)
def main(repeats):
    """Reads FILENAME to stdout REPEATS times."""
    text = open(input_file).read()
    for i in xrange(repeats):
        print text

if __name__ == "__main__":
    clargs.run(main)  # hypothetical entry point; clargs does not exist yet

It should also auto-generate help and man pages using the information given. An even more fun thing to do (with python3000) would be to use nose-style runtime inspection to detect a function of the form:

def handle_filename(name: str):
    """The filename you want to read."""
    global input_file
    input_file = name

and make that handle --filename input.txt (maybe with an @shortopt('f') decorator).
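
A minimal sketch of how both flavours might work, in Python 3 and using only the standard inspect module. Everything here (handles, auto_option, dispatch, help_text and the settings dict) is a hypothetical stand-in for the imagined clargs library. It assumes every option takes exactly one value and does no error handling.

```python
import inspect

_handlers = {}   # option string -> (handler function, converter)


def handles(*option_names):
    """Register the decorated function as the handler for the given options."""
    def register(function):
        (parameter,) = inspect.signature(function).parameters.values()
        converter = parameter.annotation
        if converter is inspect.Parameter.empty:
            converter = str            # no annotation: pass the raw string through
        for name in option_names:
            _handlers[name] = (function, converter)
        return function
    return register


def auto_option(function):
    """Derive the long option from the function name: handle_repeats -> --repeats."""
    prefix = "handle_"
    if not function.__name__.startswith(prefix):
        raise ValueError("expected a function named handle_<option>")
    return handles("--" + function.__name__[len(prefix):].replace("_", "-"))(function)


def dispatch(argv):
    """Call the handler for each known option; return the leftover positional arguments."""
    positional = []
    arguments = iter(argv)
    for argument in arguments:
        if argument in _handlers:
            function, converter = _handlers[argument]
            function(converter(next(arguments)))   # every option takes exactly one value
        else:
            positional.append(argument)
    return positional


def help_text():
    """Build one help line per handler from its option names and docstring."""
    names_by_function = {}
    for name, (function, _) in _handlers.items():
        names_by_function.setdefault(function, []).append(name)
    return "\n".join("%s  %s" % (", ".join(names), inspect.getdoc(function))
                     for function, names in names_by_function.items())


settings = {}

@handles("-f", "--filename")
def input_filename(filename):
    """The filename you want to read."""
    settings["filename"] = filename

@auto_option
def handle_repeats(count: int):
    """How many times to print the file."""
    settings["repeats"] = count

leftover = dispatch(["-f", "input.txt", "--repeats", "3", "extra"])
usage = help_text()
```

The annotation doubles as the type converter, so "3" arrives in handle_repeats as an int, and the docstrings are all the help generator needs.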

A subclass of numpy.ndarray that has named axes, and user-specified ranges, so…

likelihoods = semantic_array( ('t', 1000), ('x', -100, 100), ('v', -10, 10) )
# sum over v, and preserve the x and t axes.
position_likelihoods = sum(likelihoods, axis='v')
# get the best guess of x for each time t.
maximum_likelihood_x_estimate = argmax(position_likelihoods, axis='x')
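
The array part would need numpy, but the bookkeeping underneath can be sketched in plain Python 3. NamedAxes and its methods are hypothetical names. It assumes integer coordinates and half-open ranges (so ('x', -100, 100) has 200 entries), which is just one possible reading of the example above.

```python
class NamedAxes:
    """Bookkeeping for named axes with user-specified coordinate ranges."""

    def __init__(self, *specs):
        self.names = []
        self.lows = {}
        self.shape = ()
        for name, *bounds in specs:
            # (name, size) means coordinates 0..size-1; (name, low, high) is half-open.
            low, high = (0, bounds[0]) if len(bounds) == 1 else bounds
            self.names.append(name)
            self.lows[name] = low
            self.shape += (high - low,)

    def axis(self, name):
        """Translate an axis name into the positional axis number."""
        return self.names.index(name)

    def index(self, name, coordinate):
        """Translate a user coordinate on the named axis into an array index."""
        position = coordinate - self.lows[name]
        if not 0 <= position < self.shape[self.axis(name)]:
            raise IndexError("%r is outside the range of axis %r" % (coordinate, name))
        return position


axes = NamedAxes(('t', 1000), ('x', -100, 100), ('v', -10, 10))
shape = axes.shape
v_axis = axes.axis('v')
x_zero = axes.index('x', 0)
```

A real semantic_array would carry one of these around, so that sum(likelihoods, axis='v') turns into a plain positional sum over axes.axis('v') and then drops 'v' from the names of the result.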

A delayed evaluation library (might end up stealing a lot of ideas from scipy and sympy, with a good chunk of twisted to boot)
An interesting feature of python is that assignment is *purely* a pointer-update. When you say "x=y" it just makes x point to the same thing y points to. This means that if you get passed x into a function, you can safely write x = x*10, and it won't modify x in the code that called you. This lack of side-effects (and all manner of other things) makes many python libraries look like pure-functional libraries.
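
A small Python 3 illustration of the difference, using a plain list so that it runs without numpy. The function names are made up. The caveat it shows is that the augmented form (*=) is not a pure rebinding for mutable objects.

```python
def rebind(x):
    x = x * 10      # builds a new object and points the local name x at it
    return x


def mutate(x):
    x *= 10         # for a list, this changes the caller's object in place
    return x


values = [1, 2]

rebound = rebind(values)
length_after_rebind = len(values)    # the caller's list is untouched

mutated = mutate(values)
length_after_mutate = len(values)    # the caller's list has grown
```

So plain assignment inside a function really is side-effect free, while the in-place operators quietly are not.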

On the down-side, it won't let you override the assignment operator, so when you're dealing with large amounts of data, you can't re-use arrays without jumping through hoops. If X is a 1000000x1000000 matrix, your choices are:
X = multiply(X,10) # The canonical form, but it creates a temporary array for the return value which is alive at the same time as X (potentially taking up twice as much memory as needed)
multiply(X, 10, out=X) # The numpy interface (potentially does the multiplication in place)
X *= 10 # works in-place, and is all very good, but what if I want to do something that's not +=,-=,*= or /=?

Then there are the slightly more hacky options, which involve delayed evaluation:
X == multiply(X, 10) # Override the logical equals operator X.__eq__ (the sympy method for writing symbolic equations). This is the most horrible, because it stops you being able to do X = (X==Y)
context.X = multiply(context.X, 10) # Override the attribute assignment operator context.__setattr__
X[:] = multiply(X, 10) # Override the item assignment operator X.__setitem__  (or sometimes X.__setslice__, I think)
Note that this last one feels quite like fortran, but it might be the least horrible of all the interfaces.

So how would these things work? A simple sympy-style one looks like this:
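import numpy

def multiply(X, Y):
    def deferred_calculation(output):
        numpy.multiply(X, Y, out=output)
    return deferred_calculation

# In X's class definition:
def __eq__(self, deferred_calculation):
    # Run the deferred calculation, writing the result into this array.
    deferred_calculation(self)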

The problem is that if you accidentally do X = multiply(X,Y) then X is just the deferred_calculation function. That’s not very useful. On the other hand, if the returned “deferred_calculation” object can be made to behave like a numpy array, then you’re in for a win.

The fun stuff will start to happen when you start using these “deferred_calculation” objects, and passing them in and out of other functions, so you have a massive chain of deferred calculations. If you then include an interface for inspecting chains of deferred objects, you can start to write deferred-to-$LANGUAGE compilers, which would let you write “say what you mean” algorithmic code in python.
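
As a toy version of that idea, here is a Python 3 sketch in which arithmetic builds a chain of deferred objects instead of computing anything. The chain can then either be evaluated or walked to produce source text. DeferredCalculation, var, evaluate and to_source are all hypothetical names, only + and * are supported, and plain numbers stand in for arrays.

```python
class DeferredCalculation:
    """An operation name plus its operands; nothing is computed until asked."""

    def __init__(self, op, *operands):
        self.op = op
        self.operands = operands

    # Arithmetic on deferred objects just builds a bigger deferred object.
    def __add__(self, other):
        return DeferredCalculation("+", self, other)

    def __mul__(self, other):
        return DeferredCalculation("*", self, other)

    def evaluate(self, variables):
        """Walk the chain and actually do the arithmetic."""
        if self.op == "var":
            return variables[self.operands[0]]
        left, right = (operand.evaluate(variables)
                       if isinstance(operand, DeferredCalculation) else operand
                       for operand in self.operands)
        return left + right if self.op == "+" else left * right

    def to_source(self):
        """Walk the same chain, but emit an expression as text instead."""
        if self.op == "var":
            return self.operands[0]
        left, right = (operand.to_source()
                       if isinstance(operand, DeferredCalculation) else repr(operand)
                       for operand in self.operands)
        return "(%s %s %s)" % (left, self.op, right)


def var(name):
    """A named placeholder whose value is supplied at evaluation time."""
    return DeferredCalculation("var", name)


expression = var("X") * 10 + var("Y")
source = expression.to_source()
value = expression.evaluate({"X": 2, "Y": 3})
```

Swapping to_source for something that emits C or Fortran is where the deferred-to-$LANGUAGE compiler would start.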

A way of writing twisted applications in a blocking style (using generators).
This idea is in some ways quite similar to the idea above. I’m sure I sketched an implementation up somewhere (possibly on the eee), but I don’t seem to have posted about it. The gist of it is as follows:

In the example below, unblockify (implementation omitted) acts like a filter in two ways:
From the caller’s point of view, some_generator produces a sequence, but only things that aren’t deferreds get let out of the filter to the caller.
From the generator’s point of view, “yield” acts like a filter (or in compsci terms, a “map”). Any “deferred” objects sent through it get turned into real objects, and any real objects sent through it disappear. I’m still deciding whether to do something magical when None gets yielded. We’ll see.

@unblockify
def some_generator():
    while True:
        result_of_deferred = yield function_which_returns_deferred()
        yield some_immediate_function(result_of_deferred)

for out in some_generator():
    print out

While this program appears to be blocking, it shouldn’t cause unresponsiveness in GUI applications. This is because unblockify passes control to the twisted reactor while it’s waiting for each deferred.
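
The generator plumbing can be sketched without twisted at all. In the Python 3 code below, FakeDeferred is a made-up stand-in that produces its value immediately. A real version would hand control to the reactor at the marked line and resume the generator from a callback. None gets no special treatment here.

```python
class FakeDeferred:
    """Hypothetical stand-in for a twisted Deferred: a value that is 'not ready yet'."""

    def __init__(self, compute):
        self._compute = compute

    def wait(self):
        # A real implementation would return to the reactor here and
        # come back once the result had arrived.
        return self._compute()


def unblockify(generator_function):
    """Resolve yielded deferreds behind the scenes; let everything else out."""
    def wrapper(*args, **kwargs):
        generator = generator_function(*args, **kwargs)
        to_send = None
        while True:
            try:
                item = generator.send(to_send)
            except StopIteration:
                return
            if isinstance(item, FakeDeferred):
                to_send = item.wait()   # goes back in as the value of the yield
            else:
                to_send = None
                yield item              # real objects pass through to the caller
    return wrapper


@unblockify
def squares(limit):
    for n in range(limit):
        value = yield FakeDeferred(lambda n=n: n * n)
        yield "square of %d is %d" % (n, value)


results = list(squares(3))
```

The caller only ever sees the strings, and the generator only ever sees resolved values, which matches the two filter views described above.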

If anyone is interested in any of these projects, please shout.



  1. I’d love a copy of your plugin if you still have it!

    Comment by Matt Brubeck — June 2, 2010 @ 7:46 pm

    • I don’t think it ever got to a point where it was actually useful. My perl-fu is poor, and I never used pidgin for very long. I also didn’t migrate my old home directory across to my work laptop (currently using git for versioning of my work laptop).

      If you still want it, I’ll dig it out of my old backups and email it to you or something.

      Comment by alsuren — June 2, 2010 @ 8:16 pm
