Alsuren

February 11, 2010

FOSDEM

Filed under: collabora, facebook — Tags: , , , , , , — alsuren @ 10:03 pm

I went to FOSDEM over the weekend in Brussels. My mum texted me when I arrived, with a thinly veiled request for me to bring back some Belgian Chocolates. Annoyingly, my touristing extended only as far as an hour wandering down to the Palace of Justice and back, and the only chocolate shops I passed seemed to be geared up for Valentine’s Day so I gave them a miss.

Getting back in touch with what’s going on in KDE-land was good, and finally pushed me to switch back from GNOME to KDE. I’m currently still running gnome-panel because I use a gnome-panel applet for reporting how many hours I’ve been working on different things.

I wrote down my conclusions from the conference for other people from Collabora to read, but I thought I might as well share them here too:

  • KMail in KDE 4.5 is finally going to be worth using, but that’s 6 months away.
  • RDF/SPARQL is actually kinda nice, but writing raw queries without any tools is a trap.
    • The KDE guys haven’t really been talking to the Tracker guys, so while their frameworks are theoretically compatible (using the same schemas) they haven’t made very much effort to share things like their databases or tools for writing metadata back to MP3/image tags (a feature which KDE currently lacks).
  • CouchDB is secretly mostly hype and while it’s possible to use from any language without any tools, it’s got too many sharp edges to be very useful when you start trying to use it for non-trivial applications (a bit like dbus-python in that sense?).
  • SIP-Communicator is actually kinda kickass. We really need to improve our SIP stack.
  • I hear Daniel Stone’s talk was really good but it was full by the time I got there. He says that it was recorded, and that he would post to Planet Collabora when they put it online.
  • I’m not going to go into too much detail about XMPP (A few of the other guys on Planet Collabora went to the associated XMPP summit, so I’ll leave it to them to post details about that.). One thing that is worth commenting on is that there is actually surprisingly little impedance mismatch between XMPP and many web2/AJAX technologies. Watch this space for a complete JavaScript port of Prosody and a massive flood of JavaScript-based server components.

November 14, 2008

Versioned Home Directory, and other Ideas for Projects.

I have started using version control (bzr) on my home directory. This hopes to eventually solve a few problems:

1) Sharing settings with other people. This is something that I’ve been looking for a solution to for a while (there are standard ways to share apps and themes (kde-apps.org and pals) but not configs. If everyone keeps their configs versioned, then it should be possible to cherry-pick changes more easily.

2) Creating consistant settings across many different linux machines (as discussed in my colourhash post. (side note relating to colourhash: many graph plotting programs have ways of automatically assigning distinct colours. I will look at that at some point too.)

3) As a backup framework: If I have all of my settings under distributed version control on 4 machines, then when I accidentally delete large chunks of my home directory (like the other day, when cmake created a folder called ‘$HOME’ which I wanted to delete…) then I don’t lose all of my rss feeds, proxy settings (which I stil haven’t managed to get working again, thanks to KDE’s incredibly fragile socks[lack-of] implementation/configuration) and email settings (resulting in me not being alerted about emails for 2 days(>20 emails))[/rant]

The progress so far is a ~/.bzrignore file as long as your arm ).

Eventually, I plan to host it on launchpad (as soon as I’ve verified that it doesn’t contain any security-critical information (I don’t think I have anything else that I would have a problem hosting on launchpad. Reading the content of my other posts might give you an idea of my views on privacy.))

==== Technobabble-filled braindump below this line ====

If anyone knows how to do nested repositories (eg. so I can get bzr to manage my ~/src as well, and so that I can have sensitive information like ~/.ssh and ~/.gpg versioned in some way that lets *me* merge them between computers, but doesn’t expose them on launchpad) give me a shout.

In other news, I taught myself a bit of perl last night, when trying to add sed-style text replacement to pidgin (by hacking apart script called replaceX.pl whose interface was arbitrarily horrible, and only allows output replacement). I’m currently fighting with pidgin’s settings management to get persistent rules. If anyone wants it, get in touch. Otherwise, it will be in launchpad under ~/.purple/sed.pl when I get my home directory on there :D.

If anyone wants to give me input on the interface, it would be muchly appreciated. Currently, we have:
/sed foo-to-bar s/foo/bar/g
(to add a replacement rule)
/sed foo-to-bar s///
(to replace the old rule with an existing rule)

I’m thinking something more like:
/sed s/foo/bar/g
(to add a rule; a number will be assigned to each rule as an identifier)
/sed -l
(to list all rules, and associated numbers)
/sed -d #number
(to delete rules)
/sed -o s/foo/bar/g
(to only correct outgoing text)
/sed -i s/foo/bar/g
(to only correct incoming text)

Unfortunately, perl is a *horrible* language (doesn’t even have a concept of named function arguments) so the resulting code is unlikely to be anything I’m proud of.

While I think about it, I also had a load of ideas for python-based projects:
A man-page parsing command-line completion handler for ipython (and possibly bash, but bash scripts take so much longer to debug, and I get the feeling I will soon be using ipython as my default shell anyway.)
Given that debian policy forces all commands to have a man page, this is a pretty reliable way to write a powerful tab-completer. Also, since you only ever read the man page when the tab completion doesn’t work, you might as well get the tab-completer to read the man page for you.

A callback/decorator library for creating command-line programs, with an interface along the lines of:

#!/usr/bin/python
@clargs.handles("-f", "--filename")
@clargs.arguments("FILENAME")
def input_filename(filename):
    """The filename you want to read."""
    global input = filename

@clargs.main
@clargs.arguments("REPEATS", int)
def main(repeats):
     """Reads FILENAME to stdout REPEATS times."""
    text = open(input).read()
    for i in xrange(repeats):
        print text

if __name__ == "__main__":
    clargs.go()

It should also auto-generate help and man pages using the information given. An even more fun thing to do (with python3000) would be to use nose-style runtime inspection to to detect a function of the form:

def handle_filename(name:str):
    "The filename you want to read."""
    global input = filename

and make that handle –filename input.txt (maybe with an @shortopt(‘f’) decorator.

A subclass of numpy.ndarray that has named axes, and user-specified ranges, so…

likelihoods = semantic_array( ('t', 1000), ('x', -100, 100), ('v', -10, 10) )
do_something_with(likelihoods) 
# sum over v, and preserve the x and t axes.
position_likelihoods = sum(likelihoods, axis='v')
# get the best guess of x for each time t.
maximum_likelihood_x_estimate = argmax(position_estimate_probs, axis='x') 


A delayed evaluation library (might end up stealing a lot of ideas from scipy and sympy, with a good chunk of twisted to boot)
An interesting feature of python is that it doesn't have an assignment operator is *purely* a pointer-update. When you say "x=y" it just makes x point to the same thing y points to. This means that if you get passed x into a function, you can safely write x = x*10, and it won't modify x in the code that called you. This lack of side-effects (and all manner of other things) makes many python libraries look like pure-functional libraries.

On the down-side, it won't let let you override the assignment operator, so when you're dealing with large amounts of data, you can't re-use arrays without jumping through hoops. If X is a 1000000x1000000 matrix, your choices are:
X = multiply(X,10) # The canonical form, but it creates a temporary variable for the return value which is alive at the same time as X (potentially taking up twice as much memory as needed)
multiply(X, 10, output=X) # The numpy interface (potentially does the multiplication in place)
X *= 10 # works in-place, and is all very good, but what if I want to do something that's not +=,-=,*= or /=?

Then there are the slightly more hacky options, which involve delayed evaluation:
X == multiply(X, 10) # Override the logical equals operator X.__eq__ (the sympy method for writing symbolic equations). This is the most horrible, because it stops you being able to do X = (X==Y)
context.X = multiply(context.X, 10) # Override the attribute assignment operator context.__setattr__
X[:] = multiply(X, 10) # Override the item assignment operator X.__setitem__  (or sometimes X.__setslice__, I think)
Note that this last one feels quite like fortran, but it might be the least horrible of all the interfaces.


So how would these things work? A simple sympy-style one looks like this:
def multiply(X, Y):
    def deferred_calculation(output):
        numpy.multiply(X, Y, output=output)
    return deferred_calculation

# In X's class definition:
def __eq__(self, deferred_calculation):
    deferred_calculation(output=self)

The problem is that if you accidentally do X = multiply(X,Y) then X is just the deferred_calcuation function. That’s not very useful. On the other hand, if the returned “deferred_calculation” object can be made to behave like a numpy array, then you’re in for a win.

The fun stuff will start to happen when you start using these “deferred_calculation” objects, and passing them in and out of other functions, so you have a massive chain of deferred calculations. If you then include an interface for inspecting chains of deferred objects, you can start to write deferred-to-$LANGUAGE compilers, which would let you write “say what you mean” algorithmic code in python.

A way of writing twisted applications in a blocking style (using generator expressions).
This idea is in some ways quite similar to the idea above. I’m sure I sketched an implementation up somewhere (possibly on the eee), but I don’t seem to have posted about it. The jist of it is as follows:

In the example below, unblockify (implementation omitted) acts like a filter in two ways:
From the caller’s point of view, some_generator produces a sequence, but only things that aren’t deferreds get let out of the filter to the caller.
From the generator’s point of view, “yield” acts like a filter (or in compsci terms, a “map”). Any “deferred” objects sent through it get turned into real objects, and any real objects sent through it disappear. I’m still deciding whether to do something magical when None gets yielded. We’ll see.

 
@unblockify
def some_generator():
    while True:
        result_of_deferred = yield function_which_returns_deferred()
        yield some_immediate_function(results_of_deferred)

for out in some_generator():
    print out

While this program appears to be blocking, it shouldn’t cause unresponsiveness in GUI applications. This is because filter passes control to the twisted reactor when it’s waiting for each deferred function.

If anyone is interested in any of these projects, please shout.

Create a free website or blog at WordPress.com.