Alsuren

January 24, 2009

Ideas for Programming Aids

Filed under: Uncategorized — alsuren @ 1:37 am

First off, it should be noted that I'm spending my Friday night posting on my blog. Yes. There's definitely something wrong with that. Also, I have just instinctively written this document in reStructuredText, and can't be arsed to convert it to HTML for WordPress. If anyone knows of blogging software that supports reStructuredText, I might be tempted to change once again.

The rest of this post is me trying to describe how I’ve recently found myself developing software, and how I think a software development/testing framework should behave in order to fit in with this methodology.

I write all of my code in python, making extensive use of ipython and scipy/matplotlib. Most of the things I write are to solve specific problems in a very throw-away manner (I basically use python in the same way most people might use perl/bash/matlab scripts).

The development method goes something like this:

* Open up kate, and add the standard boilerplate to a python file:

#!/usr/bin/python
from __future__ import division
from sys import argv
import numpy as N

def main(arg1="default value", arg2="something else"):
    """ Does whatever. """
    pass

if __name__ == "__main__":
    main(*argv[1:])

* Start writing some code in kate (generally top-down to solve the problem at hand, with functions still missing).
* Implement a few of the lower-level functions (without really thinking about speed or comments, but generally following most of PEP8).
* %run whatever.py in ipython.
* Fix missing/broken implementations, and add "return result" to the end of some functions.
* %run whatever.py and repeat until it does what you want it to do.
* If it's taking too long, %prun main(some parameters that will take less time to compute results for) and check for the function with the highest cumulative time (say it's called slow_function()).
* Rename slow_function() to canonical_slow_function() and write another implementation with a few loops unrolled or something.
* Add "assert result_of_slow_function == canonical_slow_function()" wherever the function is used (see the sketch after this list).
* Reload, and %prun main() again (and notice that the new implementation takes 10x less time than the canonical one; also feel safe that it's numerically correct, since there is a built-in regression test).
* Comment out the assert, and run again with the parameters that were taking too long before.
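To make the canonical-implementation trick concrete, here's roughly what those last few steps leave you with. The function bodies and the sum-of-squares example are made up; I use N.allclose() rather than == because the results are usually floating point:

import numpy as N

def canonical_slow_function(xs):
    """ The original, obviously-correct implementation. """
    total = 0.0
    for x in xs:
        total += x * x
    return total

def slow_function(xs):
    """ The rewritten, hopefully-faster implementation. """
    result = N.dot(xs, xs)
    # Built-in regression test: comment this out once you trust it.
    assert N.allclose(result, canonical_slow_function(xs))
    return result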

So what do I want from a rapid-command-line-development framework?

* A command-line program a bit like django-admin.py (or python paster or whatever), or a set of magics/functions that can be used from within ipython, that will create a single python file with the boilerplate for a simple command-line app, and open your default editor. It should look something like this (and already be executable on Unix):

#!/usr/bin/python
from alternatives import costs, reimplementation
from commandline import run_if_main

AUTHOR = "David Laban "

@costs.cputime
def main(arg1="default value", arg2="something else"):
    """ Does whatever. """
    for i in range(100000000000):
        pass  # waste time

@reimplementation("faster")
def main(arg1, arg2):
    pass

run_if_main()

* Running this program with no arguments should run main() with its default arguments (see below for more information on how this should be implemented)

* Running the program with the option "--help" or "-h" will print a usage message of the following form:

whatever.py [--arg1="default value"] [--arg2="something else"]
Does whatever.

* The admin script/ipython magics should also be able to do things like:
* script-admin commit -m "First revision of whatever.py", which will search for evidence of a revision control system, and then commit (running any tests that exist first, obviously, and prompting for anything it needs).
* script-admin set-defaults-from whatever.py, which would take any modifications you make to the boilerplate variables and save them somewhere useful, so that they become the defaults.
* script-admin create-manpage whatever.py, which would generate a manpage with the author information at the bottom.
* script-admin create-bashcomp whatever.py, which would create a bash completion file.
* script-admin install whatever.py /usr/local, which will install the script and man pages to the prefix /usr/local.
* script-admin convert-to-project whatever.py, which will take the script and put it into a directory with a setup.py and its own revision control (if possible, keeping any revision history from the original file intact).
* script-admin test whatever.py, which should run whatever tests are associated with whatever.py.

* The framework should always try to import the module and inspect it dynamically, rather than trying to parse python code. All of the above things can be done dynamically.
* run_if_main() may use stack introspection or whatever means necessary to make the API simple. It should look for a local or global variable in the calling scope called main, check whether it is callable, and inspect its argument list. It should then use the optparse module or similar to forward command-line arguments into main. If no main function exists, it should look for objects of the form "command_*", where * is the first command-line argument. It should then strip the first command-line argument and act as before, but using command_* instead of main.
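Here's a rough sketch of how run_if_main() might work, using the 2009-era stdlib (sys._getframe for the stack introspection, inspect.getargspec and optparse). The commandline module doesn't exist; this is just how I imagine the plumbing, and it doesn't guard against passing the same parameter both positionally and as an option:

import inspect
import sys
from optparse import OptionParser

def run_if_main():
    caller = sys._getframe(1).f_globals
    if caller.get("__name__") != "__main__":
        return
    args = sys.argv[1:]
    func = caller.get("main")
    if not callable(func) and args:
        # No main(): dispatch on argv[1] to a command_* function instead.
        func, args = caller.get("command_" + args[0]), args[1:]
    if not callable(func):
        sys.exit("no main() or command_*() function found")
    names, _, _, defaults = inspect.getargspec(func)
    defaults = dict(zip(names[-len(defaults or ()):], defaults or ()))
    usage = "%prog " + " ".join('[--%s="%s"]' % (n, defaults.get(n))
                                for n in names)
    parser = OptionParser(usage=usage, description=func.__doc__)
    for name in names:
        parser.add_option("--" + name)
    options, positional = parser.parse_args(args)
    # Options the user didn't pass stay None, so the function's own
    # defaults still apply.
    kwargs = dict((n, getattr(options, n)) for n in names
                  if getattr(options, n) is not None)
    func(*positional, **kwargs)

optparse gives you -h/--help for free, so the usage message above falls out of the function signature and docstring without any extra work.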

* script-admin should be written using the commandline module.

* alternatives should focus on functions without side-effects. This means it can use concepts like invariants (Haskell-inspired) for testing, as in the sketch below.
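By invariants I mean QuickCheck-style properties: if the implementations are pure, you can hammer them with random inputs and insist that they agree. A throwaway helper (all names made up) might look like:

import random

def check_equivalent(canonical, candidate, make_args, trials=100):
    """ Property test: two pure implementations must agree everywhere. """
    for _ in range(trials):
        args = make_args()
        assert candidate(*args) == canonical(*args), \
            "implementations disagree on %r" % (args,)

# Example: two ways of summing squares over a list of ints.
check_equivalent(
    lambda xs: sum(x * x for x in xs),
    lambda xs: sum(map(lambda x: x * x, xs)),
    lambda: ([random.randint(-100, 100) for _ in range(10)],))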

* A possible implementation of the alternatives module might do something like this:
* Run either the first or second implementation (or both) of main() with the default arguments taken from the first one, choosing initially at random, but probably using something smarter once there's some data (I'm sure Alex will be first to suggest http://www.gene-expression-programming.com).
* Optionally store statistics about the CPU time in ~/.alternatives/costs, so that the best implementation can be used in future.
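A crude sketch of how costs.cputime and reimplementation could conspire to do this. Everything here is made up; time.clock() measures CPU time on Unix in the 2009-era stdlib, and actually persisting the timings to ~/.alternatives/costs is left out:

import random
import time

_impls = {}        # function name -> list of (label, implementation)
_dispatchers = {}  # function name -> the wrapper that picks between them
_timings = {}      # (function name, label) -> list of CPU seconds

class costs:
    @staticmethod
    def cputime(func):
        _impls[func.__name__] = [("canonical", func)]

        def dispatch(*args, **kwargs):
            # Use the implementation with the best mean recorded time;
            # pick at random until every implementation has some data.
            candidates = []
            for label, impl in _impls[func.__name__]:
                times = _timings.get((func.__name__, label))
                candidates.append((sum(times) / len(times) if times else None,
                                   label, impl))
            if any(mean is None for mean, _, _ in candidates):
                mean, label, impl = random.choice(candidates)
            else:
                mean, label, impl = min(candidates)
            start = time.clock()
            result = impl(*args, **kwargs)
            _timings.setdefault((func.__name__, label), []).append(
                time.clock() - start)
            return result

        # Copy docstring and name so the dispatcher masquerades as the
        # canonical implementation.
        dispatch.__name__ = func.__name__
        dispatch.__doc__ = func.__doc__
        _dispatchers[func.__name__] = dispatch
        return dispatch

def reimplementation(label):
    def register(func):
        _impls[func.__name__].append((label, func))
        # Return the existing dispatcher, so the module-level name stays
        # bound to it rather than to the new implementation.
        return _dispatchers[func.__name__]
    return register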

* All decorators in alternatives should have a lightweight alternative implementation which doesn’t add any overhead to function calls or tracebacks, and only incurs a small penalty at define-time, when the choice of best implementation is made, and docstrings/default arguments are copied from the canonical implementation.

* alternatives should include a function/command called "test", "train", "sleep" or "dream", which goes through all functions registered in ~/.alternatives and tests them against each other (possibly using cached input values from real runs, and possibly using the coverage testing methodology). This should run as a separate process at nice 19 or something. There should also be a function/command called "report", which produces either profiler-like output data or matplotlib-quality graphs for each of the functions.

* If the @alternatives.costs.cputime decorator is used, then the return values should always be the same, and the implementation which returns in the shortest time should be chosen. It should be possible to create other cost functions that take the return value as an input and return a score. This might be useful for evaluating alternative implementations of things like video compression or stochastic optimisation schemes.
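For instance, a hypothetical cost function for a compressor could score on output size plus a time penalty, instead of insisting on identical return values (make_cost and bytes_plus_time are made-up names, not part of any real library):

import time
import zlib

def make_cost(score):
    """ Build a decorator from score(result, elapsed_seconds) -> cost,
        where lower is better. """
    def decorator(func):
        def wrapper(*args, **kwargs):
            start = time.clock()
            result = func(*args, **kwargs)
            wrapper.last_cost = score(result, time.clock() - start)
            return result
        return wrapper
    return decorator

# Score a compressor by compressed size, plus 1000 "bytes" per CPU second.
bytes_plus_time = make_cost(lambda result, elapsed: len(result) + 1000 * elapsed)

@bytes_plus_time
def compress(data):
    return zlib.compress(data, 9)

compress(b"hello " * 1000)  # afterwards, compress.last_cost holds the score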

So those are two things that I think would be useful on a reasonably regular basis. I probably won't get around to implementing them any time soon, but if anyone can think of anything that already does what I want, please tell me.
