Another Reason To Love Decorators: Pickled Functions

The more I work with Python, the cooler it becomes, especially when I let my mind wander to the way-out-there things you can do with the tools Python provides. In my work I frequently find myself writing code in this order:

  1. Data parsing
  2. Reformatting
  3. Complex transformation A of the data
  4. Complex transformation B of transformation A
  5. Use transformation B to do another complex task

So my script starts empty. My main method calls my parsing and reformatting code and saves out the results. Then I clear the main method and start working on steps 3-5, which generally make up the rest of the main method. This means that every time I run my code, it re-runs steps 3-5. That is okay if they are fast, but a pain if you make a change in step 4 and have to wait for step 3 to finish before you can move on to step 5. It is worse if you are making changes to step 5 and have to keep waiting for steps 3 and 4 to rerun with every change.

Now you could say that I should save the results of steps 3 and 4 and just load them from disk, which speeds things up. But then if I change something in step 4, I have to manually rerun it and resave the results. That gets annoying, and so does rewriting the save-to-file code in every project I work on. So as a solution, enter the "pickled" decorator:

def main():
    data = parseAndLoadData()
    transformed_data = doComplexTransformation(data)
    # step B works on the output of step A
    transformed_data = doAnotherComplexTransformation(transformed_data)

@pickled
def parseAndLoadData():
    # parse and load the data
    return data

@pickled
def doComplexTransformation(data):
    # do some nasty complex transformation that takes a long time
    return transformation

# rerun whenever the code of doComplexTransformation changes
@pickled(depends=doComplexTransformation)
def doAnotherComplexTransformation(data):
    # do some other nasty complex transformation that takes a long time
    return transformation

Now the first time "parseAndLoadData" and "doComplexTransformation" are called, they run like normal and their results are saved out to pickle files. When we run the script again after changing a later step (call it "doCoolThing"), the results are loaded from the pickle files instead. Now here is the cool part: if you go back and change the code of "doComplexTransformation", the function will be called again and the old pickled result replaced with the new one! And the coolness doesn't stop there: since "doAnotherComplexTransformation" depends on "doComplexTransformation", any change to "doComplexTransformation" will cause "doAnotherComplexTransformation" to be rerun as well.
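A quick way to convince yourself of this is to fingerprint a function's compiled code and watch the fingerprint change when the body does. This is only a rough Python 3 sketch, not the decorator itself: "code_fingerprint" and the "transform_*" functions are made-up names for illustration, and it uses a hashlib digest where the decorator below uses the built-in hash(). It also ignores nested functions, so treat it as a demonstration of the idea rather than a complete check.

```python
import hashlib


def code_fingerprint(func):
    """Return a short digest of a function's compiled body (hypothetical helper)."""
    code = func.__code__
    # bytecode, constants, and referenced names cover most edits to the body;
    # the function's own name and line numbers are deliberately left out
    payload = repr((code.co_code, code.co_consts, code.co_names)).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def transform_v1(x):
    return x * 2 + 1


def transform_v1_copy(x):
    # same body as transform_v1, so the fingerprint should match
    return x * 2 + 1


def transform_v2(x):
    # "edited" body, so the fingerprint should differ
    return x * 3 + 1


same = code_fingerprint(transform_v1) == code_fingerprint(transform_v1_copy)
changed = code_fingerprint(transform_v1) != code_fingerprint(transform_v2)
```

Because the fingerprint ends up in the pickle's file name, an edited function simply never finds its old file, and that is what triggers the rerun.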

Here is the magic:

from decorator import decorator
import cPickle as pickle
import os
import types

def pickled(*args, **kw):
    """Allows the results of a function to be pickled and reloaded in order to
    save computation time. If the results of a function depend on another function
    then this decorator can be applied using "@pickled(depends=function_list)".
    Since this code depends on the decorator module, in order for dependencies
    to work any custom decorators must use the decorator module as well.
    """
    simple_decorator = False
    # get the functions this depends on and make sure it's a tuple
    depends = kw.get('depends', None)
    if depends is not None:
        if type(depends) is not types.TupleType:
            depends = (depends,)
        # since we may be working on decorated functions we need access to the
        # undecorated function in order to create the correct hash of the code
        depends = map(lambda f: getattr(f, 'undecorated', f), depends)
    if len(args) > 0 and type(args[0]) is types.FunctionType:
        simple_decorator = True      
    def _pickled(func, *args, **kw):
        # convert the dependencies into a list of hashes
        depends_str = ''
        if depends is not None:
            depends_str = '*'.join((str(hash(d.func_code)) for d in depends))
        fstart = '%s_func=%s__' % (os.path.basename(__file__), func.__name__)
        fname = '%shash=%d__deps=%s.pkl' % (fstart, hash(func.func_code), depends_str)
        # create a place to save the functions
        if not os.path.exists('pickled_functions'):
            os.mkdir('pickled_functions')
        # look through the files in the pickle directory
        FILE = None
        for f in os.listdir('pickled_functions'):
            # if we find a file matching the pickle name then open it
            # if just the start of the file's name matches its an old version
            # so remove it to keep the directory clean
            if f == fname:
                FILE = open('pickled_functions/%s' % fname, 'rb')
            elif f.startswith(fstart) and f.endswith('.pkl'):
                print 'removing:', f
                os.remove('pickled_functions/%s' % f)
        # if we found a matching file load the pickle
        # otherwise call the function and save out the results
        if FILE is not None:
            print 'loading:', fname
            result = pickle.load(FILE)
            FILE.close()
        else:
            result = func(*args, **kw)
            print 'writing:', fname
            out = open('pickled_functions/%s' % fname, 'wb')
            pickle.dump(result, out, pickle.HIGHEST_PROTOCOL)
            out.close()
        return result
    if simple_decorator:
        # called as a plain decorator "@pickled"
        return decorator(_pickled, args[0])
    else:
        # called as a decorator with depends "@pickled(depends=someFunc)"
        return decorator(_pickled)
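The code above is Python 2 (cPickle, print statements, func_code). For anyone on Python 3, here is a rough port of the same idea. Treat it as a sketch under assumptions rather than a drop-in replacement: it swaps the decorator module for functools.wraps, names the files after the function's module instead of __file__, and replaces hash() with a hashlib digest, since Python 3 salts str and bytes hashes per process by default and hash() of a code object is not expected to stay stable between runs. "CACHE_DIR" and "_fingerprint" are names I made up. Like the original, the cache key ignores the call arguments.

```python
import functools
import hashlib
import os
import pickle
import types

CACHE_DIR = "pickled_functions"  # same directory name the original code uses


def _fingerprint(func):
    """Digest of a function's compiled body (hypothetical helper)."""
    # look through functools.wraps so we hash the undecorated function
    func = getattr(func, "__wrapped__", func)

    def describe(code):
        # recurse into nested code objects; their repr holds a memory address
        consts = tuple(
            describe(c) if isinstance(c, types.CodeType) else repr(c)
            for c in code.co_consts
        )
        return (code.co_code, consts, code.co_names)

    payload = repr(describe(func.__code__)).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def pickled(func=None, *, depends=()):
    """Cache a result on disk; use as @pickled or @pickled(depends=...)."""
    if func is None:
        # called with arguments: hand back the real decorator
        return functools.partial(pickled, depends=depends)
    if callable(depends):
        depends = (depends,)  # allow a single function instead of a tuple

    @functools.wraps(func)
    def wrapper(*args, **kw):
        deps = "-".join(_fingerprint(d) for d in depends)
        fstart = "%s_func=%s__" % (func.__module__, func.__name__)
        fname = "%shash=%s__deps=%s.pkl" % (fstart, _fingerprint(func), deps)
        os.makedirs(CACHE_DIR, exist_ok=True)
        path = os.path.join(CACHE_DIR, fname)
        # files with the same prefix but a different name are old versions
        for f in os.listdir(CACHE_DIR):
            if f != fname and f.startswith(fstart) and f.endswith(".pkl"):
                os.remove(os.path.join(CACHE_DIR, f))
        if os.path.exists(path):
            with open(path, "rb") as fh:
                return pickle.load(fh)
        result = func(*args, **kw)
        with open(path, "wb") as fh:
            pickle.dump(result, fh, pickle.HIGHEST_PROTOCOL)
        return result

    return wrapper
```

Usage is the same as before: put "@pickled" on a slow function, or "@pickled(depends=doComplexTransformation)" on one that should rerun whenever the code of "doComplexTransformation" changes.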
