Frank Schacherer Homepage

python

[...and the holy grail]

Find out all about python at the python website.

Some useful idioms:

for line in open(file): # iterate over the lines of a file
(my, list) = mystring.split() # or mystring.split(',') to split on commas
[(func(x), func2(x)) for x in list if x > cond] # a tuple result needs its own parentheses

Libraries

from libname import *

from libname import func1, func2

import libname

Regexen

import re
m = re.compile(r'^16_est(\d+)').search(s) # r'foo' is noninterpolated raw string
m = re.search(r'^16_est(\d+)', s) # implicit compile
print m.group(0) # group 0 is whole match, parenthesis groups start from 1 
substituted = re.sub(pattern, repl, string[, count])
list = re.split(pattern, string)

Regex syntax is largely the same as in Perl.
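A minimal sketch of the calls above on a made-up input. The string s and the patterns passed to re.sub and re.split are assumptions chosen for illustration.

```python
import re

s = '16_est42 and more'  # hypothetical input, not from the original notes

m = re.search(r'^16_est(\d+)', s)
if m:  # search returns None when nothing matches
    whole = m.group(0)   # '16_est42'
    number = m.group(1)  # '42'

replaced = re.sub(r'\d+', '#', s, count=1)  # replace only the first run of digits
parts = re.split(r'\s+', s)                 # split on whitespace
```

Checking m before calling group() avoids an AttributeError on None when the pattern does not match.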

Data structures

Sequences (tuple, list, string)

Lists, tuples and strings are all sequences and can be accessed via slicing. Lists [] are mutable, tuples () and strings '', "" are not.

Initialize empty lists with list = [].

In slices a[x:y], indexes start from 0; the item at x is included, the item at y is not.

For lists, list.remove(item) removes the first occurrence of item, and del list[index] removes the item at index.

Other built-in functions and statements for sequences: len(s), min(s), max(s), del s[1:3], for x in s:
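A short sketch of slicing and removal on a sample list. The values are made up for illustration.

```python
s = ['a', 'b', 'c', 'd', 'e']

first_two = s[0:2]   # ['a', 'b']: index 0 up to, but not including, 2
last_two = s[-2:]    # ['d', 'e']: negative indexes count from the end

s.remove('c')        # remove by value -> ['a', 'b', 'd', 'e']
del s[0]             # remove by index -> ['b', 'd', 'e']
del s[1:3]           # remove a slice  -> ['b']

t = (3, 1, 2)
summary = (len(t), min(t), max(t))  # (3, 1, 3)
```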

Cool functions on sequences, for functional programming:

Even cooler are list comprehensions like [(x, x*2) for x in range(1, 11) if x % 2 == 0]
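A sketch comparing the comprehension with map and filter, two built-ins commonly used for this style. That these are the functions meant above is an assumption.

```python
pairs = [(x, x * 2) for x in range(1, 11) if x % 2 == 0]

# The same result with filter and map; list() makes the result a list
# on Python versions where map returns an iterator.
evens = filter(lambda x: x % 2 == 0, range(1, 11))
pairs_fp = list(map(lambda x: (x, x * 2), evens))
```

Both give the pairs for 2, 4, 6, 8 and 10; the comprehension does it without the lambdas.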

Dictionaries (dict)

Strings

Type conversion to string: enclose in backticks (the same as repr()) or use str(). Applied to a string, backticks give its representation, in which escaped characters are shown in escaped form instead of being interpreted. Formatted printing with print "%s ... %s" % (s1, s2). If you do not want the auto-appended newline, append a comma to the print statement. Raw strings (without escape interpretation) are written r"rawstring".
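A small sketch of the string conversions described above, using str() and repr() so it does not depend on the backtick syntax. The sample values are made up.

```python
s1, s2 = 'spam', 42
line = "%s ... %s" % (s1, s2)   # 'spam ... 42'

text = 'a\tb'                   # \t is interpreted as a tab character
as_str = str(text)              # unchanged: still contains a real tab
as_repr = repr(text)            # quotes added, tab shown as backslash-t

raw = r'a\tb'                   # raw string: a backslash and a t, no tab
```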

Control structures

Syntax:

Truth: empty lists, dictionaries, strings, the number zero and None (the undefined, void object) are false. Everything else is true. String comparison with ==, !=. None is smaller than anything except None. is checks for object identity (two pointers to the same object.)

Cool expressions for conditions: in checks if an item is in a list or a key in a dict.

Operators: ++ and -- are missing; use x += 1 and x -= 1 instead.
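The truth rules, in, is and the replacement for ++ can be sketched as follows. The helper is_true and the sample values are hypothetical names for illustration.

```python
def is_true(value):
    # Mirrors what "if value:" decides
    if value:
        return True
    return False

falsy = [[], {}, '', 0, None]
truthy = [[0], {'k': 1}, ' ', -1]

a = [1, 2]
b = a          # second name for the same object
c = [1, 2]     # equal content, different object

d = {'key': 'value'}
counter = 0
counter += 1   # stands in for the missing counter++
```

Here a is b holds, while a == c holds but a is c does not; 'key' in d tests the keys of the dictionary, not its values.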

Functions and Methods

Methods may not have the same name as data fields in classes; each member needs a unique name, or you end up with a 'str' object is not callable error. A def statement must have been executed before the function is called, so module-level code cannot call a function that is defined further down in the same file. Calls from inside other function bodies are fine, because names are only looked up when the body runs.
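A sketch of how the name clash produces that error. The class Record and its members are hypothetical.

```python
class Record:
    def __init__(self):
        self.name = 'frank'   # instance attribute hides the method below

    def name(self):
        return 'never reached through an instance'

r = Record()
try:
    r.name()                  # looks up the string, then tries to call it
    error = None
except TypeError as e:
    error = str(e)            # "'str' object is not callable"
```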

Parameter passing: all parameters are passed as references to objects. Immutable objects cannot be changed, so for them this behaves just like passing by value. You can assign other objects to the parameter names inside the called function without consequences for the caller. When calling methods without parameters, remember to put the parentheses behind the method: object.method(); otherwise you get the method object back instead of calling it. Argument syntax for the caller is func(value) or, named, func(name=value). For the definition it is def func(name), def func(name=default) for optional arguments with defaults, and def func(*name) or def func(**name) to take the rest of the arguments into a tuple or a dictionary.
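A minimal sketch of these rules. The function names mutate, rebind and collect are made up for illustration.

```python
def mutate(items):
    items.append('added')     # changes the caller's list

def rebind(items):
    items = ['new']           # only rebinds the local name

def collect(first, second='default', *rest, **named):
    # rest is a tuple of extra positional args, named a dict of extra keywords
    return first, second, rest, named

data = ['x']
mutate(data)                  # data is now ['x', 'added']
rebind(data)                  # data is unchanged

result = collect(1, 2, 3, 4, key='v')
bound = data.append           # no parentheses: the method object, not a call
```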

Names in functions have local scope, overriding globals with the same name. To assign to a global from inside a function, declare it there with global theName. Variables are searched LGB (local, global, built-in). If the local lookup fails, enclosing local scopes are searched, too. Note that the class scope of a class inside a module is neither an enclosing local scope nor the global scope for the class's methods. Therefore, imports at class level are not seen in the methods.

A gotcha: if you only reference it, a global variable that is not locally defined is searched and found as a global, and no exception will be thrown. But if you assign a value to that name anywhere in the function, it is interpreted as a local throughout the function. This will cause references to the variable before the assignment to throw exceptions. You must declare it as global in this case. Built-in names are things like len(), open() etc.
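The gotcha can be sketched like this, assuming the code runs at module level. The names counter, read_only, broken and fixed are hypothetical.

```python
counter = 0

def read_only():
    return counter            # found as a global, no declaration needed

def broken():
    value = counter           # fails: the assignment below makes counter local
    counter = value + 1
    return counter

def fixed():
    global counter            # assignments now go to the module-level name
    counter = counter + 1
    return counter

try:
    broken()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'
```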

join and other string functions can be called as methods of the string in question, for example ','.join(items). This is better than importing the string module and calling string.join().
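For instance, with a made-up list of words:

```python
words = ['spam', 'eggs', 'ham']
line = ', '.join(words)     # join is a method of the separator string
upper = line.upper()
fields = line.split(', ')   # split undoes the join
```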

lambda anonymous functions may only contain a single expression. They can read names from enclosing scopes but cannot rebind them, so they are not full closures. (Sniff.)
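A short sketch of what a lambda can do, assuming a Python version with nested scopes. The names double and make_adder are hypothetical.

```python
double = lambda x: x * 2          # body is one expression, no statements

def make_adder(n):
    # The lambda reads n from the enclosing function's scope
    return lambda x: x + n

add_five = make_adder(5)
by_length = sorted(['ccc', 'a', 'bb'], key=lambda s: len(s))
```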

Documentation

Python comes with built-in documentation support in the form of docstrings. The pydoc tool can be used to automatically extract this documentation. A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute. By convention, docstrings use triple quotes """. One-liners have the quotes all on the same line and end with a period. They should contain an explanation, not a restatement of the Python code, because you can get the parameters and member names via introspection.
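A minimal sketch of a one-line and a multi-line docstring. The function area and the class Shape are made up.

```python
def area(width, height):
    """Return the area of a rectangle."""
    return width * height

class Shape:
    """Base class for drawable shapes.

    Longer docstrings continue after a blank line.
    """

one_liner = area.__doc__
```

In the interpreter, help(area) shows the same text through pydoc.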

Modularisation

In Python there are three levels of bundling things. On the basic level, you have the class, which bundles its methods. This class can reside in a file together with other classes, a so-called module. The file, or module, is the second level of packaging. This is very much like a package in Perl.

Modules are namespaces. Loaded modules are objects of type module, but not classes. You can access the names defined in them via modulename.__dict__ or dir(modulename). If you want to import MyClass, it is not enough to put it into a file called MyClass.py in the import path and use import MyClass. This will only import the module, not the class object from the module, and you'll get a 'module' object is not callable error when you call MyClass(). Instead use from MyClass import MyClass, or reference it as x = MyClass.MyClass().
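The same effect can be seen with the standard datetime module, used here only as a stand-in because it contains a class of the same name.

```python
import datetime                      # binds the module, not the class inside it

try:
    datetime(2000, 1, 1)             # calling the module itself
    error = None
except TypeError as e:
    error = str(e)                   # says the 'module' object is not callable

d1 = datetime.datetime(2000, 1, 1)   # reference the class through the module

from datetime import datetime as DateTime   # or import the class object itself
d2 = DateTime(2000, 1, 1)

names = dir(datetime)                # names defined in the module
```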

Modules have additional attributes like __author__, __builtins__, __date__, __file__ and __name__.

For larger projects, putting everything in the project into one file will not do. So you create a bunch of files and put them into a common directory, each file/module representing one larger logical part of your application, and add a __init__.py file to make that directory a package. The directory, or package, is the third level of packaging.

For really large projects, you can even create hierarchies of directories/packages, with each directory holding modules pertaining to a certain logical part of your application.

For example, you start with a simple app, MyApp, all in one file. When you realize it is too big, you split it into several modules in a MyApp directory; let's call them parser, config, viewer and engine. When you realize that each of them in turn is getting too big again, you turn it into a directory. For example, the parser directory contains various parsers for the various file formats.
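The layout can be tried out with a throwaway package built in a temporary directory. The module csvformat and its parse function are hypothetical, added only so there is something to import.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout, following the MyApp example in the text:
#   MyApp/__init__.py
#   MyApp/config.py
#   MyApp/parser/__init__.py
#   MyApp/parser/csvformat.py
root = tempfile.mkdtemp()
files = {
    os.path.join('MyApp', '__init__.py'): '',
    os.path.join('MyApp', 'config.py'): 'DEBUG = False\n',
    os.path.join('MyApp', 'parser', '__init__.py'): '',
    os.path.join('MyApp', 'parser', 'csvformat.py'):
        'def parse(line):\n    return line.split(",")\n',
}
for relative, source in files.items():
    path = os.path.join(root, relative)
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    with open(path, 'w') as handle:
        handle.write(source)

sys.path.insert(0, root)             # make the directory holding MyApp importable
config = importlib.import_module('MyApp.config')
csvformat = importlib.import_module('MyApp.parser.csvformat')
fields = csvformat.parse('a,b,c')
```

In a real project the files would simply exist on disk, and the imports would read import MyApp.config and from MyApp.parser import csvformat.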

Persistence

cPickle is the module to serialize arbitrary data structures as ASCII text or in binary form.
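A round trip can be sketched as below. The sketch imports pickle, which offers the same interface as cPickle and is the only name available in Python 3; the sample data is made up.

```python
import pickle   # same interface as cPickle

data = {'methods': ['BLAST', 'manual'], 'count': 2}

text_form = pickle.dumps(data, protocol=0)    # protocol 0 is the ASCII format
binary_form = pickle.dumps(data, protocol=2)  # a binary format

restored = pickle.loads(binary_form)          # loads detects the protocol itself
```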

shelve is a dbm-based approach that creates persistent hashes, where the values can be any Python object. The pickled version of this object must abide by the limitations of the dbm system. Of course you can also use dbm or gdbm directly.
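A minimal sketch of a shelf that survives being closed and reopened. The file name and the stored record are hypothetical.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'store')   # hypothetical file name

db = shelve.open(path)
db['hit'] = {'method': 'BLAST', 'score': 42.0}      # keys are strings, values are pickled
db.close()

db = shelve.open(path)                               # reopen: the data persisted
stored = db['hit']
keys = list(db.keys())
db.close()
```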

MySQLdb is a MySQL driver that conforms to the Python DB API interface:

>>> import MySQLdb
>>> db=MySQLdb.connect(user='bioinfo',host='biserv',passwd='',db='yoh')
>>> c = db.cursor()
>>> table = "bd_method"
>>> c.execute("select * from %s" % (table))
2L
>>> c.fetchone()
('BLAST',)
>>> c.fetchone()
('manual',)
>>> c.fetchone()
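
The session above needs a running MySQL server. As a sketch that runs anywhere, the same DB API calls are shown against sqlite3, a different driver from the standard library that follows the same interface. The in-memory database and the create and insert statements are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute("create table bd_method (name text)")
c.executemany("insert into bd_method values (?)", [('BLAST',), ('manual',)])

c.execute("select * from bd_method order by name")
first = c.fetchone()      # ('BLAST',)
second = c.fetchone()     # ('manual',)
third = c.fetchone()      # None: no more rows

# Values should go through placeholders rather than % formatting;
# the placeholder style differs per driver (sqlite3 uses ?).
c.execute("select name from bd_method where name = ?", ('manual',))
matches = c.fetchall()
db.close()
```

Table names cannot be passed as placeholders, which is why the session above builds that part of the statement with % formatting.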

Debugging, Profiling

Creating the profile file

import profile
profile.run('foo()', 'profile_filename')

Evaluating the file (best done interactively in interpreter):

import pstats
p = pstats.Stats('profile_filename')
p.strip_dirs().sort_stats('time').print_stats(10)

strip_dirs() removes the pathnames from the names.

sort_stats('time') sorts the stats in decreasing order of time used by each routine. Among other possibilities are 'calls' and 'name'.

print_stats(10) will print the top ten of the sorted list. print_stats('substr') will print the stats for functions whose name contains substr. Both filters can be used ('substr', 10) and are applied in order.
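
Both steps can be sketched in one self-contained run. The sketch uses cProfile, the faster sibling of profile with the same interface, and hands the profiler object to pstats directly instead of going through a file; the workload foo is made up.

```python
import cProfile
import io
import pstats

def foo():
    # Hypothetical workload to have something to measure
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
result = profiler.runcall(foo)

out = io.StringIO()                       # collect the report instead of printing it
p = pstats.Stats(profiler, stream=out)
p.strip_dirs().sort_stats('time').print_stats('foo', 10)
report = out.getvalue()
```

Leaving out the stream argument prints the report to standard output, as in the interactive use described above.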

Objects, Types, Classes

Namespaces: module global (including the modules __builtin__ for built-in functions, and __main__ for the default module that is used when invoking the python interpreter, for example by invoking a script), function local. Scope is searched function-local (ascending through enclosing functions), module global, builtins. Blocks such as if or for do not open a scope of their own.

Everything is an object, even types are type objects. And objects are instances of a type. Weird circular definition. The most general type is object.

dir() on a class gives all of its own and its superclasses' __dict__ members; on a module it gives the module's __dict__ members. On an instance it shows all its own and inherited variables and members. Without an argument it shows the local variables in scope. dir replaces the deprecated __methods__ and __members__ attributes.

Types

Python has multiple inheritance, therefore no interfaces (like in Java).

All types have a few common attributes:

There is no type checking of arguments. That means:

  1. You can pass any object that can perform the operations (has the members to call), no matter what class it is. It is generally expected in Python that you do not "look before you leap" (LBYL), but practice "it's easier to ask forgiveness than permission" (EAFP), meaning you do not check the type beforehand, but handle the exceptions that are thrown when an object doesn't support the required operation.
  2. There is no way to overload methods by argument type; a later definition with the same name simply replaces the earlier one. You can only override (for example the builtins).
  3. Class-wide variables are only initialized once (when the class is loaded). To init for every instance, put initialisation into the __init__() method
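
Points 1 and 3 can be sketched as follows. The function total_length and the class Shared are hypothetical.

```python
def total_length(items):
    # EAFP: just call len() and handle the failure, no type check up front
    total = 0
    for item in items:
        try:
            total += len(item)
        except TypeError:         # the object has no length
            total += 1
    return total

class Shared:
    log = []                      # class-wide: created once, shared by all instances

    def __init__(self):
        self.own = []             # per instance: created on every instantiation

a, b = Shared(), Shared()
a.log.append('x')                 # visible through b as well
a.own.append('y')                 # b.own stays empty
```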

tuple, list, dict, file, int, float, str, property are all built in types and can be inherited and instantiated. Some built in object types that are not directly instantiable are module, class, method, function, traceback, frame, code, builtin.

type(x) returns x's type. You can also do type(object) is type. x.__class__ is the same as type(x) for instances.
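
A few of these relationships, checked on a hypothetical class Plain and a list subclass Stack:

```python
class Plain(object):
    pass

x = Plain()

checks = [
    type(x) is Plain,            # an instance's type is its class
    x.__class__ is type(x),
    type(Plain) is type,         # classes are themselves instances of type
    type(int) is type,           # and so are built-in types
    type(type) is type,          # type is its own type
    isinstance(type, object),    # type is an object ...
    issubclass(type, object),    # ... and object is the most general type
]

class Stack(list):               # built-in types can be subclassed
    def push(self, item):
        self.append(item)
```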

Classes

What's the difference between types and classes? Basically, types are built in, and may represent things that cannot be instantiated, whereas classes are user (or library) defined. They are both of type 'type'. Classes can have some additional attributes:

foo = staticmethod(foo) creates a static method foo that does not need a reference to self. There's also a weird classmethod, which furnishes a reference to the calling class as its first argument.
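
Both wrappers in a small sketch. The classes Counter and SubCounter and their members are made up for illustration.

```python
class Counter(object):
    created = 0

    def helper(a, b):
        return a + b
    helper = staticmethod(helper)      # no self, called as Counter.helper(1, 2)

    def make(cls):
        cls.created += 1               # cls is the class the call was made on
        return cls()
    make = classmethod(make)

class SubCounter(Counter):
    pass

total = Counter.helper(1, 2)
obj = SubCounter.make()                # cls is SubCounter here, not Counter
```

Because make was called on SubCounter, the returned object is a SubCounter and the incremented counter is stored on SubCounter, leaving Counter.created untouched.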

The function inspect.getmembers(object[,predicate]) returns a list of all members of the object. Everything in Python is an object, like object itself; even types and classes are.

Mutable sequence objects like lists also have special attributes for indexing and slicing [:], and for the operators <,>,==,!=,>=,<=,+,* as well as built-in functions like len() or object-specific methods like append or remove. The distribution between object methods and built-in functions is unfortunately arbitrary. There are many, many more special attributes for dictionaries, strings and various number types. You can see the attributes for any object by executing

import inspect
members = inspect.getmembers(obj) # obj: put your object here
for n, v in members: print n,"=>",v