Every now and then I come across some really nifty Python code. Terseness is not usually a virtue in Python code, but I thought this snippet was very clever. It comes from a really nice online book called Text Processing in Python. The problem is to flush a code block to the left while keeping all the relative indentation the same. In other words, this:
   if blah:
      print 'hi'
gets changed to:
if blah:
   print 'hi'
Simple enough, huh? Here is their solution:
from re import findall,sub
# What is the minimum line indentation of a block?
indent = lambda s: reduce(min,map(len,
                           findall('(?m)^ *(?=\S)',s)))
# Remove the block-minimum indentation from each line?
flush_left = lambda s: sub('(?m)^ {%d}' % indent(s),'',s)
Lambdas are small anonymous functions in Python, limited to a single expression. This text talks a lot about 'higher-order functions', which are functions that take other functions as arguments or return them, and it makes extensive use of lambdas. The above creates two functions stored in the variables indent and flush_left. flush_left is the one you would call on the text to be processed.
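The snippet as printed relies on Python 2, where reduce() is a builtin. Here is a minimal sketch of the same two lambdas for Python 3, assuming only that reduce is imported from functools and that the patterns are written as raw strings. The sample text in `block` is made up for illustration.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3
from re import findall, sub

# Minimum indentation over all non-blank lines of the block
indent = lambda s: reduce(min, map(len, findall(r'(?m)^ *(?=\S)', s)))

# Strip that many leading spaces from every line
flush_left = lambda s: sub(r'(?m)^ {%d}' % indent(s), '', s)

# Hypothetical sample: a block indented by three spaces
block = "   if blah:\n      print('hi')\n"
print(indent(block))      # 3
print(flush_left(block))
```

The second print shows the block with three spaces removed from each line, so the relative indentation of the print call is preserved.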
First let's look at indent. Essentially this function determines the minimum line indentation of a block, which is the amount to remove. It would return 3 for my example. It's kind of confusing unless you break it down. findall is the innermost call, and it finds all regular expression matches for:
(?m)^ *(?=\S)
The '(?m)' turns on the multiline option for this regexp, so that '^' matches at the start of every line rather than only at the start of the string. The regexp then looks for zero or more spaces at the beginning of a line, given a lookahead assertion that there is a non-whitespace character right after the last space. Because of the lookahead, blank lines don't match at all. Whew. So this will return a list of strings, one per non-blank line, each containing that line's leading spaces.
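To see what the pattern actually picks up, here is a small sketch run against a made-up sample (the `text` value is my own, not from the book). It includes a blank line to show that the lookahead skips it.

```python
import re

# Hypothetical sample: two indented lines with a blank line between them
text = "   if blah:\n\n      print('hi')\n"

# ^ anchors at each line start (because of (?m)), ' *' takes the leading
# spaces, and (?=\S) requires a non-whitespace character to follow
# without consuming it.
matches = re.findall(r'(?m)^ *(?=\S)', text)
print(matches)  # ['   ', '      ']
```

Only two strings come back even though the sample has three lines, because the empty line has no non-whitespace character for the lookahead to find.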
Next is the call to map(). This function applies a function (in this case len()) to each element of a list and returns a list of the results. So the result of the map() call is a list of numbers, which are the lengths of the leading whitespace on each line.
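As a quick illustration of the map() step on its own, here is a sketch using a hand-written list in place of the findall() result (the `prefixes` values are assumed, not taken from the book).

```python
# Hypothetical findall() result: one run of leading spaces per line
prefixes = ['   ', '      ', '   ']

# Python 2's map() returns a list; Python 3's returns a lazy iterator,
# so list() is used here to make the values visible.
widths = list(map(len, prefixes))
print(widths)  # [3, 6, 3]
```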
The next call is reduce(), which applies a two-argument function cumulatively to the elements of a list, carrying the running result forward until only one value is left. Essentially it reduces. In this case it is applying min(), which returns the smaller of its two arguments. So the final result is the smallest number of leading spaces. This whole expression is wrapped in a lambda and assigned to the name indent; it takes one argument, the block of code being analyzed.
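The reduce() step can be tried in isolation too. This is a rough sketch with an assumed list of widths. It also shows two things the original does not mention: min() can take the whole list directly, and reduce() without an initial value fails on an empty list, which is what would happen for a block with no non-blank lines.

```python
from functools import reduce  # builtin in Python 2, functools in Python 3

widths = [3, 6, 3]  # hypothetical indentation widths

# reduce(min, [a, b, c]) evaluates min(min(a, b), c)
smallest = reduce(min, widths)
print(smallest)  # 3

# min() accepts an iterable directly and gives the same answer
assert smallest == min(widths)

# Caveat: with no initial value, reduce() raises TypeError on an
# empty sequence.
try:
    reduce(min, [])
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```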
Next is the flush_left() function:
flush_left = lambda s: sub('(?m)^ {%d}' %
                           indent(s),'',s)
What does this do? Well, it does a regexp substitution, replacing exactly N spaces at the beginning of each line with nothing, where N is the number substituted for %d. That number is computed by the indent() function defined above, and it is the minimum number of spaces to remove in order to flush everything to the left.
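To make the %d step concrete, here is a small sketch that builds the pattern by hand and applies it. The `width` value and the `sample` text are assumptions standing in for a real indent() result and a real block.

```python
import re

width = 3  # hypothetical result of indent() for the sample below
pattern = r'(?m)^ {%d}' % width
print(pattern)  # (?m)^ {3}

sample = "   if blah:\n      print('hi')\n"
result = re.sub(pattern, '', sample)
print(result)
```

Lines that start with fewer than `width` spaces, such as blank lines, simply don't match and are left as they are.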
Pretty nifty, but hard to understand at first glance. I'm not sure if I'd ever write such code, but it is fun to analyze. Obviously this wouldn't work with tab-indentation, but with a few modifications it would.
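For what it's worth, here is one way those modifications might look. This is only a sketch: the function name `flush_left_ws` is my own, and it assumes the block is indented consistently (all tabs or all spaces), since it counts whitespace characters rather than visual columns. The standard library's textwrap.dedent() does a similar job and is shown for comparison.

```python
import re
import textwrap

def flush_left_ws(s):
    """Remove the minimum count of leading spaces/tabs from each line.

    Assumes consistent indentation; mixed tabs and spaces would be
    counted character by character, not by visual width.
    """
    widths = [len(m) for m in re.findall(r'(?m)^[ \t]*(?=\S)', s)]
    if not widths:  # no non-blank lines: nothing to remove
        return s
    return re.sub(r'(?m)^[ \t]{%d}' % min(widths), '', s)

# Hypothetical tab-indented sample
tabbed = "\tif blah:\n\t\tprint('hi')\n"
flushed = flush_left_ws(tabbed)
print(flushed)

# textwrap.dedent() strips the common leading whitespace as well
print(textwrap.dedent(tabbed) == flushed)  # True
```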