Category: Python

09/23/07

03:23:43 pm, Categories: Movies, Programming, Python, Web

My first Django app: Movie Vote

Occasionally I show movies for team-building and just to take some stress out of the week. I'm an avid movie watcher and have tons of them. What I usually do is send out a proposed list of five or so movies and let people vote on them. I was using a PHP script called LittlePoll. It did the job, but I got really tired of picking out five movies to show. Wouldn't it be better if I just listed all of my movies and let people vote on them? Sort of like a Digg/Reddit interface to a movie list. Also, I had no form of authentication and relied on the honor system (and I found out early that not many viewers deserved such honor).

I figured this would be a good learning experience in web programming with Python, so I sat down and wrote one using Django. I picked Django because we use it a lot at work and it seemed like a good thing to learn. In general I like it, but I feel that all of these SQL abstraction frameworks just make it harder to get the queries you want in the most optimized form. In fact, I had difficulty doing a simple task through the SQL abstraction, and the thread about it has been gathering cobwebs in silence. The end result? Screw the abstraction and use raw SQL, sigh. I feel these web frameworks concentrate more on getting things up as fast as possible than on functionality.

Other than that, Django is pretty nifty to work with. It is indeed very simple. In no time I had a SQLite-backed site with authentication set up. We even built in a weighting system so that people who consistently show up for the movies get a higher weight applied to their vote. We needed it because some users would vote but never show up, skewing the choice for the people who actually came. Now new users get zero weight until they attend one screening. It seems to be working well.
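The weighting idea is easy to express directly in SQL, which fits the raw-SQL approach above. Below is a minimal sketch using Python 3's built-in sqlite3 module. The schema (`users`, `votes`), the column names, and the rule that a vote's weight equals the number of screenings the voter has attended are all hypothetical; Movie Vote's actual weighting formula isn't described here. The only property taken from the post is that a user with zero attendance contributes zero weight.

```python
import sqlite3

def weighted_totals(conn):
    """Return [(movie, score), ...] sorted by descending weighted score.

    Hypothetical rule: each vote is weighted by the voter's attendance
    count, so users who have never attended add nothing to a movie's score.
    """
    rows = conn.execute(
        """
        SELECT v.movie, SUM(u.attended) AS score
        FROM votes AS v
        JOIN users AS u ON u.id = v.user_id
        GROUP BY v.movie
        ORDER BY score DESC, v.movie
        """
    )
    return rows.fetchall()

def demo():
    # In-memory database with made-up users and votes.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                            attended INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE votes (user_id INTEGER REFERENCES users(id), movie TEXT);
        """
    )
    conn.executemany("INSERT INTO users (id, name, attended) VALUES (?, ?, ?)",
                     [(1, "regular", 5), (2, "sometimes", 1), (3, "newbie", 0)])
    conn.executemany("INSERT INTO votes (user_id, movie) VALUES (?, ?)",
                     [(1, "Alien"), (2, "Brazil"), (3, "Brazil"), (3, "Clue")])
    return weighted_totals(conn)

if __name__ == "__main__":
    print(demo())  # [('Alien', 5), ('Brazil', 1), ('Clue', 0)]
```

Clue still shows up, but with a score of zero: the newbie's vote is recorded yet carries no weight. In Django, the same query could be run through `django.db.connection.cursor()`.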

I say "we" because a co-worker has helped with some of the coding. The main problem we have right now is slowness, because it runs on a very old ThinkPad laptop. But we've optimized it pretty well; in fact, working on such low-end hardware really teaches you to do things as efficiently as possible. I'm sure that if I moved it to a high-end server it would fly. Oh, and there's the problem of UI design, which I am horrible at. It is simple and functional, though.

Now we have a pretty good system for deciding what to watch at our movie showings. I encourage you to set up something similar and give me feedback/patches :).

02/10/07

06:29:58 pm, Categories: Python

Threads and fork: a bad idea

I work on an application at work that does all sorts of maintenance on large clusters. Without going into too much detail, it keeps an eye on our servers, fixes problems, and performs setup and maintenance. It's a rather important system, and it's all written in Python.

For the past few months we've been trying to debug an issue: at random times, it would segfault. Lots of people looked at the core dumps, and it seemed to be some sort of memory corruption. We ruled out bad hardware, since it crashed on multiple machines.

So where did I start? The program is multithreaded, so it usually has lots of processes listed in 'ps'. On a whim (and most of my debugging was based on whims), I ran strace on every process. When I did this, I observed the Heisenberg principle in action: the program did not crash. Running strace on the application wasn't really a solution, so I had to dig further.

I'm fairly involved with the development of this system, so I understand a lot of the code, yet I did not understand why it would crash. I did know one thing: the application uses threads, and it also forks. But that didn't help me, because I had no idea there was a problem with mixing the two.

I decided to run tcpdump to see if any strange network traffic might be causing it. After much analysis with Ethereal, I noticed something strange: crashes always seemed to coincide with a web request to the application. The app runs a web interface, and we have monitoring hitting this UI to make sure it is running properly. So it turned out the 'random' crashes were not so random; they happened when our monitoring system hit the app.

But why was it crashing? In any multithreaded app, you need proper locking of shared variables; you generally have to worry about multiple threads modifying or accessing the same area of memory at the same time. So I spent weeks analyzing the code, trying different locks in different areas, etc. I even made our web-serving function output nothing at all, so it didn't access any shared data. To my surprise, it still crashed.

One problem was that I could not predict when it would crash. It could run for up to 12 hours without crashing, and most of the monitoring hits on the UI worked fine. So I had to work on reproducing the bug more often. After a lot of work with a live debugger that a co-worker built into the app, I was able to set off a sequence of events that crashed it within 2-3 minutes. It generally involved slamming the UI with parallel web requests and kicking off internal methods that cause a fork() (for example, SSH'ing to a machine to verify connectivity). OK, that was better for testing. But it also told me something: the crash seemed to involve forking.

Our web server in this app was Python's generic BaseHTTPServer combined with SocketServer's ThreadingMixIn, which handles each request in a new thread. From parallel analysis of the core files by others, it did appear to be crashing in pthread's thread-creation functions. On another whim, I swapped ThreadingMixIn for ForkingMixIn, so the web server would fork a process per request instead. Voila, no crash! So this told me the problem had something to do with mixing forking and threading.

A co-worker pointed out a GNU C Library page warning about using threads and fork together. The clincher was this passage:

Because threads are not inherited across fork, issues arise. At the time of the call to fork, threads in the parent process other than the one calling fork may have been executing critical regions of code. As a result, the child process may get a copy of objects that are not in a well-defined state. This potential problem affects all components of the program.
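To see that warning in action, here is a small sketch (Python 3, POSIX only, since it uses os.fork()). One thread holds a threading.Lock while the main thread forks. The child inherits a copy of the lock in its locked state, but not the thread that would have released it, so the child can never acquire it. Our real failure was a segfault rather than a stuck lock; the lock here just stands in for any "object not in a well-defined state", and the function name and timings are made up for illustration. Python 3.12 and later also emit a DeprecationWarning when os.fork() is called from a process they can tell is multithreaded.

```python
import os
import threading
import time

def child_can_acquire(timeout=0.5):
    """Fork while another thread holds a lock; report whether the child got it."""
    lock = threading.Lock()
    holding = threading.Event()

    def worker():
        # Stand-in for a thread sitting in a critical region at fork() time.
        with lock:
            holding.set()
            time.sleep(2)

    t = threading.Thread(target=worker)
    t.start()
    holding.wait()  # make sure the lock is really held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here, so nothing
        # will ever release the inherited, already-locked copy of `lock`.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)  # skip normal interpreter cleanup

    _, status = os.waitpid(pid, 0)
    t.join()
    return os.WEXITSTATUS(status) == 0

if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire())  # child acquired lock: False
```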

So indeed we were doing something bad. The problem with switching our web server to ForkingMixIn is that each request is then handled in a separate child process, so a web request can't change the state of the main program unless we build some specific IPC mechanism, and our UI needed to do exactly that. So instead, we created a single thread that runs one BaseHTTPServer without any mixin. Requests are processed serially, but they can still update shared data. The only drawback is low throughput, since only one request is handled at a time. For our app that suffices, because we get few UI requests, and they're mainly for monitoring.
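Here is a minimal sketch of that arrangement in Python 3, where BaseHTTPServer became http.server and the mixins moved to socketserver. The handler, the `state` dict, and the port choice are hypothetical stand-ins for our app's real status page. The key point is that HTTPServer is used without ThreadingMixIn, so one dedicated thread runs serve_forever() and handles requests one at a time, never starting a thread per request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared application state; other threads in the app touch it
# too, so access is still guarded by a lock.
state_lock = threading.Lock()
state = {"status": "ok", "hits": 0}

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with state_lock:
            state["hits"] += 1
            body = "status=%s hits=%d\n" % (state["status"], state["hits"])
        data = body.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def start_serial_server(host="127.0.0.1", port=0):
    # Plain HTTPServer, no ThreadingMixIn: this one thread serves every
    # request in turn. port=0 lets the OS pick a free port.
    server = HTTPServer((host, port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    srv = start_serial_server()
    url = "http://127.0.0.1:%d/" % srv.server_address[1]
    print(urllib.request.urlopen(url).read().decode(), end="")
    srv.shutdown()
    srv.server_close()
```

The trade-off is the one described above: a slow request blocks every other request, which is acceptable only when UI traffic is light.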

This fixed our immediate issue, but I have since seen crashes still occur, just after much longer run times. The problem is that we are still using threads, not just in our web serving but in other parts of the app as well. The only solution, it seems, is to remove that threading. We have to fork, because we do things like SSH.

What's the lesson here? Do not, I repeat, do not ever design an application that uses threads and also forks. You will have no end of trouble. If you use threads, stick with threads only; if you use fork, stick with forks only.

11/04/06

08:57:15 pm, Categories: Python

trying to close a file

I'm reading 'Core Python Programming 2nd Edition' and came across what I believe is an error in an example. Here is what I sent the author:

380-381 :: 10.3.10 : open() can throw IOError

I have some general comments on this section. You describe two ways of using try-finally-except. The first method was:

try:
  try:
    ccfile = open('carddata.txt', 'r')
    txns = ccfile.readlines()
  except IOError:
    log.write('no txns this month\n')
finally:
  ccfile.close()

But if open() fails, an IOError will be thrown, and ccfile will be undefined. The code in finally: will attempt to close that undefined variable, and throw a NameError.

The second method was:

try:
  try:
    ccfile = open('carddata.txt', 'r')
    txns = ccfile.readlines()
  finally:
    ccfile.close()
except:
  log.write('no txns this month\n')

But this suffers from the same problem. If open() fails, you attempt to close an undefined variable.

-- EOM

So what's a better way to do this?

try:
  ccfile = open('carddata.txt', 'r')
except IOError:
  log.write('failed to open file\n')
else:
  try:
    try:
      txns = ccfile.readlines()
    except IOError:
      log.write('no txns this month\n')
  finally:
    ccfile.close()

Yuck. Anything better? In Python 2.5, which unified try/except/finally into a single statement, you can clean this up a bit:

try:
  ccfile = open('carddata.txt', 'r')
except IOError:
  log.write('failed to open file\n')
else:
  try:
    txns = ccfile.readlines()
  except IOError:
    log.write('no txns this month\n')
  finally:
    ccfile.close()

Still yucky, though. The author of the book acknowledged the erratum and credited me. He also suggested a better solution: setting

ccfile = None

before the block and an

if ccfile:

test before closing.
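Putting the author's suggestion together gives something like the sketch below (Python 3 syntax; IOError is an alias of OSError there). The function names are hypothetical and io.StringIO stands in for the book's `log` object. For comparison it also shows the `with` statement (Python 2.6+, or 2.5 with `from __future__ import with_statement`), which closes the file for you and makes the None check unnecessary.

```python
import io
import os
import tempfile

def read_txns(path, log):
    """Return the file's lines, or None if it could not be opened or read.

    Author's fix: bind ccfile to None up front, so the finally clause can
    safely test it even when open() itself raised.
    """
    ccfile = None
    txns = None
    try:
        try:
            ccfile = open(path, 'r')
            txns = ccfile.readlines()
        except IOError:
            log.write('no txns this month\n')
    finally:
        if ccfile:
            ccfile.close()
    return txns

def read_txns_with(path, log):
    """Same behavior with a with block, which closes the file automatically."""
    try:
        with open(path, 'r') as ccfile:
            return ccfile.readlines()
    except IOError:
        log.write('no txns this month\n')
        return None

if __name__ == "__main__":
    log = io.StringIO()
    missing = os.path.join(tempfile.mkdtemp(), 'carddata.txt')
    print(read_txns(missing, log))  # None
    print(log.getvalue(), end='')   # no txns this month
```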

10/25/06

07:29:25 am, Categories: Python

dictionary fun

I learned some new things about Python dictionaries. Let's say you want to create a dictionary with keys taken from a list and the same default value for all of them, say None.

>>> somelist=['foo', 'bar', 'baz']
>>> {}.fromkeys(somelist)
{'baz': None, 'foo': None, 'bar': None}

Or some other value, like True:

>>> {}.fromkeys(somelist, True)
{'baz': True, 'foo': True, 'bar': True}

An uglier way to do this is with a list comprehension:

>>> dict([(x, True) for x in somelist])
{'baz': True, 'foo': True, 'bar': True}
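Since Python 2.7 (and in 3.x), a dict comprehension makes the explicit version much less ugly. The sketch below, in Python 3 syntax, compares the three spellings and shows one caveat of fromkeys() worth knowing: the default value is a single object shared by every key.

```python
somelist = ['foo', 'bar', 'baz']

# Three ways to map every key to True; all produce equal dicts.
via_fromkeys = dict.fromkeys(somelist, True)
via_pairs = dict([(x, True) for x in somelist])
via_comprehension = {x: True for x in somelist}
print(via_fromkeys == via_pairs == via_comprehension)  # True

# Caveat: fromkeys() stores the *same* object under every key, so a
# mutable default like [] is shared between all of them.
shared = dict.fromkeys(somelist, [])
shared['foo'].append(1)
print(shared['bar'])  # [1]

# A comprehension evaluates [] once per key, giving each key its own list.
separate = {x: [] for x in somelist}
separate['foo'].append(1)
print(separate['bar'])  # []
```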

Now you can easily see a way to uniq-ize a list:

>>> somelist=['foo', 'bar', 'foo']
>>> {}.fromkeys(somelist).keys()
['foo', 'bar']

But really, there is an even better way to uniq-ize a list in 2.4:

>>> list(set(somelist))
['foo', 'bar']
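One thing to keep in mind: both tricks above throw away the original order (in Python 2, dict and set ordering is arbitrary). If order matters, here's a small sketch with two made-up helpers. The first relies on dicts keeping insertion order, which is guaranteed only from Python 3.7; the second is a plain loop with a seen-set that works on Python 2.4 and later. Both require hashable items, just like set().

```python
def uniq(items):
    """Drop duplicates, keeping first occurrences in order (Python 3.7+)."""
    return list(dict.fromkeys(items))

def uniq_loop(items):
    """Same result without relying on dict ordering."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(uniq(['foo', 'bar', 'foo', 'baz', 'bar']))       # ['foo', 'bar', 'baz']
print(uniq_loop(['foo', 'bar', 'foo', 'baz', 'bar']))  # ['foo', 'bar', 'baz']
```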

On another note, let's say I wanted to loop through the keys of a dict in sorted order in Python 2.4:

>>> somelist=['foo', 'bar', 'baz']
>>> somedict={}.fromkeys(somelist)
>>> for key in sorted(somedict):
...   print key
... 
bar
baz
foo

Or, did you know you can create a dict with keyword arguments?

>>> dict(x=1,y=2)
{'y': 2, 'x': 1}

10/22/06

02:32:35 pm, Categories: Python

reversing a list

I've always found the reverse() method of a list rather weird, because it reverses the list in place (and returns None). I just found out about reversing via slicing:

>>> f=['foo', 'bar', 'baz']
>>> f[::-1]
['baz', 'bar', 'foo']
>>> f='hello'
>>> f[::-1]
'olleh'

There is also a new builtin function in 2.4 called reversed(), which returns an iterator instead of building a new list, so it is more efficient for large lists:

>>> f=['foo', 'bar', 'baz']
>>> for x in reversed(f):
...   print x
... 
baz
bar
foo
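To make the difference between the three concrete, here is a quick side-by-side sketch (Python 3 syntax; variable names are arbitrary). The gotcha with reverse() is that it returns None, so writing `f = f.reverse()` silently throws your list away.

```python
f = ['foo', 'bar', 'baz']

# Slicing builds a new, reversed list and leaves f alone.
backwards = f[::-1]
print(backwards, f)  # ['baz', 'bar', 'foo'] ['foo', 'bar', 'baz']

# reversed() returns a lazy iterator; items are produced on demand.
it = reversed(f)
print(next(it))  # baz

# reverse() mutates the list in place and returns None.
g = list(f)
result = g.reverse()
print(result, g)  # None ['baz', 'bar', 'foo']
```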

And yet another (ugly) method shown to me by a coworker:

>>> f=['foo', 'bar', 'baz']
>>> reduce(lambda x,y: [y] + ((type(x) == type([]))
...        and x or [x]), f)
['baz', 'bar', 'foo']
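The type() test in that version is needed because, without an initial value, reduce() starts with the first element (a string) as its accumulator; it also means a one-element list comes back as a bare string. Passing an empty list as the initializer avoids both problems. A sketch in Python 3, where reduce lives in functools (the helper name is made up); like the original, it builds a new list at every step, so it is quadratic and only a curiosity next to slicing or reversed().

```python
from functools import reduce  # a builtin in Python 2, moved to functools in Python 3

def reverse_with_reduce(seq):
    # Start from [] and prepend each item, so the accumulator is always a list.
    return reduce(lambda acc, item: [item] + acc, seq, [])

print(reverse_with_reduce(['foo', 'bar', 'baz']))  # ['baz', 'bar', 'foo']
print(reverse_with_reduce(['foo']))                # ['foo']
```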

Hmm, Python seems to be moving towards TMTOWTDI (Perl's "There's More Than One Way To Do It").

10/21/06

04:14:57 pm, Categories: Books, Python

enumerate

I'm reading Core Python Programming, 2nd Ed. I sometimes like reading the basic intro chapters even though I'm somewhat familiar with Python. There is always something I may have missed, and each author has a different writing style, which is entertaining to read.

I just found out about the enumerate() builtin. How many times have you done this?

>>> f=['foo', 'bar', 'baz']
>>> for i in range(len(f)):
...   print f[i]
... 
foo
bar
baz

Well, you can use enumerate() to get both the index and the value:

>>> f=['foo','bar','baz']
>>> for i, val in enumerate(f):
...   print i, val
... 
0 foo
1 bar
2 baz
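Under the hood, enumerate() simply yields (index, value) tuples, which the for loop unpacks. Since Python 2.6 it also accepts a starting index, handy for 1-based numbering. A short sketch in Python 3 syntax:

```python
f = ['foo', 'bar', 'baz']

# enumerate() lazily yields (index, value) tuples.
print(list(enumerate(f)))  # [(0, 'foo'), (1, 'bar'), (2, 'baz')]

# The optional second argument sets the first index.
for i, val in enumerate(f, 1):
    print(i, val)  # 1 foo, then 2 bar, then 3 baz
```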

Cleaner, and no list indexing needed. Oh, and how about the Zen of Python?

>>> import this


Viraj's Weblog

This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.
