The past week I've been learning about Clojure, a really neat Lisp language with easy Java interoperability. I've always been a bit turned off by Lisp simply because it is so esoteric and it is difficult to find useful libraries for practical work. Clojure gives you easy access to Java classes, so it's much more appealing to me. There is a Java library out there for pretty much anything you want to do. After experimenting with it, I find it makes writing Java code as easy as writing Python.
I wrote a system for doing movie voting, and part of that includes some Python scripts for determining review scores on Rotten Tomatoes and Metacritic. These seemed like great examples to try in Clojure. Clojure's documentation is rather terse, and it took me a while to put all the pieces together. There are two scripts I made in Clojure: one that determines the Rotten Tomatoes or Metacritic URLs for a movie, and another that parses those HTML pages to determine the rating.
Here is the code to find the URLs using Google's Ajax API. Notice how ridiculously small this script is. Even using such libraries in Python results in much more code. What I especially liked is the arrow operator and how simple it was to parse JSON.
Here is the code to parse the HTML and determine the score/rating. What is amazing here is interacting with a Java htmlparser class. That class requires passing in an implementation of a NodeVisitor interface, so when each tag of an HTML page is visited, NodeVisitor.visitTag is called. Clojure lets you define a proxy, which is your implementation of the interface in Clojure. Look again at how simple it is to traverse the HTML for the right tag.
The latter code also demonstrates the use of Clojure's run-time polymorphism. The method fetch-score is a multimethod that dispatches to the proper function depending on whether the argument url has 'rottentomatoes' or 'metacritic' within it. You can define arbitrary functions that determine how to dispatch to other functions. In this case it's simply based on a regex.
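For comparison, here is a rough Python analogue of that dispatch idea. It is only a sketch of the pattern, not a translation of the Clojure script: the registry, the decorator, and the handler names are all made up for illustration, and the handlers return placeholder strings instead of fetching anything.

```python
import re

# Registry of (compiled pattern, handler) pairs. Every name here is hypothetical.
_HANDLERS = []


def handles(pattern):
    """Register the decorated function for URLs matching `pattern`."""
    def register(func):
        _HANDLERS.append((re.compile(pattern), func))
        return func
    return register


@handles(r"rottentomatoes")
def _rotten_tomatoes_score(url):
    # Placeholder: a real handler would fetch and parse the page.
    return "rottentomatoes"


@handles(r"metacritic")
def _metacritic_score(url):
    # Placeholder, same as above.
    return "metacritic"


def fetch_score(url):
    """Dispatch to the first handler whose regex is found in the URL."""
    for pattern, func in _HANDLERS:
        if pattern.search(url):
            return func(url)
    raise ValueError("no handler for %s" % url)
```

The dispatch rule lives in one place (the loop in fetch_score), and adding a third review site means adding one decorated function. Clojure's multimethods give you this without the hand-rolled registry.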
The code is missing some error handling, but in most cases an error results in an empty list that I don't really care about. You can also catch the proper exceptions if needed. The main thing I like about this is how short the code is. It takes a while to understand Lisp, but it almost feels natural in some ways. Lisp seems to encourage minimal building-block functions and putting them all together, almost like putting together small Unix programs with pipes and filters. It also seems like there are a million ways to do something.
It's been almost a year since I released a version of Movie Vote. In that time I had been rewriting the UI in Java with Google Web Toolkit off and on. That started out as an experiment because I hated JavaScript so much. I liked GWT so much I went full-on with it. Now the server side of my app is mostly JSON interfaces, and the UI is AJAXy without any JavaScript written by me. I was glad to 'svn rm' the horrible JavaScript I had in my previous versions. With GWT and Java, type checking is done before I deploy and I no longer have to worry about variable misspellings and other stupidness when writing JS.
Anyway, just in time for the holidays, I give you Movie Vote v1.5.
In general, web programming is really not my thing. No matter what language I'm writing a UI in, I'm just not a UI developer. I can guess as to what I think is a cool interface but this is better left to UI designers. I now have a screenshot on the project page. That minimal-looking interface involved a lot of code!
Excellent article (Part 1 of many) on RAM.
Occasionally I show movies for team-building and just plain taking some stress out of the week. I'm an avid movie watcher and have tons. So what I usually do is send out a proposed list of 5 or so movies and let people vote on them. I was using a PHP script called LittlePoll. If that link is broken, try this one. Anyway, it did the job, but I got really tired of picking out 5 movies to show. Wouldn't it be better if I just listed all of my movies and let people vote on them? Sort of like a Digg/Reddit interface to a movie list. Also, I had no form of authentication and relied on the honor system (and I found out early that not too many viewers deserved such honor).
I figured this would be a good learning experience to do some web programming in Python. So I sat down and wrote one using Django. I picked Django as we use it a lot at work and it seemed like a good thing to learn. In general I like it, however I feel all of these SQL abstraction frameworks just make it more difficult to get the queries you want in the most optimized fashion. In fact I had difficulty trying to do a simple task via the SQL abstraction. You'll notice that thread is gathering cobwebs due to its silence. The end result? Screw the abstraction and use raw SQL, sigh. I feel these web frameworks concentrate too much on getting things up as fast as possible more than functionality.
Other than that, Django is pretty nifty to work with. It is indeed very simple. In no time I had a SQLite database-backed site with authentication set up. We even built in a weighting system so that people who consistently show up for the movies get a higher weight applied to their vote. We needed it because we'd have users voting but never showing up, thereby affecting the viewing for others. Now new users get a zero weight until they attend one screening. It seems to be working well.
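To make the weighting idea concrete, here is a minimal sketch in plain Python. It is not the code from the app: the function names, the data shapes, and especially the formula (weight equals screenings attended, capped at an arbitrary MAX_WEIGHT) are assumptions for illustration. The only parts taken from the description above are that regulars count for more and that a new user counts for zero until attending one screening.

```python
from collections import defaultdict

MAX_WEIGHT = 5  # assumed cap, so long-time regulars can't dominate forever


def vote_weight(attended):
    """Assumed formula: zero for no-shows, then one point per screening, capped."""
    return min(attended, MAX_WEIGHT)


def tally(votes, attendance):
    """Sum weighted votes per movie.

    votes: iterable of (user, movie) pairs
    attendance: dict mapping user -> number of screenings attended
    """
    totals = defaultdict(float)
    for user, movie in votes:
        # Users missing from `attendance` are treated as never having attended.
        totals[movie] += vote_weight(attendance.get(user, 0))
    return dict(totals)
```

With this shape, a movie backed only by people who never show up ends with a total of zero, however many of them vote for it.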
I say we because a co-worker has helped with some of the coding. The main problem we currently have is the slowness due to it running on a very old Thinkpad laptop. But we've optimized it pretty well. In fact, working on such low-end hardware really teaches you to do things as efficiently as possible. I know if I were to move it to a high-end server it would fly. Oh, and there is the problem with UI design, which I am horrible at. It is simple and functional though.
Now we have a pretty good system for deciding what to watch at our movie showings. I encourage you to set up something similar and give me feedback/patches.
I work on an application at work that does all sorts of maintenance on large clusters. Without going into too much detail, it generally keeps an eye on our servers, fixes problems, and performs setup and maintenance. It's a rather important system, and it's all written in Python.
Now for the past few months we've been trying to debug an issue. At random times it would segfault. We had lots of people looking at the core dumps and it seemed to be some sort of memory corruption. We ruled out the machine, as it would crash on multiple machines.
So where did I start? The program is multithreaded, so it usually has lots of processes listed via 'ps'. On a whim (and most of my debugging was based on whims), I ran strace on every process. When I did this, I observed the Heisenberg principle at work: the program did not crash. Running strace on the application wasn't really a solution, so I had to look into it more.
I'm fairly involved with the development of this system, so I understand a lot of the code. I did not understand why it would crash. I did know one thing: the application uses threads, and it also forks. But that didn't help me because I had no idea there was a problem mixing the two.
I decided to run a tcpdump to see if any strange network traffic might be causing it. After much analysis with ethereal, I noticed something strange. Crashes always seemed to coincide with a web request to the application. The app runs a web interface, and we have monitoring hitting this UI to make sure it is running properly. So it turned out the 'random' crashes were not so random, and were happening when our monitoring system was hitting the app.
But why was it crashing? With any multithreaded app, you need to have proper locking of shared variables. You generally have to worry about multiple threads modifying/accessing the same area of memory at the same time. So I spent weeks analyzing the code, trying different locks in different areas, etc. I made our webserving function simply output nothing, so it didn't access any shared data. To my surprise, we still crashed.
One problem was I could not predict when it would crash. It could run up to 12 hours without crashing, and most of the monitoring hits of the UI worked fine. So I had to work on replicating the bug more frequently. After a lot of work using a live debugger that a co-worker built into the app, I was able to set off a sequence of events that crashed it within 2-3 minutes. It generally involved slamming the UI with parallel web requests, and kicking off some internal methods that cause a fork() (for example, SSH'ing to a machine to verify connectivity). OK, that's better for testing. But it told me something: the crash seemed to involve forking.
Our webserver in this app was using Python's generic BaseHTTPServer with ThreadingMixIn to handle requests with new threads. From parallel analysis of the core files by others, it did appear to be crashing in pthread creation functions. On another whim, I changed ThreadingMixIn to ForkingMixIn, so the webserver would fork instead. Voila, no crash! So this told me it had something to do with forking and threading.
A co-worker pointed out this GNU C Library page warning about using threads and fork. The clincher was this sentence:
Because threads are not inherited across fork, issues arise. At the time of the call to fork, threads in the parent process other than the one calling fork may have been executing critical regions of code. As a result, the child process may get a copy of objects that are not in a well-defined state. This potential problem affects all components of the program.
So indeed we were doing something bad. The problem with changing our webserver to ForkingMixIn is that it removes any possibility of IPC, which we needed via our UI; i.e., a web request couldn't change the state of the program unless we created some specific IPC mechanism. So instead, we created a parent thread which started one BaseHTTPServer without any mixin. This means requests are processed serially, one at a time, but still have the ability to update shared data. The only cost is low throughput. For our app that sufficed, because the number of UI requests we get is low and they are mainly for monitoring.
This fixed our immediate issue, but I have since seen crashes still occurring, just after a much longer run time. The problem is that we are still using threads. It's not just in our webserving, but in other parts of the app as well. The only solution, it seems, is to remove such threading. We must fork because we do things like SSH.
What's the lesson here? Do not, I repeat, do not ever design an application that uses threads and also forks. You will have no end of trouble. If using threads, stick with threads only. If using fork, stick with forks only.
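Here is a minimal sketch of that arrangement. It uses the Python 3 module name (http.server, which replaced BaseHTTPServer), and the handler, the state dictionary, and the start_server helper are invented for illustration; none of it is the code from our app.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Shared in-process state that a web request is allowed to change (hypothetical).
state = {"hits": 0}


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Requests are handled one at a time by the single server thread,
        # so no two handlers ever run this concurrently.
        state["hits"] += 1
        body = ("hits=%d\n" % state["hits"]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start a plain HTTPServer (no threading or forking mixin) in one thread."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread
```

Passing port 0 lets the OS pick a free port, which can be read back from server.server_address. Other threads in the program that touch the same state would still need their own locking; the sketch only shows that the web side never runs two handlers at once.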
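The hazard is easy to reproduce in miniature. The sketch below is not from our app and does not segfault. It shows the milder version of the same problem: a thread in the parent holds a lock at the moment of fork(), the child inherits the lock in its held state, and the thread that would release it does not exist in the child. It assumes a POSIX system and CPython 3; the function name is made up, and newer Python versions print a DeprecationWarning about forking a multithreaded process, which is rather the point.

```python
import os
import threading


def child_can_take_lock(timeout=0.5):
    """Fork while another thread holds a lock; report whether the child can acquire it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def worker():
        with lock:           # the "critical region" in the parent
            held.set()
            release.wait()   # keep holding the lock across the fork

    thread = threading.Thread(target=worker)
    thread.start()
    held.wait()              # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so nobody here
        # will ever release the inherited lock.
        got = lock.acquire(timeout=timeout)
        os._exit(0 if got else 1)

    # Parent: collect the child's verdict, then let the worker finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 0
```

On Linux with CPython I would expect this to return False: the child gives up after the timeout. Without the timeout it would simply hang. Replace the lock with the allocator's or some C library's internal state and you get the kind of corruption we were chasing.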
I'm reading 'Core Python Programming 2nd Edition' and came across what I believe is an error in an example. Here is what I sent the author:
380-381 :: 10.3.10 : open() can throw IOError
I have some general comments on this section. You describe two ways of using try-finally-except. The first method was:
    try:
        try:
            ccfile = open('carddata.txt', 'r')
            txns = ccfile.readlines()
        except IOError:
            log.write('no txns this month\n')
    finally:
        ccfile.close()
But if open() fails, an IOError will be thrown, and ccfile will be undefined. The code in finally: will attempt to close that undefined variable, and throw a NameError.
The second method was:
    try:
        try:
            ccfile = open('carddata.txt', 'r')
            txns = ccfile.readlines()
        finally:
            ccfile.close()
    except:
        log.write('no txns this month\n')
But this suffers from the same problem. If open() fails, you attempt to close an undefined variable.
-- EOM
What's the better way to do this?
    try:
        ccfile = open('carddata.txt', 'r')
    except IOError:
        log.write('failed to open file\n')
    else:
        try:
            try:
                txns = ccfile.readlines()
            except IOError:
                log.write('no txns this month\n')
        finally:
            ccfile.close()
Yuck.. anything better? In Python 2.5, you can clean this a bit:
    try:
        ccfile = open('carddata.txt', 'r')
    except IOError:
        log.write('failed to open file\n')
    else:
        try:
            txns = ccfile.readlines()
        except IOError:
            log.write('no txns this month\n')
        finally:
            ccfile.close()
Still yucky though... The author of the book acknowledged the errata and credited me. He also gives a better solution: setting ccfile = None before the block and adding an if ccfile: test before closing.
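Spelled out, that suggestion might look like the sketch below. This is my reading of it, not the author's exact code: the read_txns wrapper, its parameters, and the default log target are made up so the snippet can run on its own, and the combined try/except/finally form needs Python 2.5 or later.

```python
import sys


def read_txns(path, log=sys.stderr):
    """Return the file's lines, or an empty list if it cannot be opened or read."""
    ccfile = None               # defined up front, so finally can always test it
    txns = []
    try:
        ccfile = open(path, 'r')
        txns = ccfile.readlines()
    except IOError:
        log.write('no txns this month\n')
    finally:
        if ccfile:              # only close what was actually opened
            ccfile.close()
    return txns
```

If open() fails, ccfile is still None when the finally block runs, so there is no NameError and nothing to close. The with statement, also new in Python 2.5, is another way to get the file closed without writing the finally block by hand.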
This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.