(2007-10-01) Bray Parallel Log Parsing Wide Finder

Tim Bray's submission to Beautiful Code ISBN:0-596-51004-7 analyzes Web Traffic via log parsing: "It's a classic example of the culture, born in AWK, perfected in Perl, of getting useful work done by combining regular expressions and hash tables."
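
To make the regular-expressions-plus-hash-tables idiom concrete, here is a minimal Python sketch. It is not Bray's code: the `/articles/` URL pattern and the names `ARTICLE_RE`, `count_hits`, and `top_hits` are assumptions made up for illustration, and the log lines are assumed to be in the common Apache access-log style.

```python
import re
from collections import Counter

# Hypothetical pattern: capture the path of any GET under /articles/,
# skipping paths that contain a dot (stylesheets, images, and so on).
ARTICLE_RE = re.compile(r'"GET (/articles/[^ .]+) ')

def count_hits(lines):
    """Tally how often each matching path is fetched (the hash table)."""
    counts = Counter()
    for line in lines:
        match = ARTICLE_RE.search(line)  # the regular expression
        if match:
            counts[match.group(1)] += 1
    return counts

def top_hits(lines, n=10):
    """Return the n most-fetched paths as (path, count) pairs."""
    return count_hits(lines).most_common(n)
```

Passing an open file object as `lines` streams the log one line at a time, so memory use depends on the number of distinct paths rather than on the size of the file.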

He wanted to write an equivalent that takes advantage of the Parallel Processing potential of MultiCore machines, so he started the Wide Finder project. He starts out comparing Perl performance to a small Ruby equivalent, then moves on to Erlang. Erlang's proponents claim that processes are free, pretty nearly. And in fact, Steve Vinoski's findings support that claim. He notes some issues with Steve's approach: "he had to read the whole file into memory in one unit. Remember, my logfiles are a quarter-gig per week, and I have years' worth of data, so that's just not gonna fly." His first cut of the free-process approach got rotten performance.
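
One way around the whole-file-in-memory problem is to hand each worker a byte range of the file and let it stream its own lines. The Python sketch below illustrates that idea only; it is not any of the Wide Finder entries. The `/articles/` pattern, the function names, and the 64 MB default chunk size are all assumptions.

```python
import os
import re
from collections import Counter
from multiprocessing import Pool

# Hypothetical pattern, applied to raw bytes so no decoding is needed.
ARTICLE_RE = re.compile(rb'"GET (/articles/[^ .]+) ')

def chunk_ranges(path, chunk_size):
    """Split the file into (start, end) byte ranges of at most chunk_size."""
    size = os.path.getsize(path)
    return [(start, min(start + chunk_size, size))
            for start in range(0, size, chunk_size)]

def count_chunk(task):
    """Count matches on every line whose first byte lies in [start, end)."""
    path, start, end = task
    counts = Counter()
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and read to the next newline, so we land
            # on the first line that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            match = ARTICLE_RE.search(line)
            if match:
                counts[match.group(1)] += 1  # keys are bytes
    return counts

def parallel_count(path, chunk_size=64 * 1024 * 1024, workers=None):
    """Count each chunk in a worker process, then merge the partial tallies."""
    tasks = [(path, start, end) for start, end in chunk_ranges(path, chunk_size)]
    total = Counter()
    with Pool(workers) as pool:
        for partial in pool.imap_unordered(count_chunk, tasks):
            total.update(partial)
    return total
```

Each worker reads line by line, so no process ever holds more than one line plus its own tally. A line that straddles a chunk boundary is counted by the chunk in which it starts. On platforms that spawn rather than fork worker processes, `parallel_count` should be called from under an `if __name__ == "__main__":` guard.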

Steve Vinoski has a new version that works around some of those issues.

Other people are replicating his work with other languages:

Santiago Gala contributes Python, Erlang, MapReduce. "His Python code is one line shorter than my Ruby version and, he claims, more elegant."
Bill De Hora has a single-thread version, and says he'll post a multi-threaded version shortly.
Fredrik Lundh provides a number of varieties, including MultiProcessor.
Andrew Dalke adds more.

Nov06: the big winner turned out to be written in JoCaml, with 2nd place going to Python.

Edited: 2010-09-29 00:00:00

Bill Seitz