(2007-10-01) Bray Parallel Log Parsing Wide Finder

Tim Bray's submission to Beautiful Code ISBN:0-596-51004-7 analyzes web traffic via log parsing: "It's a classic example of the culture, born in AWK, perfected in Perl, of getting useful work done by combining regular expressions and hash tables."
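A minimal Python sketch of that regex-plus-hash-table idiom, for reference. This is not Bray's script: the `ARTICLE` pattern, the `top_pages` helper, and the sample log lines are all made up for illustration, and a real pattern depends on the site's URL layout.

```python
import re
from collections import Counter

# Hypothetical pattern: GET requests for paths under /articles/.
ARTICLE = re.compile(r'"GET (/articles/\S+) HTTP')

def top_pages(lines, n=10):
    """Count matching request paths and return the n most frequent."""
    counts = Counter()  # the "hash table" half of the idiom
    for line in lines:
        m = ARTICLE.search(line)  # the "regular expression" half
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)

# Invented sample lines in a common access-log shape.
sample = [
    '1.2.3.4 - - [01/Oct/2007:00:00:01 -0700] "GET /articles/a HTTP/1.1" 200 512',
    '1.2.3.5 - - [01/Oct/2007:00:00:02 -0700] "GET /articles/b HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Oct/2007:00:00:03 -0700] "GET /articles/a HTTP/1.1" 200 512',
    '1.2.3.6 - - [01/Oct/2007:00:00:04 -0700] "GET /style.css HTTP/1.1" 200 90',
]

for path, hits in top_pages(sample):
    print(f"{hits}: {path}")
```

Since `top_pages` takes any iterable of lines, an open file object can be passed directly, so the log is streamed rather than loaded whole.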

He wanted to write an equivalent that takes advantage of the parallel processing potential of multicore machines, so he started the Wide Finder project. He starts out comparing Perl performance to a small Ruby equivalent, then moves on to Erlang. Erlang's proponents claim that processes are free, pretty nearly. And in fact, Steve Vinoski's findings support that claim. Bray notes some issues with Steve's approach: Steve had to read the whole file into memory in one unit. "Remember, my logfiles are a quarter-gig per week, and I have years' worth of data, so that's just not gonna fly." His first cut of the free-process approach got rotten performance.
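One way around the whole-file-in-memory problem, sketched in Python under stated assumptions: split the file into byte ranges, let each worker stream only the lines that start inside its range, and merge the per-range counts. This is not any of the actual Wide Finder entries; `split_offsets`, `count_range`, `wide_count`, and the `ARTICLE` pattern are hypothetical names chosen for illustration.

```python
import os
import re
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

# Hypothetical pattern (bytes): GET requests for paths under /articles/.
ARTICLE = re.compile(rb'"GET (/articles/\S+) HTTP')

def split_offsets(path, parts):
    """Return (start, end) byte ranges that together cover the file."""
    size = os.path.getsize(path)
    step = max(1, size // parts)
    return [(s, min(s + step, size)) for s in range(0, size, step)]

def count_range(path, start, end):
    """Count matches in the lines whose first byte lies in [start, end)."""
    counts = Counter()
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and finish that line, so reading resumes
            # at the first line that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            m = ARTICLE.search(line)
            if m:
                counts[m.group(1).decode("ascii", "replace")] += 1
    return counts

def wide_count(path, workers=4):
    """Fan the byte ranges out to worker processes and merge the counts."""
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_range, path, s, e)
                   for s, e in split_offsets(path, workers)]
        for fut in futures:
            total.update(fut.result())
    return total
```

Each worker holds one line at a time, so memory use does not grow with file size. When run as a script, `wide_count("access.log")` should be called from under an `if __name__ == "__main__":` guard so worker processes can import the module safely. Whether this beats a single process depends on how much of the time goes to disk I/O rather than regex matching.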

Other people are replicating his work with other languages:

Nov06: the big winner has turned out to be written in JoCaml, with 2nd place going to Python.

