z2007-10-01- Bray Parallel Log Parsing Wide Finder
last edited by BillSeitz on Aug 22, 2008 6:20 am
TimBray's submission to [Beautiful Code] ISBN:0-596-51004-7 analyzes Web Traffic via log parsing: "It's a classic example of the culture, born in Awk, perfected in Perl, of getting useful work done by combining regular expressions and hash tables."
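A minimal sketch of that regular-expression-plus-hash-table idiom, in Python. It is not Bray's actual script: the `/articles/` URL pattern, the names `ARTICLE` and `top_articles`, and the sample log format are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: captures the path of any fetch under /articles/.
# A real script would use whatever URL layout the site actually has.
ARTICLE = re.compile(r'"GET /articles/(\S+) ')


def top_articles(lines, n=10):
    """Count article fetches in an iterable of log lines and return the n most popular."""
    counts = Counter()  # the hash table: article path -> number of hits
    for line in lines:
        match = ARTICLE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Because `top_articles` accepts any iterable of lines, passing an open file object streams the log one line at a time instead of loading it whole.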
He wanted to write an equivalent that takes advantage of the Parallel Processing potential of Multi Core machines, so he started a [Wide Finder] project. He starts out comparing PeRl performance to a small RuBy equivalent, then moves on to ErLang: "Erlang's proponents claim that processes are free, pretty nearly. And in fact, Steve Vinoski's findings support that claim." He notes some issues with Steve's approach: Steve had to read the whole file into memory in one unit. "Remember, my logfiles are a quarter-gig per week, and I have years' worth of data, so that's just not gonna fly." His first cut of the free-process approach got rotten performance.
Steve Vinoski has a new version that works around some of those issues.
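One way to sidestep the whole-file-in-memory problem while still using several cores is to split the log into byte ranges that end on line boundaries and let each worker stream its own range. The Python sketch below illustrates that general idea only; it is not Vinoski's or Bray's code, and the `/articles/` pattern and the names `chunk_ranges`, `count_range`, and `count_parallel` are hypothetical.

```python
import os
import re
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

# Hypothetical pattern (bytes, since the file is read in binary mode).
PATTERN = re.compile(rb"GET /articles/(\S+) ")


def chunk_ranges(path, n_chunks):
    """Split the file into roughly n_chunks byte ranges, each ending on a line boundary."""
    size = os.path.getsize(path)
    step = max(1, size // n_chunks)
    ranges, start = [], 0
    with open(path, "rb") as f:
        while start < size:
            f.seek(min(start + step, size))
            f.readline()  # move forward to the end of a line
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def count_range(job):
    """Stream one byte range line by line and count pattern matches."""
    path, start, end = job
    counts = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            match = PATTERN.search(f.readline())
            if match:
                counts[match.group(1).decode("ascii", "replace")] += 1
    return counts


def count_parallel(path, workers=4, executor_cls=ProcessPoolExecutor):
    """Count matches over all ranges with a pool of workers, then merge the partial tables."""
    jobs = [(path, start, end) for start, end in chunk_ranges(path, workers)]
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        for partial in pool.map(count_range, jobs):
            total.update(partial)  # Counter.update adds counts together
    return total
```

Each worker holds only one line in memory at a time, and only the small per-range tables travel back to the parent. With the default `ProcessPoolExecutor`, call `count_parallel` from under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.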
Other people are replicating his work with other languages:
[Santiago Gala] contributes PyThon, ErLang, Map Reduce. Bray's comment: "His Python code is one line shorter than my Ruby version and, he claims, more elegant."
Bill De Hora has a single-thread version, and says he'll post a Multi Threaded version shortly.
[Fredrik Lundh] provides a number of varieties, including Multi Processor.
Nov06: the big winner has turned out to be written in [JoCaml], with 2nd place going to PyThon.