(2002-08-21) a

Paul Graham suggests filtering spam email based on statistical analysis of its content. In his words: "Merely looking for the word 'click' will catch 79.7% of the emails in my spam corpus, with only 1.2% false positives... In fact, 'ff0000' (html for bright red) turns out to be as good an indicator of spam as any pornographic term."
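To make those two percentages concrete, here is a minimal sketch of how a single token's catch rate and false-positive rate could be measured. It is not Graham's actual filter, which combines the evidence from many tokens. The function name `token_rates` and the tiny sample corpora are made up for illustration, and both corpora are assumed to be non-empty.

```python
def token_rates(token, spam, ham):
    """Return (catch_rate, false_positive_rate) for one token.

    spam and ham are non-empty lists of message bodies (an assumption
    of this sketch). Matching is a case-insensitive substring test.
    """
    token = token.lower()
    caught = sum(token in msg.lower() for msg in spam)   # spam containing the token
    flagged = sum(token in msg.lower() for msg in ham)   # legitimate mail wrongly hit
    return caught / len(spam), flagged / len(ham)


# Hypothetical sample data, far too small to mean anything statistically.
spam = ["Click here for a free offer", "CLICK now to win", "Cheap pills today"]
ham = ["Lunch on Friday?", "Meeting notes attached",
       "Please click the build link", "Draft attached"]

catch_rate, false_positive_rate = token_rates("click", spam, ham)
```

A token is only a useful indicator when the first number is high and the second is low, which is what the 79.7% and 1.2% figures claim for "click".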

Steve Gillmor says that Jon Udell uses a really simple filter: any message that doesn't name Jon in the "To" or "CC" field is shipped to a folder for later perusal. Jon reports this catches about 97 percent of the offending material. (He doesn't report how many false matches it generates, which is the key factor in whether you can avoid looking in the spam folder at all.)
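A rule like this one can be sketched in a few lines with Python's standard `email` package. This is only an illustration, not Jon's actual setup: the address `jon@example.com`, the folder names, and the choice to match on the exact address rather than the display name are all assumptions.

```python
from email import message_from_string
from email.utils import getaddresses

MY_ADDRESS = "jon@example.com"  # hypothetical address, for illustration only


def addressed_to_me(raw_message, my_address=MY_ADDRESS):
    """True if my_address appears in the To or Cc header of the message."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses parses header strings into (display name, address) pairs.
    recipients = getaddresses(headers)
    return any(addr.lower() == my_address.lower() for _name, addr in recipients)


def route(raw_message):
    """Return the folder name: 'inbox' if addressed to me, else 'later'."""
    return "inbox" if addressed_to_me(raw_message) else "later"


direct = "From: a@example.org\nTo: Jon <jon@example.com>\nSubject: hi\n\nbody\n"
bulk = "From: x@example.org\nTo: list@example.org\nSubject: offer\n\nbody\n"
folders = [route(direct), route(bulk)]
```

Mailing-list traffic and Bcc'd mail would land in the "later" folder under this rule, which is exactly where the unreported false matches would come from.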

