Vinod's Blog
Random musings from a libertarian, tech geek...
Friday, July 25, 2003 - 08:43 AM
Robin Keir's K-9 Spam Filter

A coworker recently recommended Robin Keir's K-9 spam filter and BOY am I impressed with it.   It actually makes spam fighting fun!    This is a great little freeware app and I hope Robin Keir is justifiably proud of it (I also hope he had a *really* fun time putting this together).  

I'm pretty "public" with the vinod@vinod.com email addr and it's been published on the 'net for several years now.   The result of course is massive spam.   I never knew exactly how much spam I was receiving a day and just sort of put up with the nuisance.   However, as my biz trips became longer and more intense, I had less and less time to do my normal account maintenance resulting in a deluge of hundreds of messages when I got back.   

Never again.   

[Image]

K9 tells me that I receive just shy of 350 spams a day and around 30 legitimate messages on this account (a pathetic signal-to-noise ratio of under 10%).

The product works by using Bayesian filtering techniques to dynamically evaluate Spam vs. Good email.   It builds up a dictionary of words / alphanumeric patterns and their frequencies in the respective types of messages.  Then, using this statistical index, it looks at your inbound mail to make a spam determination.
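
For the curious, here's roughly what that kind of word-frequency scoring looks like in code. This is my own minimal naive Bayes sketch, NOT K9's actual algorithm; the TinyBayesFilter class, the tokenizer and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")


def tokenize(text):
    """Split a message into lowercase word / alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class TinyBayesFilter:
    """Hypothetical naive Bayes filter; illustrative only, not K9's code."""

    def __init__(self):
        # Per label: how many trained messages contained each token.
        self.counts = {"spam": Counter(), "good": Counter()}
        # Per label: how many messages were trained in total.
        self.messages = {"spam": 0, "good": 0}

    def train(self, text, label):
        """Record the tokens of one message tagged 'spam' or 'good'."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return an estimate between 0 and 1 that the message is spam."""
        n_spam = self.messages["spam"]
        n_good = self.messages["good"]
        # Start from the smoothed prior odds, then add per-token evidence.
        log_odds = math.log((n_spam + 1) / (n_good + 1))
        for token in set(tokenize(text)):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][token] + 1) / (n_spam + 2)
            p_good = (self.counts["good"][token] + 1) / (n_good + 2)
            log_odds += math.log(p_spam / p_good)
        # Clamp to keep exp() from overflowing on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Usage is just a train / score loop: call train() on each message you tag by hand, then treat anything with spam_probability() above some cutoff (say 0.9, again my assumption) as spam.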

One downside of this technique is that there is a definite learning period where you have to tell K9 which messages are Spam and which are not, but the curve is remarkably short.   I think I had to expressly tag ~10-20 messages before it started generating well over 90% accuracy.   The rulebase is iteratively evaluated every time you receive a message, so the filter gets progressively more accurate.

Another downside is that there is some risk of false positives (marking Good email as Spam) as well as false negatives (marking Spam as Good).   In the week I've been running K9, I've had about 10 instances of false positives and zero false negatives.   So this is NOT a fire and forget tool.   But if you're of the geek inclination, it's sorta fun to watch K9 build up & execute upon its rules database.

In order to get access to your email stream, K9 sets itself up as a POP3 proxy -- you configure Outlook / Outlook Express to point at K9 for email retrieval and in turn K9 points at your actual email server to perform the download.   As email streams through the proxy, K9 will add a "[spam]" tag to the subject line of email it believes is spam and send it on to your client.    You then configure Outlook to filter on this subject line and toss the message into a spam folder as appropriate.
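
The tagging step itself is simple enough to sketch. The snippet below is a guess at the general idea using Python's standard email module, not K9's implementation; tag_subject and SPAM_TAG are names I made up, and the proxy plumbing (listening for the mail client, talking to the real server) is left out entirely.

```python
from email import message_from_string

# The marker your mail client rule would filter on.
SPAM_TAG = "[spam]"


def tag_subject(raw_message, is_spam):
    """Return the raw message text, with SPAM_TAG prepended to the
    Subject header when is_spam is true (hypothetical helper)."""
    msg = message_from_string(raw_message)
    if is_spam:
        subject = msg.get("Subject", "")
        # Avoid double-tagging a message that already carries the marker.
        if not subject.startswith(SPAM_TAG):
            # Deleting a missing header is a no-op, so this is safe
            # even for messages that arrive without a Subject line.
            del msg["Subject"]
            msg["Subject"] = f"{SPAM_TAG} {subject}".strip()
    return msg.as_string()
```

The is_spam flag would come from whatever classifier sits in front of this; the mail client only ever sees the rewritten subject line.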

There are other products similar to K9  -- it specifically draws algorithmic inspiration from POPFile -- but I prefer K9's Windows client centricity.  POPFile, for ex., places all of its UI inside of a web browser which is a pain in the butt for the quick / fast twitch stat checks.   K9 is just all around more polished.   Great work!

