Loose Wire — Exorcism for Spam: A theory devised by an English vicar and adopted by smart anti-spammers is your best bet for keeping spam out of your inbox
By Jeremy Wagstaff
A milestone, of sorts, was passed last month. According to MessageLabs, a United States-based company that studies these things, the Internet for the first time handled more spam e-mail messages than normal e-mails. In other words, for every legitimate e-mail sent, there was at least one spam, or unsolicited junk e-mail, sent. Compare that with a year ago when the ratio was about one spam for every 20 e-mails. A year before that? One in 1,500. Spam was never pretty, but it’s getting ugly, and something has to give. But what?
Spam is a business, and understanding that is halfway to embracing a solution that works. Why, for example, does MessageLabs spend so much time counting spam? Because it sells services and software that help companies avoid it. In fact, spam is, I suspect, much more profitable for the folk who clean it up than the guys who put it out. Think about it: It costs a spammer very little to send one e-mail, and only one in 10 million needs to generate a sale for the spammer to stay in business, but God knows how much it costs in lost man-hours for you or me to receive it, open it, read it, feel slightly nauseous, discard it and then wander over to the water cooler to complain to colleagues about it. There are conflicts of interest here that make me slightly uncomfortable advising you to buy products to keep out what shouldn’t be in your inbox anyway.
So here’s my solution: It’s simple, costs you nothing and will improve as you get more spam. Most anti-spam software looks for things it recognizes as spam-like (words like “Viagra,” for example) and filters out messages that contain them. But this isn’t always that effective: replace “i” with “1” and you have v1agra, or add some invisible formatting code in the middle of the word, so that it looks the same to a reader but different to a spam filter. So as spammers get more cunning, filters have to get smarter. This is why using logic, rather than keywords, makes sense. Enter an 18th-century vicar called Thomas Bayes from the English town of Tunbridge Wells. He devised a probability theory that has become a useful tool in gauging whether e-mail is spam or not.
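For the curious, here is a rough Python sketch of why keyword matching is so easy to dodge. It is not how any particular product works: the one-word blocklist and the function name keyword_filter are made up for illustration.

```python
import re

# Hypothetical blocklist; a real product would ship a far longer one.
BLOCKLIST = {"viagra"}

def keyword_filter(message: str) -> bool:
    """Return True if any blocklisted word appears as a whole word."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    return any(word in BLOCKLIST for word in words)

samples = [
    "Cheap Viagra here",            # plain keyword
    "Cheap v1agra here",            # "i" swapped for "1"
    "Cheap Via<!-- x -->gra here",  # invisible HTML comment inside the word
]

for sample in samples:
    verdict = "blocked" if keyword_filter(sample) else "delivered"
    print(f"{verdict}: {sample}")
```

Only the first sample contains the exact keyword. The other two break the word apart as far as the filter is concerned, although a reader (or an e-mail program that hides the HTML comment) sees much the same thing.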
Briefly, a Bayesian filter looks at the content of an e-mail (including the headers, in most cases, and the hidden code in e-mails, called HTML, that organizes fonts, colours and pictures), slices it into bits (words and chunks of code) and judges the probability of each bit being evidence of spam. It will then scrutinize the 15 most interesting bits, combine their probabilities (0.99, for example, meaning 99% likely it’s spam) and cast judgment on the e-mail. The more you prod it along (yes, this one is spam; no, this one looks like spam but is actually my Auntie Edith suggesting I have plastic surgery), the better it gets. And of course the more e-mail you get, the more it has to play with. Bayesian filters don’t just look for matches, they look for patterns of behaviour that give spam away.
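The steps above can be sketched in a few dozen lines of Python. Treat this as a toy under stated assumptions, not the method of any filter named in this column: the class name BayesSketch, the tokenizer, the smoothing, the clamping to the 0.01 to 0.99 range and the four training messages are all invented for illustration. The combining rule is the standard naive Bayes product, p1·p2·…/(p1·p2·… + (1−p1)(1−p2)·…), worked out in logarithms.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Slice a message into lowercase word-like bits."""
    return re.findall(r"[a-z0-9']+", text.lower())

class BayesSketch:
    """Toy Bayesian spam scorer; every name here is illustrative."""

    def __init__(self):
        self.spam_counts = Counter()  # messages marked spam containing each token
        self.ham_counts = Counter()   # messages marked legitimate containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """The 'prodding': tell the filter what a message really was."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_probability(self, token):
        """Estimate how strongly one token points to spam (0.5 = no evidence)."""
        # Add-one smoothing (an assumption) so unseen tokens land near 0.5.
        spam_rate = (self.spam_counts[token] + 1) / (self.n_spam + 2)
        ham_rate = (self.ham_counts[token] + 1) / (self.n_ham + 2)
        p = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, p))

    def spam_probability(self, text, top_n=15):
        """Combine the top_n tokens whose probabilities are furthest from 0.5."""
        probs = [self.token_probability(t) for t in set(tokenize(text))]
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        interesting = probs[:top_n]
        log_spam = sum(math.log(p) for p in interesting)
        log_ham = sum(math.log(1 - p) for p in interesting)
        return 1 / (1 + math.exp(log_ham - log_spam))

# Made-up training messages, two of each kind.
filt = BayesSketch()
filt.train("Buy cheap v1agra now", is_spam=True)
filt.train("Cheap pills, buy now", is_spam=True)
filt.train("Auntie Edith suggests plastic surgery", is_spam=False)
filt.train("Budget meeting moved to Friday", is_spam=False)

print(filt.spam_probability("buy cheap pills"))
print(filt.spam_probability("Auntie Edith meeting Friday"))
```

Note that v1agra needs no special handling here. Once a message containing it has been marked as spam, the misspelling itself becomes evidence, which is why this approach improves as the spammers get more inventive.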
For starters, try POPFile, which will work on most operating systems and with most e-mail programs. If you’re squeamish about manual tweaking, check out Spammunition for Outlook or SpamBully for Outlook or Outlook Express ($30 from www.spambully.com).
On top of that, try a trick of my own: Ask colleagues or friends to assign agreed tags to subject lines and set up your e-mail program to recognize those tags and filter them into special folders. [Meet] for example, could be used to relate to meetings, [Budget] for stuff related to how much money you plan to waste that year and [Fire] for e-mails alerting staff they’re being downsized. Such e-mails would then leap past any filters and be easy to search for. Spam’s not going to go away soon, but with good filters you need never see it in your inbox again. Or go to the water cooler.
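Most e-mail programs let you set up such rules through their menus, but the logic behind the tag trick amounts to something like the Python sketch below. The folder names, the route function and the handling of "Re:" and "Fwd:" prefixes are my own assumptions, not features of any particular program.

```python
import re

# Hypothetical mapping from agreed tags to folders.
TAG_FOLDERS = {
    "meet": "Meetings",
    "budget": "Budget",
    "fire": "Staff notices",
}

# A [Tag] at the start of the subject, allowing reply/forward prefixes.
TAG_PATTERN = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)*\[([^\]]+)\]", re.IGNORECASE)

DEFAULT_FOLDER = "Inbox (spam filter applies)"

def route(subject: str) -> str:
    """Return the folder for a subject line, or the default if no known tag."""
    match = TAG_PATTERN.match(subject)
    if match:
        folder = TAG_FOLDERS.get(match.group(1).strip().lower())
        if folder:
            return folder
    return DEFAULT_FOLDER

for subject in ["[Meet] Thursday, 3pm", "Re: [Budget] Q3 figures", "Cheap pills"]:
    print(f"{subject!r} -> {route(subject)}")
```

A subject with an unrecognized tag falls through to the default folder, so it still has to get past the spam filter like everything else.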