Could Social Clustering Be Used To Kill Off Spam?

By | February 23, 2004

We can relax: Boffins are now grappling with spam.

Nature reports that P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles, have come up with a way to tackle at least half the emails we get, namely those we get from friends, colleagues, and anyone else either we know or the people we know know (I’ve always wanted to write that sentence).

It works like this: If Alice knows and e-mails Bob and Chris, for example, then Bob and Chris are far more likely to know and e-mail each other than if they didn’t share a friend in common. E-mails radiating from a spam source don’t share this clustering property – the vast majority of recipients don’t know each other. “The method,” Nature says, “effectively turns the spammers’ weapon on themselves. The very fact that they can send out so many messages secures their low overall degree of clustering – it’s what gives them away.”
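The clustering property can be made concrete with the standard local clustering coefficient: of all the pairs among a person's correspondents, what fraction also correspond with each other? Below is a minimal Python sketch. It assumes the e-mail traffic has already been reduced to an undirected list of who-has-mailed-whom pairs. The helper names are hypothetical and are not taken from the Nature piece.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected graph: an edge means the two people have e-mailed each other."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-addressed mail
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph: dict[str, set[str]], node: str) -> float:
    """Share of the node's contact pairs who are also in contact with each other."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pairs to compare; treated as 0 here by convention
    linked_pairs = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return linked_pairs / (k * (k - 1) / 2)


# Alice mails Bob and Chris, who also mail each other: one closed triangle.
friends = build_graph([("alice", "bob"), ("alice", "chris"), ("bob", "chris")])
# A spammer mails four strangers who have never written to one another.
spam = build_graph([("spammer", f"user{i}") for i in range(4)])

print(clustering_coefficient(friends, "alice"))  # 1.0
print(clustering_coefficient(spam, "spammer"))   # 0.0
```

Alice's two contacts know each other, so her score is the maximum; none of the spammer's six recipient pairs are linked, so the score is zero.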

This is all done by inspecting the ‘from’, ‘to’ and ‘cc’ fields in a user’s inbox. From these, an automated system can quickly build up a ‘blacklist’ of spammers and a ‘whitelist’ of approved sources: e-mails above a certain ‘clustering threshold’ are treated as friendly, those below a lower threshold as spam, and anything in between is left undecided.
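Here is a rough sketch of how the headers might be turned into white and black lists. It is a simplified illustration, not Boykin and Roychowdhury’s published algorithm, which the post doesn’t spell out. Several things are assumptions. Each message is a dict with hypothetical `from`, `to` and `cc` keys. A sender is linked to each To/Cc recipient. The mailbox owner is left out of the graph, since they would otherwise be linked to everyone. The thresholds of 0.5 and 0.1 are made up. Senders with fewer than two links can’t be scored, so they are left undecided.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations

HIGH = 0.5  # hypothetical whitelist threshold
LOW = 0.1   # hypothetical blacklist threshold


def build_graph(messages: list[dict], owner: str) -> dict[str, set[str]]:
    """Link each sender to every To/Cc recipient, leaving out the mailbox owner."""
    owner = owner.lower()
    graph: dict[str, set[str]] = defaultdict(set)
    for msg in messages:
        sender = msg["from"].lower()
        recipients = {a.lower() for a in msg.get("to", []) + msg.get("cc", [])}
        for r in recipients:
            # The owner receives everything, so including them would link everyone.
            if owner in (sender, r) or r == sender:
                continue
            graph[sender].add(r)
            graph[r].add(sender)
    return graph


def clustering(graph: dict[str, set[str]], node: str) -> float | None:
    """Local clustering coefficient, or None when there are too few links to judge."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return None
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph.get(a, set()))
    return linked / (k * (k - 1) / 2)


def classify_senders(messages: list[dict], owner: str,
                     high: float = HIGH, low: float = LOW) -> dict[str, str]:
    """Label every sender 'whitelist', 'blacklist' or 'unknown'."""
    graph = build_graph(messages, owner)
    labels = {}
    for sender in {m["from"].lower() for m in messages} - {owner.lower()}:
        c = clustering(graph, sender)
        if c is not None and c >= high:
            labels[sender] = "whitelist"
        elif c is not None and c <= low:
            labels[sender] = "blacklist"
        else:
            labels[sender] = "unknown"  # left for content-based filters
    return labels


inbox = [
    {"from": "alice@example.com", "to": ["me@example.com", "bob@example.com"],
     "cc": ["chris@example.com"]},
    {"from": "bob@example.com", "to": ["me@example.com", "chris@example.com"]},
    {"from": "spammer@example.net",
     "to": ["me@example.com", "x1@example.org", "x2@example.org", "x3@example.org"]},
    {"from": "dave@example.com", "to": ["me@example.com"]},
]
print(sorted(classify_senders(inbox, "me@example.com").items()))
```

Alice and Bob end up whitelisted and the spammer blacklisted. Dave, who has only ever written to the owner, lands in the ‘unknown’ bucket. That bucket is the share of mail the authors say must be handed to other filters.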

Boykin and Roychowdhury acknowledge that this may apply to only about 50% of e-mail. But that half would be filtered without any errors, and with no user intervention at all. The remaining e-mail would have to be filtered by other means, but as the authors say, “our algorithm may be used as a platform for a comprehensive solution to the spam problem when used in concert with more sophisticated, but more cumbersome, content-based filters.”

It’s not a bad idea at all. Because it looks at header fields rather than content, the filtering process would be much quicker. Furthermore, the only way I can see spammers getting around it would be to spoof header fields so that they somehow anticipated the social clusters of their recipients. In other words, they’d have to figure out who was on someone’s whitelist for their message to get through. (Although I suppose spoofing the recipient’s own e-mail address in the sender field might be enough.)

What’s intriguing is how this might feed into social networks like Friendster. Could these groups be mobilised as automatic whitelists, so that, with a single mouse click, I could make sure everyone on my Friendster list was on my whitelist? If this sort of thing caught on, it might give an added incentive to join such networks for folk like me who find places like Friendster a bit too, er, youthful and places like LinkedIn a bit too, er, business-oriented.
