McAfee Comes Late To Rev. Bayes’ Party

McAfee seems to have come somewhat late to the spam party: Network Associates, Inc., ‘the leader in intrusion prevention solutions’, today announced that it has incorporated “powerful new Bayesian filtering into the latest McAfee SpamAssassin engine”. What, only now?

Bayesian filtering is a pretty powerful weapon in the war against spam. I use POPFile and K9 and would recommend either, not least because they’re free. But why has it taken so long for McAfee to get around to including it in their SpamAssassin product?

To be fair, the McAfee Bayesian filter is “fully automated in its learning abilities, whereas other competitive solutions require manual training by users or systems administrators”. That is an improvement, but I wonder how well it works.

SpamKiller/Assassin also has some other features: Integrity Analysis, which applies algorithms to determine whether an email is spam; Heuristic Detection; Content Filtering; Black and White Lists; and DNS-Blocklist Support.

    If a word, “Cartesian” for example, never appears in spam but often in your legitimate mail, the probability of “Cartesian” indicating spam is near zero. “Toner”, on the other hand, appears exclusively, and often, in spam. “Toner” has a very high probability of being found in spam, not much below 1 (100%).

    When a new message arrives, it is analyzed by the Bayesian spam filter, and the probability of the complete message being spam is calculated using the individual characteristics.

    Let’s say a message contains both “Cartesian” and “toner”. From these words alone it’s not yet clear whether we have spam or legit mail. But other characteristics will (most probably) indicate a probability that allows the filter to classify the message as either spam or good mail.
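For the curious, here is a rough Python sketch of the idea described above. It is not McAfee's, POPFile's or K9's actual code: the function names, the four-message training set, the add-one smoothing and the equal weighting of spam and good mail are all my own assumptions, chosen to keep the example small.

```python
import math
from collections import Counter


def train(messages):
    """Count how many spam and good messages each word appears in.

    `messages` is a list of (text, is_spam) pairs (hypothetical format).
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        words = set(text.lower().split())  # count each word once per message
        if is_spam:
            spam_counts.update(words)
            n_spam += 1
        else:
            ham_counts.update(words)
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham


def word_spam_probability(word, model):
    """Probability that a message is spam, given only that it contains `word`."""
    spam_counts, ham_counts, n_spam, n_ham = model
    # Add-one smoothing (an assumption) keeps the result strictly between
    # 0 and 1, so a single word can never decide the verdict on its own.
    p_word_in_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_in_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_in_spam / (p_word_in_spam + p_word_in_ham)


def message_spam_probability(text, model):
    """Combine the per-word probabilities, treating words as independent."""
    log_odds = 0.0  # equal prior odds for spam and good mail (an assumption)
    for word in set(text.lower().split()):
        p = word_spam_probability(word, model)
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))


# Tiny made-up training set, purely for illustration.
model = train([
    ("cheap toner sale", True),
    ("toner cartridges discount", True),
    ("cartesian coordinates lecture", False),
    ("notes on cartesian geometry", False),
])

print(message_spam_probability("cartesian toner", model))
print(message_spam_probability("cartesian toner cheap", model))
```

With this toy data, “Cartesian” and “toner” pull in opposite directions and roughly cancel out, so the first message lands on the fence; adding a third word that has only been seen in spam tips the second one towards spam. A real filter works from thousands of messages and usually looks at more than bare words.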

