New in Gmail Labs: Smart Labels

Wednesday, March 09, 2011 | 10:00 AM

Posted by Stanley Chen, Software Engineer

People get a lot of email these days. On top of personal messages, there are group mailing lists, social network notifications, credit card statements, newsletters you might have signed up for, and promotional email from a shopping site you used once months ago. Gmail’s filters and labels were invented to help manage the deluge, but while I have about 100 filters that triage and label my incoming mail, most of my friends and family have all their messages in a giant unfiltered inbox.

Last year, we launched Priority Inbox to automatically sort incoming email and help you focus on the messages that matter most. Today, we’re launching a complementary feature in Gmail Labs called Smart Labels, which helps you classify and organize your email. Once you turn it on from the Labs tab in Settings, Smart Labels automatically categorizes incoming Bulk, Notification and Forum messages, and labels them as such. “Bulk” mail includes any kind of mass mailing (such as newsletters and promotional email) and gets filtered out of your inbox by default (where you can easily read it later), “Notifications” are messages sent to you directly (like account statements and receipts), and email from group mailing lists gets labeled as “Forums.”

If you already use filters and labels to organize your mail, you may find that you can replace your existing filters with Smart Labels. If you’re picky like me and still want to hold on to your current organization system, Smart Labels play nice with other labels and filters too. On the Filters tab under Settings, you’ll find that these filters can be edited just like any others. From there, you can also edit your existing filters to avoid having them Smart Labeled or change whether mail in a Smart Label skips your inbox (which you can also do by just clicking on the label, then selecting or unselecting the checkbox in the top right corner).

Labs in Gmail are a great testing ground for experimental features, and we hope Smart Labels help you more effortlessly get through your inbox. If you notice a message that was automatically labeled incorrectly and want to help us troubleshoot, you can report miscategorizations from the drop down menu on each message (in doing so, you’ll donate the full message to our engineers so that we can improve the feature). Give it a try and send us feedback on how we can make it work better for you!

This could be interesting. One day they’ll use Bayesian filters and we won’t even have to set up filters of our own. One day.

Lost in Transmission

I dread to think how much eBay is paying Waggener Edstrom to handle press relations for their Toy Crusade. At least I think that’s what is being launched: all the press stuff I received this morning, including the image-laden email and attachments, was all in Chinese. Oh, except for the headline.

I know I should, but I don’t speak Chinese.

Now, admittedly, the event is about China, it’s being organized in Hong Kong, and the website itself is entirely in Chinese (no English version in sight), but you’d think one of the world’s biggest PR agencies could have managed

  • to have a database of journalists’ language preferences (clue: names are often a giveaway), or
  • perhaps an English-language version somewhere in the text, or
  • a link to an English-language version, or
  • an explanation that this is a Chinese-language only event/issue, or
  • a link on the email indicating it was sent by an intern with no idea of what mayhem he may be creating for himself by blasting off emails to all and sundry, or
  • a link in the email to a place where we journalists can complain volubly and ensure we never receive another email like it.

Serious lesson in this: At the very least, this kind of email is likely to end up as spam in a non-Chinese speaking recipient’s email inbox because the Bayesian filters will have been trained to treat it as such. (This is what happened to mine.) So that’s all pretty much a waste of everyone’s time.

But at the most, as a PR agency you’re being paid large amounts of money to target the message to the right people. I’m clearly not the right people. So either don’t send it to me, or send me an English language version, or send me a query about whether this might be of interest. Or expect me to get grumpy, and take 15 minutes of my day to write a grumpy blog post like this.

Update, Aug 27 2007: I’ve just heard from Waggener who have offered an apology and explanation:

In the case of the toy crusade press release, a staff member accidentally inserted the wrong distribution list, and this was overlooked by their supervisor during the checking process.

People do make mistakes and of course the individuals concerned are very apologetic.  To be sure, we have also added more safeguards to the process to minimize the likelihood of this ever happening again.

Fair play. Of course it’s better that these things don’t happen, but they do, and their response is measured and the right one. The proof will be in the pudding — will it happen again?

A New Image for Your Email Address

John Graham-Cumming, author of Bayesian spam filter POPFile, points me to a neat tool he’s created which will turn an email address into an image that may spare you some spam from bots scouring web pages for email addresses:

This site converts a text-based email address (such as me@example.com) and creates an image that can be inserted on a web site. The image contains the email address and is easily read by a human, but is intended to fool web crawlers that search for email addresses.

I can’t guarantee that this is foolproof, but Project Honeypot reports that image obfuscation of an email address is very effective (they say 100%) against web crawlers.

Enter your email address in the box and the server returns a string of gobbledygook which contains the email address (padded with a large amount of random data to avoid a dictionary attack) encrypted using a key known only to the server. When the image is loaded into the web page the server decrypts the email address and creates the image. (The email address is not stored by the server; it resides only in the HTML on your website.)

 Here’s what mine looks like:


Made using jeaig

If you need to put a contact address on your webpage or blog, but hate the amount of spam you’re getting, it’s worth a try.

How to Make More Use of the Vicar

In last week’s WSJ column (subscription only, I’m afraid) I wrote about how Bayesian Filters — derived from the theories of an 18th century vicar called Thomas Bayes and used to filter out spam — could also be used to sift through other kinds of data. Here’s a preliminary list of some of the uses I came across:

  • Deconstructing Sundance: how a bunch of guys at UnSpam Technologies successfully predicted the winners (or at least who would be among the winners) at this year’s festival using POPFile, the Bayesian filter of choice;
  • ShopZilla, a “leading shopping search engine”, uses POPFile “in collaboration with Kana to filter customer emails into different buckets so we can apply the appropriate quality of service and have the right people to answer to the emails. Fortunately, some of the buckets can receive satisfactory canned responses. The bottom line is that PopFile provides us with a way to send better customer responses while saving time and money.”
  • Indeed, even non-spam email can benefit from Bayes, filtering boring from non-boring email, say, or personal from work. Jon Udell experimented with this kind of thing a few years ago.
  • So can virus and malware detection. Here’s a post on the work by Martin Overton in keeping out the bad stuff simply using a Bayesian Filter. Here’s Martin’s actual paper (PDF only). (Martin has commented that he actually has two blogs addressing his work in this field, here and here.)
  • John Graham-Cumming, author of POPFile, says he’s been approached by people who would like to use it in regulatory fields, in computational biology, dating websites (“training a filter for learning your preferences for your ideal wife,” as he puts it), and says he’s been considering feeding in articles from WSJ and The Economist in an attempt to find a way to predict weekly stock market prices. “If we do find it out,” he says, “we won’t tell you for a few years.” So he’s probably already doing it.

If you’re new to Bayes, I hope this doesn’t put you off. All you have to do is show it what to do and then leave it alone.  If you haven’t tried POPFile and you’re having spam issues, give it a try. It’s free, easy to install and will probably be the smartest bit of software on your computer.

I suppose the way I see it is that Bayesian filters don’t care about how words look, what language they’re in, or what they mean, or even if they are words. They look at how the words behave. So while the Unspam guys found out that the word “riveting” was much more likely to be used by a reviewer to describe a dud movie than a good one, the Bayesian Filter isn’t going to care that that seems somewhat contradictory. In real life we would have been fooled, because we know “riveting” is a good thing (unless it’s some weird wedgie-style torture involving jeans that I haven’t come across). Bayes doesn’t know that. It just knows that the word has an unhealthy habit of cropping up in reviews of movies that bomb.

In a word, Bayesian filters watch what words do, or what the email is using the words to do, rather than looking at the meaning of the words. We should be applying this to the speeches of politicians, CEOs and PR types to see what comes out. Is there any way of measuring how successful a politician is going to be based on their early speeches? What about press releases? Any way of predicting the success of the products they tout?
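To make the “riveting” point concrete, here is a minimal sketch of a bucket classifier in Python. It is not POPFile’s code, just a small multinomial naive Bayes with add-one smoothing; the class name, the “hit” and “dud” buckets and the training snippets are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Tiny multinomial naive Bayes: learns which tokens turn up in which bucket."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # bucket -> token -> count
        self.doc_counts = Counter()               # bucket -> number of training texts

    @staticmethod
    def tokenize(text):
        # Tokens are just whitespace-separated strings; no dictionary, no meaning.
        return text.lower().split()

    def train(self, bucket, text):
        self.doc_counts[bucket] += 1
        self.token_counts[bucket].update(self.tokenize(text))

    def classify(self, text):
        vocab = {t for counts in self.token_counts.values() for t in counts}
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for bucket, counts in self.token_counts.items():
            bucket_total = sum(counts.values())
            # Log prior plus log likelihood of each token, with add-one smoothing.
            score = math.log(self.doc_counts[bucket] / total_docs)
            for token in self.tokenize(text):
                score += math.log((counts[token] + 1) / (bucket_total + len(vocab)))
            scores[bucket] = score
        return max(scores, key=scores.get)


clf = BucketClassifier()
# Invented training snippets, standing in for real reviews.
clf.train("dud", "a riveting premise wasted on a riveting mess")
clf.train("dud", "riveting in parts but mostly tedious")
clf.train("hit", "quiet patient and beautifully acted")
clf.train("hit", "a patient film with a beautifully simple story")

print(clf.classify("critics called it riveting"))
print(clf.classify("beautifully patient"))
```

With this toy training set the first text lands in the “dud” bucket, purely because “riveting” has only ever been seen there. The classifier has no idea the word is supposed to be a compliment.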

A Better Way To Measure The Spam Flood

Here’s an interesting take on spam which helps illustrate how big a problem it has become.

Florida-based email service ZeroSpam Net (0SpamNet) says (via email, afraid no URL available at time of writing) that the current method of measuring spam, as a percentage of total email traffic, has become meaningless.

Two years ago, seeing Spam grow from 60% to 70% in a month or two had some meaning. Over the last couple of months the impact of Spam growing from 85% to 90% has been lost by being reported as a percentage. That last 5% of growth as a percentage of total traffic represents a 50% growth in the total volume of Spam. Measurement of Spam volume as a percentage of total traffic is a poor indicator of the ever increasing size of the Spam problem.

Instead it proposes an index, which it calls the ZSN Spam Index, which accounts for spam and legitimate email growth against a constant reference value of 100 valid messages. This takes into account the increase in normal email traffic, roughly 12% per year. The index goes back to November 2002, with a value of 66.67, i.e. about 67 spam messages for every 100 valid emails. Now the index is at 782.12. That’s nearly 800 spam messages for every 100 valid ones. Gasp.
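The arithmetic behind the complaint is easy to check. The sketch below is my own back-of-envelope conversion between “spam as a percentage of all mail” and “spam per 100 valid messages”. It is not the ZSN formula, which also adjusts for growth in legitimate mail, and the function names are made up.

```python
def spam_per_100_valid(spam_percent):
    """Spam messages per 100 valid ones, given spam as a percentage of all mail."""
    if not 0 <= spam_percent < 100:
        raise ValueError("spam_percent must be in [0, 100)")
    return 100 * spam_percent / (100 - spam_percent)


def spam_share(per_100_valid):
    """Inverse: spam as a percentage of all mail, given spam per 100 valid."""
    return 100 * per_100_valid / (per_100_valid + 100)


before = spam_per_100_valid(85)
after = spam_per_100_valid(90)
# Growth in spam volume, assuming the amount of valid mail stays the same.
growth = (after / before - 1) * 100

print(f"85% spam -> {before:.0f} per 100 valid")
print(f"90% spam -> {after:.0f} per 100 valid")
print(f"volume growth: {growth:.0f}%")
print(f"index 66.67  -> {spam_share(66.67):.1f}% of traffic")
print(f"index 782.12 -> {spam_share(782.12):.1f}% of traffic")
```

Holding valid mail constant, a move from 85% to 90% takes you from roughly 567 to 900 spam messages per 100 valid ones, a jump of about 59% in volume, which is in the same ballpark as the 50% quoted above. Going the other way, an index of 66.67 is about 40% of traffic and 782.12 is about 88.7%.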

Here’s the chart (PDF).

Why do people never talk about CAN-SPAM anymore, I wonder?

Email Goes Unlimited

First there was Gmail, with its 1 gigabyte email storage service. Now, unveiled today, there’s AlienCamel, an Australian email service claiming to be the first to offer unlimited email storage.

First off, a declaration of interest: I’ve been using AlienCamel for a while, and have gotten to know the guy behind the service, Sydney Low. But I have to say it’s a pretty good offer for $16 a half year, along with very good spam filtering and virus-free emails, courtesy of Bayesian filtering, a neat system of advising you when there’s email that appears to not be spam but from someone who’s not on your whitelist, and two virus engines (Kaspersky and ClamAV) to keep your emails free of nasties.

I’d recommend a tryout. It’s not a perfect world when you have to pay extra for an email service on top of your ISP account, but unless your ISP offers good customer support, good spam filtering, decent online storage and virus-free email, services like this make a lot of sense.

McAfee Comes Late To Rev. Bayes’ Party

McAfee seems to have come somewhat late to the spam party: Network Associates, Inc., ‘the leader in intrusion prevention solutions’, today announced that it has incorporated “powerful new Bayesian filtering into the latest McAfee SpamAssassin engine”. What, only now?

Bayesian filtering is a pretty powerful weapon in the war against spam. I use POPFile and K9 and would recommend either, not least because they’re free. But why has it taken so long for McAfee to get around to including it in their SpamAssassin product?

To be fair, the McAfee Bayesian filter is “fully automated in its learning abilities, whereas other competitive solutions require manual training by users or systems administrators”. That is an improvement, but I wonder how well it works.

SpamKiller/Assassin also includes some other features, including Integrity Analysis, which applies algorithms to determine if the email is spam, Heuristic Detection, Content Filtering, Black and White Lists and DNS-Blocklist Support.

SpamBully Grows Up

A second version of SpamBully, a spam fighter based on Bayesian filtering, has been released.

SpamBully 2.0 integrates into Outlook and Outlook Express and introduces some new features:

  • Email blocked based on the language of the email or the country of origin;
  • A link analyzer looks for spam by following links in an email and analyzing the web pages;
  • Realtime Blackhole List integration continually checks for domains that are responsible for sending spam and automatically filters them from the Inbox;
  • Users can choose words and phrases they wish to allow or block from their Inbox;
  • Customizable languages, including English, Spanish, Italian, German, French, Russian, and Chinese.

These sound like good features. It’s a shame the product doesn’t work outside the Outlook world, but for those within it, it sounds like it’s worth a try. SpamBully 2.0 is free to try for 14 days. Single user licenses cost $30.

Spammers’ Shopfront Vigilantism, Part II

Further to my previous posting, here’s another way to keep the spammers out by checking out the links they want you to go to.

Sophos, the British virus people, say that their year-old URL filtering “continues to prove to be an enormous success”. The filtering basically collects known spam sites and bans any email which contains them somewhere in the message. Today, Sophos says, the URL filter identifies over 50% of the spam detected by Sophos PureMessage email software.
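The basic idea is simple enough to sketch in a few lines of Python. This is only my guess at the general shape, not how PureMessage does it: the blocklist entries, the URL pattern and the function names are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist; a real service would maintain and update this centrally.
BLOCKED_HOSTS = {"cheap-pills.example", "win-big.example"}

# Deliberately loose pattern: anything that starts like a web link.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def hosts_in(message):
    """Return the set of lowercased hostnames linked anywhere in the message."""
    hosts = set()
    for url in URL_PATTERN.findall(message):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed link, e.g. a broken IPv6 literal
        if host:
            hosts.add(host.lower().rstrip("."))
    return hosts


def is_blocked(host, blocked=BLOCKED_HOSTS):
    """True if the host, or any parent domain of it, is on the blocklist."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


def looks_like_spam(message, blocked=BLOCKED_HOSTS):
    """Flag the message if any link in it points at a blocked site."""
    return any(is_blocked(h, blocked) for h in hosts_in(message))


print(looks_like_spam("Deals at http://www.cheap-pills.example/buy?id=1 today"))
print(looks_like_spam("Minutes are at https://intranet.example/minutes"))
```

Checking parent domains as well as the exact host means a spammer can’t dodge the list just by putting “www” or some random subdomain in front of a known site.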

An innovation, Spammer Asset Tracking, goes further by looking at the source and destination locations of the email, sniffing for suspicious spammer activity. This speeds up adding spam sites to the blocked list.

Not a bad idea, and a feature that home-based spam filtering, such as Bayesian filters, couldn’t really manage to do. No mention is made of scam emails in the press release, but I assume they must be in there somewhere, given Sophos’ interest in such matters. (Stopping them, I mean, not sending them.)

Anti-Spam Mail Service Aliencamel Adds Humps

Melbourne-based Aliencamel, an anti-spam service I tried a few months back and thought was good but not perfect, have just announced some new features which may make the product more competitive in a tight marketplace. Aliencamel works as a mix of different anti-spam and anti-virus elements designed to keep out the riff-raff so you only download what you want.

The new version turns Aliencamel into a kind of email account in its own right, including the ability to preview email in a web browser before tagging it as spam or downloading via your normal email program, full webmail access to your mailbox, as well as disposable email addresses you can use to deal with suspect web sites and third parties you’re not sure about. On top of that, the service’s Pending Email Advisory (a sort of floating alert that lets you know of new email that is suspect without actually sending it to you) has been changed to reduce the frequency of advisory emails.

Most important, I think, is the fact that Aliencamel are going to embrace Bayesian filters — the simple method of assigning a probability of spamminess to emails by looking at the innards of the email (content, header, HTML code) and comparing it to other emails it has looked at. I adore Bayesian filters (I still use POPFile) so I think it’s great that Aliencamel are moving in that direction.
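For the curious, the “probability of spamminess” can be sketched like this: each token carries a probability learned from earlier mail, and the filter combines them into one score. This is the textbook naive Bayes combination with equal priors, not necessarily what Aliencamel or POPFile do, and the token probabilities below are invented.

```python
import math


def combine(token_probs):
    """Combine per-token spam probabilities into one score (naive Bayes, equal priors)."""
    # Clamp so no single token can force the result to exactly 0 or 1.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    # Work in log-odds to avoid underflow when there are many tokens.
    log_odds = sum(math.log(p) - math.log(1 - p) for p in clamped)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


# Hypothetical per-token probabilities, as if learned from earlier mail.
learned = {"viagra": 0.98, "unsubscribe": 0.80, "meeting": 0.10, "agenda": 0.05}


def spamminess(text, unknown=0.5):
    # Tokens never seen before get a neutral 0.5 and so do not move the score.
    return combine(learned.get(tok, unknown) for tok in text.lower().split())


print(round(spamminess("viagra unsubscribe now"), 4))
print(round(spamminess("meeting agenda attached"), 4))
```

A real filter would also feed in header fields and bits of HTML as tokens, which is the “innards” part, but the combining step works the same way.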

(Aliencamel, by the way, is an anagram of clean email. It took me months to get it.)