New in Gmail Labs: Smart Labels

Wednesday, March 09, 2011 | 10:00 AM

Posted by Stanley Chen, Software Engineer

People get a lot of email these days. On top of personal messages, there are group mailing lists, social network notifications, credit card statements, newsletters you might have signed up for, and promotional email from a shopping site you used once months ago. Gmail’s filters and labels were invented to help manage the deluge, but while I have about 100 filters that triage and label my incoming mail, most of my friends and family have all their messages in a giant unfiltered inbox.

Last year, we launched Priority Inbox to automatically sort incoming email and help you focus on the messages that matter most. Today, we’re launching a complementary feature in Gmail Labs called Smart Labels, which helps you classify and organize your email. Once you turn it on from the Labs tab in Settings, Smart Labels automatically categorizes incoming Bulk, Notification and Forum messages, and labels them as such. “Bulk” mail includes any kind of mass mailing (such as newsletters and promotional email) and gets filtered out of your inbox by default (where you can easily read it later), “Notifications” are messages sent to you directly (like account statements and receipts), and email from group mailing lists gets labeled as “Forums.”

If you already use filters and labels to organize your mail, you may find that you can replace your existing filters with Smart Labels. If you’re picky like me and still want to hold on to your current organization system, Smart Labels play nice with other labels and filters too. On the Filters tab under Settings, you’ll find that these filters can be edited just like any others. From there, you can also edit your existing filters to avoid having them Smart Labeled or change whether mail in a Smart Label skips your inbox (which you can also do by just clicking on the label, then selecting or unselecting the checkbox in the top right corner).

Labs in Gmail are a great testing ground for experimental features, and we hope Smart Labels help you more effortlessly get through your inbox. If you notice a message that was automatically labeled incorrectly and want to help us troubleshoot, you can report miscategorizations from the drop down menu on each message (in doing so, you’ll donate the full message to our engineers so that we can improve the feature). Give it a try and send us feedback on how we can make it work better for you!

This could be interesting. One day they’ll use Bayesian filters and we won’t even have to set up filters of our own. One day.

A New Image for Your Email Address

John Graham-Cumming, author of Bayesian spam filter POPFile, points me to a neat tool he’s created which will turn an email address into an image that may spare you some spam from bots scouring web pages for email addresses:

This site converts a text-based email address (such as me@example.com) and creates an image that can be inserted on a web site. The image contains the email address and is easily read by a human, but is intended to fool web crawlers that search for email addresses.

I can’t guarantee that this is foolproof, but Project Honeypot reports that image obfuscation of an email address is very effective (they say 100%) against web crawlers.

Enter your email address in the box and the server returns a string of gobbledygook which contains the email address (padded with a large amount of random data to avoid a dictionary attack) encrypted using a key known only to the server. When the image is loaded into the web page the server decrypts the email address and creates the image. (The email address is not stored by the server; it resides only in the HTML on your website.)
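For the curious, here's roughly how that round trip could work. This is a toy sketch, not John's actual code: the key, the padding length, the function names and the home-made keystream cipher are all my assumptions, and a real service would use a vetted encryption library with integrity checking.

```python
import base64
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # hypothetical secret known only to the server
PAD_LEN = 32                          # assumed amount of random padding, in bytes


def _keystream(nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the server key and a nonce."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(SERVER_KEY, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def make_token(address: str) -> str:
    """Turn an address into an opaque, URL-safe token to embed in the page."""
    nonce = secrets.token_bytes(16)
    # Random padding goes in front of the address, so the same address never
    # produces the same token twice and guessing from a word list is harder.
    plaintext = secrets.token_bytes(PAD_LEN) + address.encode("utf-8")
    stream = _keystream(nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return base64.urlsafe_b64encode(nonce + ciphertext).decode("ascii")


def read_token(token: str) -> str:
    """Server side: recover the address just before drawing the image."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    nonce, ciphertext = raw[:16], raw[16:]
    stream = _keystream(nonce, len(ciphertext))
    plaintext = bytes(c ^ s for c, s in zip(ciphertext, stream))
    return plaintext[PAD_LEN:].decode("utf-8")
```

The point of the design is that the server keeps nothing: the token sitting in your HTML is the only copy of the address, and only the holder of the key can read it back.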

 Here’s what mine looks like:


Made using jeaig

If you need to put a contact address on your webpage or blog, but hate the amount of spam you’re getting, it’s worth a try.

Put My Book in Your Toilet

John Graham-Cumming, the father of the excellent Bayesian spam killer POPFile, has written a review of my column collection, Loose Wire. It’s a fun read (the review, not the book, although the book is. Really.) He even adds a word to my lexicon:

‘wagstaff (v): to poke any new technology with a long stick, make sure it does what it says on the box, and summarize the experience in less than 2,000 words’.

John concludes that the book “should be in the toilet. In fact, I think it’s such a good book for reading in small doses in a small, quiet room, that a global band of Gideons-like technology evangelists should be leaving copies in the smallest room in the house of any technophile.” Excellent idea. I’ll get onto my publisher about that.

The Email Hole

Email is not something to get too upset about, until you lose one to downtime by your provider of choice. And then you realise that it is too important to be left to free services, or even a domain hoster.

I use a hoster called Hostway, and they went spectacularly down last week. (This despite the fact, or perhaps because of it, that Hostway launched a new service recently offering 150 GB of space for $10 a month.) It was only about a day, but several domains I based there lost email access when their storage failed. Now I have no idea who might have been trying to reach me and couldn’t because of bounced emails, what newsletters I’ve been removed from because of bounced emails, or what email newsletters I may have missed.

Now this kind of thing happens, but it made me realise that losing one email is the same as losing all of them if you don’t know which email it is, since it may be the important one you’ve been waiting for offering you money/marriage/a new nose. Email is different to hosting a website: a website can go down, and you’ll lose some traffic, but it will come back up again. Email is a stream of discrete bits of information, and there’s no way of telling whether there are any missing.

In short, a good hoster needs to guarantee that, should something go wrong, no email is left behind. Hostway have so far not been able to assure me of that. They say that emails lost during the outage have been recovered, but as far as I can work out that does not refer to those lost because of the outage: in other words, those emails that were stored on their servers and not retrieved by users before the outage hit. (Emails to their technical staff about this were answered with pasted notifications from their support team, which didn’t address this issue.)

This surprises me, but shouldn’t. They are listed by Netcraft as the second most reliable hoster last month and I’ve not had many problems with them. But they are a domain hoster, which means that bullet-proof email is not top of their priorities. As Syd Low of AlienCamel puts it (declaration of interest: I’ve been using Syd’s email service the past few years, and it’s rock solid), there are three types of email service: bundling services (like Hostway), free services (like Gmail) and paid services (like AlienCamel) which provide Web access, lots of redundant backups to make sure no email goes missing, plus anti-spam, anti-virus and anti-phishing features.

My lesson from all this: email is too important to entrust to people who don’t take it seriously, or who aren’t getting money for your business. Of course, no one wants to pay for something they’re getting for free, or more cheaply, but sometimes free and cheap is not enough.

Keep a Blog, Get Fired

Here’s an interesting statistic, in the light of Scoble’s departure from Microsoft (no direct connection, I promise, but it does raise issues about whether corporates really like blogging): 7.1% of companies have fired an employee for violating blog or message board policies.

According to email security company Proofpoint, whose survey you can download from here, decision-makers at large U.S. companies show growing concern over sensitive information leaving the enterprise through electronic channels such as email, blog pages and message boards: “In fact, 55.4% of these large companies (with 20,000 or more employees) have expressed their uneasiness that regulations guarding the firm’s privacy will be violated by members of the “e-communication” community.  In an effort to reduce risk of exposure, 44% of larger companies employ staff to monitor outbound email, and nearly 1 in 5 companies (17.3%) has disciplined an employee for disobeying blog or message board policies.”

Proofpoint’s survey suggests they may be right: “more than a third (34.7%) of companies report their business was affected by the disclosure of sensitive material in the past year. Furthermore, more than 1 in 3 investigated a suspected email leak of confidential or proprietary information and 36.4% investigated a suspected violation of privacy or data protection regulations in the past year.” While a lot of this is email, “companies fear that financial data, healthcare information, or other private materials may be posted in blogs, sent through instant messaging, or transmitted by other means.”

Some other titbits:

  • Nearly 1 in 3 companies (31.6%) has terminated an employee for violating email policies in the past 12 months. More than half (52.4%) of companies have disciplined an employee for violating email policies in the past year.
  • More than 1 in 5 (21.1%) companies were hit by improper exposure or theft of customer information (whatever that means), while 15% were impacted by improper exposure or theft of intellectual property. (I think this means customer information or other sensitive data were stolen.)
  • Companies estimate that more than 1 in 5 outgoing emails (22.8%) contains content that poses a legal, financial or regulatory risk. The most common form of non-compliant content is messages that contain confidential or proprietary business information.
  • Here’s a funky one: 38% of companies with 1,000 or more employees hire staff to read or analyze outbound email. 44% of larger companies (those with more than 20,000 employees) employ staff for this purpose. I bet you didn’t know your company was hiring people to read your outgoing email.
  • Nearly 1 in 5 companies (17.3%) has disciplined an employee for violating blog or message board policies in the last year. 7.1% of companies fired an employee for such infractions. Ouch. 10% of public companies investigated the exposure of material financial information via a blog or message board posting in the past year.

Of course, Proofpoint have a point to prove (thank you) here, but probably this information is sound. There’s definitely a sense out there that blogging is something that needs to be controlled, for better or for worse. Of course, the bigger point is that information is no longer something that can be kept within organisations. Once it became digital, and once employees could move that digital data out of the company easily (remember when company email was not Internet-based, and there was no gateway out of the company email system? I do) then the walls were already tumbling down. The question now for companies is: do we try to ring-fence as much as we can, or do we put more trust and faith in the hands of employees so they don’t feel the urge to vent outside the company gates?

From the Ashes of Blue Frog

The Blue Frog may be no more, but the vigilantes live on. Seems that despite the death of Blue Security in the face of a spammer’s wrath, the service has built an appetite for fighting back. Eric B. Parizo of SearchSecurity.com reports on a new independent group called Okopipi who intend “to pick up where Blue Security left off by creating an open source, peer-to-peer software program that automatically sends “unsubscribe” messages to spammers and/or reports them to the proper authorities.”

Okopipi has already merged with a similar effort known as Black Frog and has recruited about 160 independent programmers, who are dissecting the open source code from Blue Security’s Blue Frog product. The idea seems to be the same: by automatically sending opt-out requests to Web sites referenced in received spam messages, the software over-burdens the spammer’s servers (or those of the product he’s advertising), as both a deterrent and an incentive to register with Okopipi. By registering he can cleanse his spam list of Okopipi members.

Some tweaks seem to be under consideration: Processing will take place on users’ machines and then on a set of servers which will be hidden to try to prevent the kind of denial-of-service attack that brought down Blue Frog.

Possible problems: I noticed that some of the half million (quite a feat, when you think about it) Blue Frog users were quite, shall we say, passionate about the endeavour. These are the kind of folk now switching to Okopipi. This, then, could become an all-out war in which a lot of innocent bystanders get burned. The Internet is a holistic thing; if Denial of Service attacks proliferate, it may affect the speed and accessibility of a lot of other parts of it, as the Blue Frog experience revealed. (TypePad was inaccessible for several hours.)

Another worry: Richi Jennings, an analyst with San Francisco-based Ferris Research, points out in Eric’s piece that project organizers must ensure that spammers don’t infiltrate the effort and plant backdoor programs within the software. “If I’m going to download the Black Frog application,” Jennings said, “I want to be sure that the spammers aren’t inserting code into it to use my machine as a zombie.” I guess this would happen if spammers signed up for the service and then fiddled with the P2P distributed Black Frog program.

Another problem, pointed out by Martin McKeay, a security professional based in Santa Rosa, Calif., is that spammers will quickly figure out that the weak link in all this is its reliance on a legitimate unsubscribe link in the email, and will simply include a false link instead. Actually I thought the link Blue Frog used wasn’t the unsubscribe link (which is usually fake, since a working one would pull the spammer back within the law) but the purchase link. How, otherwise, would folks be able to buy their Viagra?

One element I’d like to understand better is the other weakness in the Blue Frog system: That however the process is encrypted, spammers can easily see who are members of the antispam group by comparing their email lists before and after running it through the Blue Frog/Black Frog list. Any member who is on the spammer’s list will now be vulnerable to the kind of mass email attack that Blue Frog’s destroyer launched. How is Okopipi going to solve that one?
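To see why the encryption doesn't help, here's a minimal sketch of the before-and-after comparison. The addresses are made up, and I'm assuming the registry is distributed as SHA-256 hashes; that stands in for whatever encoding Blue Security actually used, which I don't know.

```python
import hashlib


def digest(address: str) -> str:
    """Hash an address so the registry never ships in plain text (assumed scheme)."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def scrub(mailing_list, hashed_registry):
    """What the removal tool does: drop every address whose hash is registered."""
    return [a for a in mailing_list if digest(a) not in hashed_registry]


def infer_members(before, after):
    """What the spammer does afterwards: whatever vanished must be a member."""
    return sorted(set(before) - set(after))


# Hypothetical data; the registry holds only hashes.
registry = {digest("alice@example.com"), digest("carol@example.org")}
spam_list = ["alice@example.com", "bob@example.net", "carol@example.org"]

cleaned = scrub(spam_list, registry)
exposed = infer_members(spam_list, cleaned)
print(cleaned)   # ['bob@example.net']
print(exposed)   # ['alice@example.com', 'carol@example.org']
```

The spammer never reads the registry itself. He only needs his own list and the cleaned copy, which is why stronger encryption of the registry doesn't close the hole.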

Spammers Get Authenticated

Until now, most spammers sent their stuff through open relays — Internet-connected computers that were either unprotected, or else had been compromised by viruses or trojans into sending the spam without the owner being aware. But that is changing, says AppRiver, and it has big implications for how spammers work and may render useless today’s big thing: email authentication.

Up until now, AppRiver says, ISPs could presume that if they forced a system to authenticate before sending a message, that message could be trusted, because spammers couldn’t have access to the authentication mechanism. Authenticating basically means you must use a password to send an email as well as to receive it. Before, so long as you knew the correct server for your ISP, you didn’t need a password.

What the bad guys are doing now, AppRiver says, is hacking into the ISPs, figuring out those passwords, and then sending their email through those compromised accounts. This is not only a security risk, it increases the chance for the spammer that those emails will now get through, since they come from what are called “trusted systems”: email servers that require authentication. A survey in April by the Email Sender and Provider Coalition found that 16 of the 18 top U.S. ISPs were applying authentication to outgoing e-mails, and eight of those ISPs were also checking for inbound authenticated e-mail and applying some sort of filter to the mail as a result, according to ClickZ News.

AppRiver’s Chief Science Officer, Peter McNeil, predicts that as this tactic becomes widespread, sender reputation services touted by the big boys (Microsoft’s Sender ID, for example) would effectively wither on the vine. In the meantime, it’s going to mean that for those spammers who have perfected this new art, their junk is more likely to get through than other junk because it appears to be authenticated. (More on all this at SearchSecurity.com, which wrote a piece on it while I was still trying to figure it out.)

The Blue Frog vs PharmaMaster

I’ve been trying to make some sense of this recent drama involving Blue Security, an anti-spam registry that effectively tries to deter uncooperative spammers by overwhelming their servers, and recent outages at TypePad and LiveJournal apparently caused by a revenge attack by spammers on Blue Security. (Here’s some more information on Blue Security and the Blue Frog.) The outages were caused when Blue Security redirected the spammers’ attacks on its website to the company’s blogs which were hosted on TypePad and LiveJournal.

So what really happened?

  • Blue Security’s web site has been under attack for most of this past week, via a distributed denial-of-service (DDoS) attack which basically tries to overwhelm a site with traffic sent from as many computers as possible (the site is now back up);
  • To try to deflect the attack, which effectively suspended its service, Blue Security changed its Internet address to its TypePad blog;
  • This overwhelmed SixApart’s servers, temporarily affecting all its blogging services, including TypePad and LiveJournal;
  • Meanwhile, spammers presumably linked to the DDoS attack sent threatening emails to, apparently, anyone on the list of the Blue Security do-not-intrude registry. Blue Security works by building a network of users who report spam. The source of the spam is then contacted and then asked to remove all email addresses of its members from their spam lists. If they fail to do so, software installed on users’ computers fills out forms on websites linked to in any subsequent spam, creating a wave of traffic to the spammer’s web site, that, in theory, brings the spammer’s activities to a stop.
  • The spammer, or another spammer, then contacted Blue Security via ICQ instant message, to taunt and threaten the company, apparently in a bid to stop its activities.
  • The spammer, or another spammer, has also been sending emails containing Blue Security contact and registration information. This might have been done in the hope of getting recipients to complain to those email addresses and phone numbers to further overwhelm the company’s resources.

This account is not uncontested. According to a Blue Security press release:

  • Blue Security claims that it was not the victim of a DDoS attack, but that the spammer, identified as PharmaMaster, persuaded a staff member of a top-tier Internet Service Provider to block Blue Security’s IP address at the backbone. This would have blocked all traffic from outside Israel, where the Blue Security web site is located.
  • Blue Security then closed its web site and posted a note on its blog (hosted elsewhere).
  • Shortly afterwards, Blue Security says, PharmaMaster launched a DDoS attack on any site associated with Blue Security, causing outages at five top hosting providers, a major DNS provider and a popular blog site.
  • Blue Security has denied reports, including one by the Associated Press, saying that its do-not-intrude lists have been compromised. Blue Security works by allowing compliant spammers to run their email lists through a program which compares them with a special encrypted list of Blue Security members. While the spammer is not able to see or access the Blue Security list, Blue Security members’ email addresses will be removed from the spammer’s list. This is done, in part, so individual Blue Security members are not then known to a spammer, and so the spammer cannot gain access to the Blue Security registry for spamming purposes. The AP report suggests the spammer has figured out a way to work out which email addresses belong to Blue Security members by merely comparing its own list before and after running it through the Blue Security removal process. Those email addresses no longer on the spammer’s list must be Blue Security members, the report says.

This account is contested by some security analysts, who point out what they say are some inconsistencies in Blue Security’s account:

  • Elsewhere Blue Security’s Eran Reshef acknowledges that Blue Security didn’t just post a note on its blog, but it redirected traffic from its bluesecurity.com URL to the TypePad blog. He is quoted as saying he didn’t anticipate that the spammer would launch a DDoS attack on such a large player. “I didn’t think he was so crazy as to attack them,” said Reshef. This raises the question: Was this done before or after the DDoS began? Reshef says it was.
  • If Blue Security’s routing was changed internally, as Blue Security suggests, there should be a record. One analyst says he can find no record of anything “fishy.”

Blue Security clearly has its supporters. An article on one website has received, at the time of writing, more than 200 comments. The Blue Security blog’s single post received more than 100 before comments were closed.

Perhaps one of the most interesting aspects to all this is how clearly at least one spammer perceives Blue Security as a threat to its business. Not only is it trying to scare the company and members of its registry into abandoning their approach, but it is also adopting more open tactics: contacting the target directly via ICQ, perhaps in an effort to intimidate or negotiate, and emailing and posting comments to the above websites to try to scare members into removing their names from the registry and uninstalling the software that returns spam to the sender’s servers.

You don’t need to agree with Blue Security’s tactics to acknowledge they must be making some kind of impact for this to happen. What is perhaps a little bit scary is that Blue Security don’t seem to have been ready for this attack, and reveal some naivety and lack of understanding about how the Internet works by merely redirecting the assault to other servers. Not only would this not solve their problem, it also exposes them to legal action by the companies behind the redirected servers if it emerges that they were not informed beforehand. Still a lot of questions to be answered on this one.

How to Make More Use of the Vicar

In last week’s WSJ column (subscription only, I’m afraid) I wrote about how Bayesian Filters — derived from the theories of an 18th century vicar called Thomas Bayes and used to filter out spam — could also be used to sift through other kinds of data. Here’s a preliminary list of some of the uses I came across:

  • Deconstructing Sundance: how a bunch of guys at UnSpam Technologies successfully predicted the winners (or at least who would be among the winners) at this year’s festival using POPFile, the Bayesian filter of choice;
  • ShopZilla, a “leading shopping search engine”, uses POPFile “in collaboration with Kana to filter customer emails into different buckets so we can apply the appropriate quality of service and have the right people to answer to the emails. Fortunately, some of the buckets can receive satisfactory canned responses. The bottom line is that PopFile provides us with a way to send better customer responses while saving time and money.”
  • Indeed, even non-spam email can benefit from Bayes, filtering boring from non-boring email, say, or personal from work. Jon Udell experimented with this kind of thing a few years ago.
  • So can virus and malware detection. Here’s a post on the work by Martin Overton in keeping out the bad stuff simply using a Bayesian Filter. Here’s Martin’s actual paper (PDF only). (Martin has commented that he actually has two blogs addressing his work in this field, here and here.)
  • John Graham-Cumming, author of POPFile, says he’s been approached by people who would like to use it in regulatory fields, in computational biology, dating websites (“training a filter for learning your preferences for your ideal wife,” as he puts it), and says he’s been considering feeding in articles from WSJ and The Economist in an attempt to find a way to predict weekly stock market prices. “If we do find it out,” he says, “we won’t tell you for a few years.” So he’s probably already doing it.

If you’re new to Bayes, I hope this doesn’t put you off. All you have to do is show it what to do and then leave it alone.  If you haven’t tried POPFile and you’re having spam issues, give it a try. It’s free, easy to install and will probably be the smartest bit of software on your computer.

I suppose the way I see it is that Bayesian filters don’t care about how words look, what language they’re in, or what they mean, or even if they are words. They look at how the words behave. So while the Unspam guys found out that the word “riveting” was much more likely to be used by a reviewer to describe a dud movie than a good one, the Bayesian filter isn’t going to care that this seems somewhat contradictory. In real life we would have been fooled, because we know “riveting” is a good thing (unless it’s some weird wedgie-style torture involving jeans that I haven’t come across). Bayes doesn’t know that. It just knows that the word has an unhealthy habit of cropping up in reviews of movies that bomb.
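If you want to see that behaviour in miniature, here's a bare-bones naive Bayes classifier. It's a sketch of the general technique only, not POPFile's code: the class name, the bucket names and the review snippets are all invented for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """A minimal naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}         # bucket -> Counter of tokens seen in it
        self.doc_counts = Counter()   # bucket -> number of training texts

    def train(self, bucket, text):
        self.word_counts.setdefault(bucket, Counter()).update(text.lower().split())
        self.doc_counts[bucket] += 1

    def classify(self, text):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # log P(bucket) plus the sum of log P(token | bucket)
            score = math.log(self.doc_counts[bucket] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for token in text.lower().split():
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


# Made-up review snippets, for illustration only.
f = TinyBayes()
f.train("flop", "a riveting tale of love and loss")
f.train("flop", "riveting but overlong")
f.train("winner", "quiet funny and sharply observed")
f.train("winner", "a funny honest debut")

print(f.classify("riveting drama"))  # flop
```

Nothing in there knows what “riveting” means. The filter only counts which bucket the token has turned up in before, which is exactly why it isn't fooled the way we are.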

In a word, Bayesian filters watch what words do, or what the email is using the words to do, rather than looking at the meaning of the words. We should be applying this to the speeches of politicians, CEOs and PR types to see what comes out. Is there any way of measuring how successful a politician is going to be based on their early speeches? What about press releases? Any way of predicting the success of the products they tout?

Where Did That Email Come From?

An interesting new tool from the guys behind the controversial DidTheyReadIt?: LocationMail. (For some posts on DidTheyReadIt, check out here, here, here and here.)

LocationMail tells you where e-mail was sent from. It uses the most accurate data in the world to analyze your e-mail, trace it, and look up where the sender was when the message was sent. Find out where your friend was when she e-mailed you, or where a business contact is really writing from.

LocationMail integrates seamlessly into Outlook or Outlook Express; once installed, it shows you location information next to each message. LocationMail shows the City, State, Country, Company, ISP, and Connection Speed of the sender.

It installs painlessly into Outlook but crashed my Outlook Express. In Outlook a popup window appears with details of where the email was sent from, including the company, location, connection type, domain and IP address. LocationMail does this by using what it thinks is the IP address of the sender and running it through data from DigitalEnvoy and IP registrars. (A fuller explanation is here.) The makers hope to target a range of customers:

With phishing and other forms of Internet fraud becoming more and more problematic, LocationMail protects you from e-mail based frauds. The program can tell you if an email you seemingly received from your local bank was actually sent from a location half way around the globe. By instantly tracing the source of your emails, LocationMail helps keeps you safe from identify thieves. LocationMail lets you identify and eliminate fraudulent transactions from eBay and other Internet-based auction houses.

LocationMail protects companies who accept orders by email. Credit cards are regularly stolen from people in affluent countries, and used for placing online orders by criminals from other countries. By telling you an email’s origination location, the program helps you detect fraudulent inconsistencies.

Whether you’re a business person who wants to keep track of the demographics of prospects and customers, a manager who wants to ensure that incoming email addresses are legitimate and consistent, or a home computer user who is curious about where friends are e-mailing from, LocationMail has the tools that you need.

It costs $30. Another program that does something quite similar is eMailTrackerPro, which will also identify the network provider of the sender, including contact information for abuse reporting, and uncover the ‘misdirection’ tactic commonly used by spammers. Of course, LocationMail may not help that much, since legitimate emails might not, in Internet terms, originate from the place where they should. But it does a pretty good job and is useful if, say, you’re not sure whether an email is spam or not (it does happen): the fact it originated in Seoul should provide a clue (unless you know lots of people in Seoul, of course).

And most importantly, this isn’t an invasive technology.
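For the technically minded, the first step of this kind of trace (finding what is probably the sender's IP address) can be sketched in a few lines. This is my guess at the general approach, not LocationMail's code: the function name and the sample message are invented, and the geographic lookup that follows is left out because it depends on a commercial database. Bear in mind that Received headers can be forged, which is one reason the result is only a clue.

```python
import re
from email import message_from_string

# Matches an IPv4 address in square brackets, as relays usually record it.
IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def probable_origin_ip(raw_message: str):
    """Return the IP in the earliest Received header that has one, else None."""
    msg = message_from_string(raw_message)
    # Each relay prepends its own Received header, so the last one in the
    # message is the first hop, closest to the sender.
    for header in reversed(msg.get_all("Received", [])):
        match = IPV4.search(header)
        if match:
            return match.group(1)
    return None


# Hypothetical message using documentation-only addresses.
sample = (
    "Received: from mx.example.net (mx.example.net [198.51.100.7])\n"
    "\tby inbox.example.org; Mon, 1 May 2006 10:00:02 +0000\n"
    "Received: from laptop (unknown [203.0.113.45])\n"
    "\tby mx.example.net; Mon, 1 May 2006 10:00:00 +0000\n"
    "From: friend@example.net\n"
    "Subject: hello\n"
    "\n"
    "Hi there\n"
)

print(probable_origin_ip(sample))  # 203.0.113.45
```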