Whatever useful stuff the good guys come up with, the bad guys ain’t far behind. A few months back I wrote about researchers at Carnegie Mellon coming up with a way to use CAPTCHA tools to help decipher words in texts being scanned by the Internet Archive. The basic idea is that the human effort spent on these tests, which exist to stop spammers and others from automating their intrusion into websites (signing up for stuff, comment spam etc), should not be wasted.
Now a sleazeball has found a way to do the same thing: get folk to decipher CAPTCHA texts through a small program, delivered as a Trojan, that offers a striptease in exchange for guessing the texts correctly (Trend Micro, via Seth Godin):
A nifty little program which Trend Micro detects as TROJ_CAPTCHAR.A disguises itself as a strip-tease game, wherein a scantily-clad “Melissa” agrees to take off a little bit of her clothing. However, for her to strut her stuff, users must identify the letters hidden within a CAPTCHA. Input the letters correctly, press “go” and “Melissa” reveals more of herself.
However, the “answers” are then sent to a remote server, where a malicious user eagerly awaits them. The “strip-tease” game is actually a ploy by ingenious malware authors to identify and match ambiguous CAPTCHA images from legitimate sites, using the unsuspecting user as the decoder of the said image.
As Trend Micro points out, the CAPTCHAs in this case come from Yahoo!'s website, suggesting that a spammer is building up a stock of Yahoo! accounts.
CAPTCHA Wish Your Girlfriend Was Hot Like Me? – TrendLabs | Malware Blog – by Trend Micro
An excellent example of taking a tool that already exists, the CAPTCHA form, and making it useful. AP writes from Pittsburgh:
Researchers estimate that about 60 million of those nonsensical jumbles are solved every day around the world, taking an average of about 10 seconds each to decipher and type in.
Instead of wasting time typing in random letters and numbers, Carnegie Mellon researchers have come up with a way for people to type in snippets of books to put their time to good use, confirm they are not machines and help speed up the process of getting searchable texts online.
“Humanity is wasting 150,000 hours every day on these,” said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon. He helped develop the CAPTCHAs about seven years ago. “Is there any way in which we can use this human time for something good for humanity, do 10 seconds of useful work for humanity?”
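A quick sanity check on those figures: 60 million CAPTCHAs a day at about 10 seconds each works out to roughly 167,000 hours, the same ballpark as von Ahn's 150,000. Both inputs are the article's own rough estimates, so the agreement is only approximate.

```python
# Back-of-envelope check of the AP figures; both constants are the article's estimates.
SOLVES_PER_DAY = 60_000_000   # "about 60 million ... solved every day"
SECONDS_PER_SOLVE = 10        # "an average of about 10 seconds each"


def hours_per_day(solves: int, seconds_each: float) -> float:
    """Total human time spent solving CAPTCHAs per day, in hours."""
    return solves * seconds_each / 3600


if __name__ == "__main__":
    print(f"{hours_per_day(SOLVES_PER_DAY, SECONDS_PER_SOLVE):,.0f} hours per day")
```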
The project, reCAPTCHA, puts that deciphering to work on books being digitized by the Internet Archive, specifically on words that ordinary OCR can’t convert because the scanned text comes out too garbled.
Those words are served up as CAPTCHAs, and the answers are fed back into the scanning engine. Here’s the neat bit, though, as explained on the website:
But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
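To make the quoted scheme concrete, here is a minimal Python sketch. It is not reCAPTCHA's actual code: the class and method names (`PairedWordVerifier`, `issue`, `submit`, `transcription`), the case-insensitive comparison, the shuffled display order and the three-vote agreement threshold are all assumptions made for illustration. The logic follows the description above: a reading of the unknown word only counts if the user got the known word right, and a reading is only accepted once several users agree on it.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the paired-word scheme quoted above. Names, the
# normalisation rule and the agreement threshold are illustrative assumptions.


def _normalize(text: str) -> str:
    """Compare answers case-insensitively, ignoring surrounding spaces (assumed rule)."""
    return text.strip().lower()


@dataclass
class Challenge:
    control_id: str  # word whose answer is already known
    unknown_id: str  # word that OCR could not read
    order: tuple     # display order, shuffled so users can't tell which is which


class PairedWordVerifier:
    def __init__(self, known: dict, unknown_ids: list, votes_needed: int = 3, seed: int = 0):
        self.known = known                  # control word id -> correct text
        self.unknown_ids = list(unknown_ids)
        self.votes = defaultdict(Counter)   # unknown word id -> Counter of readings
        self.votes_needed = votes_needed    # assumed confidence threshold
        self.rng = random.Random(seed)

    def issue(self) -> Challenge:
        """Pair one known word with one unreadable word, in random order."""
        control = self.rng.choice(list(self.known))
        unknown = self.rng.choice(self.unknown_ids)
        order = [control, unknown]
        self.rng.shuffle(order)
        return Challenge(control, unknown, tuple(order))

    def submit(self, ch: Challenge, answers: dict) -> bool:
        """answers maps word id -> typed text. Returns True if the user passes."""
        if _normalize(answers.get(ch.control_id, "")) != _normalize(self.known[ch.control_id]):
            return False  # failed the known word: reject and record nothing
        # Passed the known word, so count their reading of the unknown word as one vote.
        self.votes[ch.unknown_id][_normalize(answers.get(ch.unknown_id, ""))] += 1
        return True

    def transcription(self, unknown_id: str) -> Optional[str]:
        """Return the most common reading once enough users agree, else None."""
        counts = self.votes[unknown_id]
        if not counts:
            return None
        text, n = counts.most_common(1)[0]
        return text if n >= self.votes_needed else None


if __name__ == "__main__":
    v = PairedWordVerifier({"c1": "overlooks"}, ["u1"])
    for _ in range(3):
        ch = v.issue()
        v.submit(ch, {"c1": "overlooks", "u1": "morning"})
    print(v.transcription("u1"))  # morning
```

Running it records three matching readings of the unknown word and prints `morning`. Rejecting the whole submission when the known word is wrong, rather than just discarding that one answer, is what keeps readings from users who failed the check out of the vote count.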
Which I think is kind of neat. The only problem might come if people knew this and messed with the system by getting one word right and the other wrong. But how would they know which one is which?
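That question has a fairly reassuring answer if, as assumed here, the two words are shown in random order. The hypothetical simulation below models a saboteur who deliberately mistypes exactly one of the two words, picked at random. About half the time they hit the known word and simply fail the CAPTCHA; the other half they plant a single wrong reading, which the quoted scheme then checks against other people's answers for the same image.

```python
import random

# Hypothetical model of the "mess with the system" attack. Assumes the known
# (control) word and the unknown word are displayed in random order, so the
# saboteur can only guess which one to mistype.


def saboteur_outcomes(trials: int, seed: int = 42) -> dict:
    rng = random.Random(seed)
    rejected = 0  # mistyped the known word: fails the CAPTCHA, nothing recorded
    planted = 0   # mistyped the unknown word: passes, one bogus vote recorded
    for _ in range(trials):
        control_pos = rng.randrange(2)    # where the system placed the known word
        sabotaged_pos = rng.randrange(2)  # which word the saboteur chooses to get wrong
        if sabotaged_pos == control_pos:
            rejected += 1
        else:
            planted += 1
    return {"rejected": rejected / trials, "planted": planted / trials}


if __name__ == "__main__":
    print(saboteur_outcomes(10_000))
```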