CAPTCHA Gets Useful


An excellent example of something that leverages a tool that already exists and makes it useful — CAPTCHA forms. AP writes from Pittsburgh:

Researchers estimate that about 60 million of those nonsensical jumbles are solved everyday around the world, taking an average of about 10 seconds each to decipher and type in.

Instead of wasting time typing in random letters and numbers, Carnegie Mellon researchers have come up with a way for people to type in snippets of books to put their time to good use, confirm they are not machines and help speed up the process of getting searchable texts online.

”Humanity is wasting 150,000 hours every day on these,” said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon. He helped develop the CAPTCHAs about seven years ago. ”Is there any way in which we can use this human time for something good for humanity, do 10 seconds of useful work for humanity?”

The project, reCAPTCHA, is using people’s deciphering to go through those books being digitized by the Internet Archive that can’t be converted using ordinary OCR, where the results come out like this:


Those words are sent to CAPTCHAs and then the results fed back into the scanning engine. Here’s the neat bit, though, as explained on the website:

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

Which I think is kind of neat: the only problems might occur if people know this and mess the system by getting one right and the other wrong. But how do they know which one?


All opinions are my own, and not necessarily those of Thomson Reuters.



RSS loose wire blog