Tag Archives: Computer science

CAPTCHA Gets Useful

[Image: Captcha1]

An excellent example of taking a tool that already exists and putting it to genuinely useful work: CAPTCHA forms. AP writes from Pittsburgh:

Researchers estimate that about 60 million of those nonsensical jumbles are solved every day around the world, taking an average of about 10 seconds each to decipher and type in.

Instead of wasting time typing in random letters and numbers, Carnegie Mellon researchers have come up with a way for people to type in snippets of books to put their time to good use, confirm they are not machines and help speed up the process of getting searchable texts online.

“Humanity is wasting 150,000 hours every day on these,” said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon. He helped develop the CAPTCHAs about seven years ago. “Is there any way in which we can use this human time for something good for humanity, do 10 seconds of useful work for humanity?”

The project, reCAPTCHA, uses people’s deciphering to work through the books being digitized by the Internet Archive that can’t be converted using ordinary OCR, where the results come out like this:

[Image: Captcha2]

Those words are served up as CAPTCHAs, and the results are then fed back into the scanning engine. Here’s the neat bit, though, as explained on the website:

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

Which I think is kind of neat: the only problem might come if people know this and mess with the system by getting one right and the other wrong. But how would they know which is which?
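If you like seeing these things as code, here’s a back-of-the-envelope sketch of the pairing-and-voting logic the reCAPTCHA folk describe. To be clear, the function names, the vote threshold and the data shapes are all my own inventions for illustration, not the actual reCAPTCHA code:

```javascript
// Back-of-the-envelope sketch of the pairing-and-voting idea described above.
// All names and the vote threshold are invented; this is not reCAPTCHA's code.

// Each challenge pairs a control word (answer known) with a suspect word
// (one the OCR engine couldn't read).
function gradeChallenge(controlWord, suspectVotes, typedControl, typedSuspect) {
  // If the user gets the known word wrong, trust neither answer.
  if (typedControl.toLowerCase() !== controlWord.toLowerCase()) {
    return false; // treat as a failed CAPTCHA
  }
  // The known word was right, so count the guess for the unknown word as a vote.
  var guess = typedSuspect.toLowerCase();
  suspectVotes[guess] = (suspectVotes[guess] || 0) + 1;
  return true; // user passes the CAPTCHA
}

// Accept a transcription only once enough independent users agree on it.
function acceptedTranscription(suspectVotes, threshold) {
  for (var guess in suspectVotes) {
    if (suspectVotes[guess] >= threshold) {
      return guess;
    }
  }
  return null; // not enough agreement yet; keep showing the word to more users
}
```

Which also answers my own question: because the unknown word keeps being shown to more people and a transcription is only accepted once enough answers agree, a mischievous guess simply gets outvoted, and getting the control word wrong means your answer isn’t counted at all.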

Directory of Screencasting Resources

Updated Nov 13 2006: added a piece on screencasting in Linux which looks helpful, albeit complicated.

This week’s WSJ.com column, out Friday, is about screencasting (you can find all my columns here; subscription only, I’m afraid):

Screencasts are really simple to grasp. And in some ways they’re not new. But I, and a few thousand other people, think they represent a great way to leverage the computer to train, educate, entertain, preach and otherwise engage other people in a very simple way. Something the Internet and computers have so far largely failed to do.

Screencasts are basically little movies you create on your computer. In most cases, they are movies of what’s happening on your computer screen. You use special software to capture the keystrokes and mouse clicks you make on your screen – demonstrating how to use Google, say (the screen bit of screencasting) – and then, once you’ve edited it and added a voiceover, upload it to your Web site and let everyone else watch it (the casting bit). It’s as simple as that.

Here are some links that may help. Not everyone calls what they do screencasting, but most do. There’s tons more stuff out there, but most of these sites will take you there too.

Introductions

Software

Screencasts

Uses of screencasts

The TiddlyWiki Report, Part I: Jonny LeRoy

This week’s WSJ.com/AWSJ column is about the TiddlyWiki (here, when it appears Friday), which I reckon is a wonderful tool and a quiet but major leap forward for interfaces, outliners and general coolness. I had a chance to chat with some of the folk most closely involved in TiddlyWikis, but sadly couldn’t use much of their material directly, so here is some of the stuff that didn’t fit.

First off, an edited chat with Jonny LeRoy, a British tech consultant who offered his view on TiddlyWikis over IM:

Loose Wire: ok, thanks… i’m doing a little piece on tiddlywikis, and was intrigued to hear how you got into them, how you use them, where you think they might be of use, how they might develop etc…
Jonny LeRoy: sure. I first came across them when a colleague sent round a link. The thing that hooked me was the “install software” page which just said – “you’ve already got it”. I’ve been doing web stuff (mainly Java server side development) for quite a while and seeing the immediacy of the tiddlywiki was great. I’ve tried all sorts of tools for managing thoughts and tasks and generally end up going back to pen and paper after a while. tiddlywiki is fast and easy enough for me to keep using it. The micro-content idea is pretty interesting but I’m also pretty interested in how they slot into general progressions in the “Web 2.0”. more and more functionality can now be pushed client side – especially with Ajax and related async javascript technologies. TiddlyWiki takes this to the extreme by pushing *everything* client-side …
That does raise the problem of sharing and syncing the data, but it’s not really in essence a collaborative tool. though there’s no reason why that can’t be added on top of what’s there. Does that make some sense?
Loose Wire: it does. very well put…
Jonny LeRoy: cheers 😉
Loose Wire: 🙂 i particularly like the tagging idea, which you seem to have introduced…

Jonny LeRoy: Yup – for me when I started using tiddlywiki the main thing missing was any kind of classification. I’ve had a fair amount of experience with pretty complicated taxonomies and ontologies – particularly for managing / aggregating / syndicating content on a travel start-up I was involved in. but the simplicity of sites like delicious and flickr started to make me realise that some simple keyword tags gets you nearly everything you need. and also removes half of the issues related to category hierarchies and maintenance. particularly when your dataset isn’t massive. even when the dataset and tag list grows there are ways of “discovering” structure rather than imposing it … see flickr’s new tag clusters for a good example of this. In the good open source fashion I had a quick hack at the TW code and put some basic tagging functionality in place. A few other people were creating tag implementations at the same time, but they were more based around using tiddlers as tags ….. I was fairly keen just to keep the tags as metadata. I’m still yet to see a good online wiki that has tagging built in. for me that’s been an issue with most wikis I’ve used
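(A quick aside from me: to give a flavour of what “tags as metadata” looks like, here’s a minimal sketch in JavaScript. It isn’t Jonny’s TiddlyWiki patch, and the tiddlers and tags are made up; the point is simply that each tiddler carries a plain list of keywords, and finding everything with a given tag is a simple filter.)

```javascript
// A minimal illustration of tags kept as plain metadata on each tiddler.
// This is not the actual TiddlyWiki code; the tiddlers are made up.
var tiddlers = [
  { title: "Shopping list", text: "milk, eggs",       tags: ["todo", "home"] },
  { title: "Column ideas",  text: "tiddlywiki piece", tags: ["todo", "work"] },
  { title: "Travel notes",  text: "flight times",     tags: ["home"] }
];

// Return every tiddler carrying a given tag.
function withTag(tag) {
  var matches = [];
  for (var i = 0; i < tiddlers.length; i++) {
    if (tiddlers[i].tags.indexOf(tag) !== -1) {
      matches.push(tiddlers[i]);
    }
  }
  return matches;
}

// withTag("todo") -> the shopping list and the column ideas
```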

Loose Wire: i get the impression that tagging is still considered a social thing, rather than tagging for oneself, as a way to commit to hierarchies, a la outliners etc?
Jonny LeRoy: that’s one of the beauties of it – though not so much in TW. the free-association you get by browsing other people’s tags is amazing. comparing what you can find through something like delicious compared to open directory projects – dmoz etc is quite interesting
Loose Wire: it is great, but i feel there’s huge potential in using tags for oneself, too?
Jonny LeRoy: yup – when you’re using them for yourself you can set your own little rules that get round some of the hierarchy problems. overloaded tags – with more than one meaning can get confusing in a social context, but personally it’s much easier to manage how you refer to things. also the ability to add tags together – so you can search on multiple tags creates an ad hoc structure.
Loose Wire: yes. i’d love to see TWs let you choose a selection of tags and then display the matches… oops, think we’re talking the same thing there…
Jonny LeRoy: yeah – I’d been meaning to put that in place, but haven’t had a moment 🙂
Loose Wire: is that going to happen? all the various TWs are now under one roof, is that right?
Jonny LeRoy: Yeah – Jeremy Ruston – who started it all off seems to be managing things reasonably well. and pulling together different versions. there was a bit of a branch with the GTDWiki which got a lot of publicity.
Loose Wire: is that a good way to go, do you think?
Jonny LeRoy: it’s a weird one, because it’s not like a traditional open source project with code checked into CVS. so versioning can be quite hard. but it’s also one of the beauties of it – anyone with a browser and a text editor can have a go.
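(Another aside: the “choose a selection of tags and display the matches” idea we talk about a few lines up boils down to a simple intersection filter over the same sort of tiddler list as in the earlier sketch. Again, this is just an illustration of mine, not TiddlyWiki’s code:)

```javascript
// Rough sketch of multi-tag selection: keep only the tiddlers that carry
// every one of the selected tags. Not real TiddlyWiki code.
function withAllTags(tiddlers, selectedTags) {
  return tiddlers.filter(function (tiddler) {
    return selectedTags.every(function (tag) {
      return tiddler.tags.indexOf(tag) !== -1;
    });
  });
}

// Using the tiddlers list from the earlier sketch:
// withAllTags(tiddlers, ["todo", "work"]) -> only the tiddlers tagged with both
```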

Loose Wire: i noticed the file sizes get quite big quite quickly?
Jonny LeRoy: a lot of that is the javascript – if you’re just using it locally then you can extract that out into another file. that makes saving and reloading a bit quicker. the file will grow though with the amount of data you put in.
Loose Wire: is that tricky to do?
Jonny LeRoy: no – you just need to cut all the javascript – put it into a new file and put in an HTML tag referencing it
Loose Wire: how much stuff could one store without it getting unwieldy?
Jonny LeRoy: That really depends on your PC / browser combo – how quickly it can parse stuff.  if you were going to want to store really large amounts of data then you might want to look at ways of having “modules” that load separately.
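(For anyone wanting to try the trick Jonny describes above, the change amounts to something like this. The filename is invented, and the exact markup in your own TiddlyWiki file may differ:)

```html
<!-- Inside the TiddlyWiki HTML file: cut the big inline <script> block,
     save its contents as a separate file (the name here is made up), and
     reference it instead. Your notes stay in the HTML file; only the code moves. -->
<script type="text/javascript" src="tiddlywiki-core.js"></script>
```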

Loose Wire: is it relatively easy to turn a TW into a website/page?
Jonny LeRoy: yeah – couldn’t be simpler – upload the file to a webserver … and er … that’s it. it does rely on people having javascript enabled – but 99% do. one issue is that since all the internal links are javascript search engines like google won’t follow them. but google will read the whole text of the page if it indexes you

Loose Wire: where do you think this TW thing could go? do you see a future for it? or is it going to be overtaken by something else?
Jonny LeRoy: Definitely – the company I’m working at right now (ThoughtWorks) have used it for a major UK company. they used it for a simple handbook for new people
Loose Wire: oh really? excellent!
Jonny LeRoy: really simple to use and quick to navigate – it got pretty good feedback. I see more people being likely to use it personally on their own pcs though. I use it to keep track of things I’ve got to do or have done. the dated history bit is really useful to work out what was going on a couple of weeks ago.
Loose Wire: the timeline thing?
Jonny LeRoy: yup
Jonny LeRoy: I can also see new TW like products coming out for managing tasks better – an equivalent of tadalist on the client side. beyond that it’s a good thought experiment in how datadriven sites can work. the server can push the data in some structured format to the browser and then the browser uses TW like technology to work out how to render it.
Loose Wire: yes. … [however] i feel a lot of people like to keep their stuff on their own pc (or other device, USB drive, whatever). not all of us are always online….
Jonny LeRoy: exactly – the wiki-on-a-stick idea is great. you can stick firefox and your wiki on the usb key and off you go
Loose Wire: yes, very cool…
Jonny LeRoy: The next step is then to have the option to do some background syncing to a server when you end up online
Loose Wire: do you think more complex formatting, layout and other tasks could be done? and could these things be synced with portable devices?
Jonny LeRoy: the portable devices question is interesting – it really depends on how much javascript they’ve got on their browsers. there’s no reason why it’s not possible, but there are more vagaries of how the functionality is handled
Loose Wire: javascript is the key to all this, i guess….
Jonny LeRoy: it’s a bit like the web in the mid 90s where you didn’t have a clue what people’s browsers would support. it’s actually having a bit of a comeback. many people just see it as a little glue language to stick things together or move things around ….. but it’s actually really powerful – I discovered more of its dynamic possibilities while playing with TW. the best thing about it for me is that anyone who’s got a modern browser can run javascript – there’s no extra install.
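(One more aside: to put some flesh on Jonny’s point a few exchanges back about data-driven sites, here’s roughly the sort of thing he means, sketched with the browser plumbing of the day. The URL, the data format and the element id are all invented; this is not TiddlyWiki code.)

```javascript
// Sketch of "the server pushes structured data, the browser works out how to
// render it". The endpoint, data shape and element id are invented.
function loadAndRender() {
  var request = new XMLHttpRequest();
  request.open("GET", "/tiddlers.json", true); // hypothetical endpoint
  request.onreadystatechange = function () {
    if (request.readyState === 4 && request.status === 200) {
      // Newer browsers have JSON.parse built in; at the time you'd have used a small JSON library.
      var tiddlers = JSON.parse(request.responseText);
      render(tiddlers);
    }
  };
  request.send(null);
}

function render(tiddlers) {
  var container = document.getElementById("content"); // assumes a <div id="content"> on the page
  for (var i = 0; i < tiddlers.length; i++) {
    var heading = document.createElement("h2");
    heading.appendChild(document.createTextNode(tiddlers[i].title));
    var body = document.createElement("p");
    body.appendChild(document.createTextNode(tiddlers[i].text));
    container.appendChild(heading);
    container.appendChild(body);
  }
}
```

The background syncing Jonny mentions would presumably be the same plumbing in reverse: post the changed tiddlers back to the server whenever you find yourself online.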

Loose Wire: yes, making the browser an editor is a wonderful thing… what sort of things do you think we might see with it?
Jonny LeRoy: I’m not sure what new thing we’ll see, but we’ll definitely see the things we use the browser for already getting much better and smoother. the user interaction is starting to become more like working on a locally installed application.

Thanks, Jonny.

Phishing Gets Proactive

Scaring the bejesus out of a lot of security folk this weekend is a new kind of phishing attack that doesn’t require the victim to do anything but visit the usual websites he might visit anyway.

It works like this: The bad guy uses a weakness in web servers running Internet Information Services 5.0 (IIS) and in Internet Explorer, both components of Microsoft Windows, to make the compromised server append some JavaScript code to the bottom of its webpages. When the victim visits those pages, the JavaScript loads onto his computer one or more trojans, known variously as Scob.A, Berbew.F and Padodor. These trojans open up the victim’s computer to the bad guy, and Padodor is also a keylogging trojan, capturing the passwords the victim types when accessing websites like eBay and PayPal. Here’s an analysis from LURHQ of the malicious script placed on victims’ computers. Think of it as a kind of outsourced phishing attack.

Some things are not yet clear. One is how widespread this infection is. According to U.S.-based iDEFENSE late Friday, “hundreds of thousands of computers have likely been infected in the past 24 hours.” Others say it’s not that widespread. CNET reported late Friday that the Russian server delivering the trojans was shut down, but that may be only a temporary respite.

What’s also unclear is exactly what vulnerability is being used, and therefore whether Microsoft has already developed a patch — or software cure — for it. More discussion on that here. Microsoft is calling the security issue Download.Ject, and writes about it here.

Although there’s no hard evidence, several security firms, including Kaspersky, iDEFENSE and F-Secure, are pointing the finger at a Russian-speaking hacking group called the HangUP Team.

According to Kaspersky Labs, we may be looking at what is called a Zero Day Vulnerability. In other words, a hole “which no-one knows about, and which there is no patch for”. Usually it has been the good guys — known in the trade as the white hats — who discover vulnerabilities in software and try to patch them before they can be exploited, whereas this attack may reflect a shift in the balance of power, as the bad guys (the black hats) find the vulnerabilities first, and make use of them while the rest of us try to find out how they do it. “We have been predicting such an incident for several years: it confirms the destructive direction taken by the computer underground, and the trend in using a combination of methods to attack. Unfortunately, such blended threats and attacks are designed to evade the protection currently available,” commented Eugene Kaspersky, head of Anti-Virus Research at Kaspersky Labs.

In short, what’s scary about this is:

  • We still don’t know exactly how servers are getting infected. Everyone’s still working on it;
  • Suddenly, surfing itself becomes dangerous. It’s no longer necessary to try to lure victims to dodgy websites; you just infect the places they would visit anyway;
  • Users who have done everything right can still get infected: even a fully patched version of Internet Explorer 6 won’t save you from infection, according to Netcraft, a British Internet security company.

For now, all that is recommended is that you disable JavaScript. This is not really an option, says Daniel McNamara of anti-phishing website CodePhish, since a lot of sites rely on JavaScript to function. A better way, according to iDEFENSE, would be to use a non-Microsoft browser. Oh, and if you want to check whether you’re infected, according to Microsoft, search for the following files on your hard disk: kk32.dll and surf.dat. If either is there, you’re infected and should run one of the clean-up tools listed on the Microsoft page.
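If you’d rather not hunt for those files by hand, the check can be scripted with Windows Script Host, which runs JScript (Microsoft’s flavour of JavaScript) out of the box. This is only a rough sketch of my own, not a Microsoft tool: it looks in the usual Windows system folders, whereas Microsoft’s advice is to search the whole hard disk, so treat a clean result with caution.

```javascript
// check-downloadject.js -- run with:  cscript check-downloadject.js
// A rough sketch only: it looks in the usual Windows system folders for the
// two files Microsoft names (kk32.dll and surf.dat). Microsoft's advice is to
// search the whole hard disk, so a clean result here is not a guarantee.
var fso = new ActiveXObject("Scripting.FileSystemObject");
var shell = new ActiveXObject("WScript.Shell");
var winDir = shell.ExpandEnvironmentStrings("%WINDIR%");

var suspects = ["kk32.dll", "surf.dat"];
var folders = [winDir, winDir + "\\system32"];

var found = false;
for (var i = 0; i < folders.length; i++) {
  for (var j = 0; j < suspects.length; j++) {
    var path = folders[i] + "\\" + suspects[j];
    if (fso.FileExists(path)) {
      WScript.Echo("Found: " + path + " -- you may be infected; see Microsoft's clean-up tools.");
      found = true;
    }
  }
}
if (!found) {
  WScript.Echo("Neither file found in the system folders (remember to search the rest of the disk).");
}
```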

Windows’ Gaping, Seven Month Hole

Quite a big hooha over this latest Microsoft vulnerability, and I readily ‘fess up to the fact that I didn’t really take this seriously. Seems like I wasn’t the only one.

But folk like Shawna McAlearney of SearchSecurity.com point out that the delay of 200 days between Microsoft being notified and their coming out with a patch is appallingly long. “If Microsoft really considered this a serious or critical vulnerability for nearly all Windows users, it should have been a ‘drop-everything-and-fix’ thing resolved in a short period of time,” Shawna quotes Richard Forno, a security consultant, as saying. “Nearly 200 days to research and resolve a ‘critical’ vulnerability on such a far-reaching problem is nothing short of gross negligence by Microsoft, and is a direct affront to its much-hyped Trustworthy Computing projects and public statements about how security is playing much more important role in its products.” Strong stuff.

So what is all the fuss about? The vulnerability in question can, in theory, permit an unauthenticated, remote attacker to execute arbitrary code with system privileges: That means a ne’er-do-well could do anything they want on your computer. And while it hasn’t happened yet, to our knowledge, it’s only a question of time, according to Scott Blake, vice president of information security at Houston-based BindView Corp.: “We believe attacks will be conducted remotely over the Internet, via e-mail and by browsing Web pages. We expect to see rapid exploitation — it’s simply a case of when it materializes.”

Paul Thurrott, of WinNetMag, weighs in with his view, pointing out that the flaw is a very simple one: “attackers can compromise the flaw with a simple buffer-overrun attack, a common type of attack that Microsoft has wrestled with since its Trustworthy Computing code review 2 years ago.”