Tag Archives: speech recognition

The Shape of Things to Come

This is from my weekly newspaper column, hence the lack of links.

By Jeremy Wagstaff

We’re all touch typists now.

Of course, the definition of touch typing has had to change a little, since most of us don’t actually learn touch typing as we’re supposed to. Watch people tapping away at a keyboard and you’ll see all sorts of cobbled-together methods that would make the office secretary of yesteryear blanch.

But keyboards are going to be with us for a while yet as the main way to get our thoughts into a computer, so some sort of touch typing is necessary.

But the mobile phone is different. After ten years, most of us have gotten used to entering text using the predictive, or T9, method, where the phone figures out you’re trying to say “hello” rather than “gekko” when you tap the keys 4, 3, 5, 5, 6.
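To make the “hello”/“gekko” clash concrete, here is a minimal Python sketch of predictive-text lookup. It is not Tegic’s actual T9 engine: it simply assumes a dictionary of words with usage counts (the counts below are invented), groups the words by the digits that spell them, and offers the most frequently used word for a given key sequence first.

```python
from collections import defaultdict

# Standard phone keypad: each digit from 2 to 9 carries three or four letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def key_sequence(word: str) -> str:
    """Digits tapped to spell `word`, one press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(word_counts: dict[str, int]) -> dict[str, list[str]]:
    """Group words by key sequence, most frequently used word first."""
    index = defaultdict(list)
    for word in word_counts:
        index[key_sequence(word)].append(word)
    for candidates in index.values():
        candidates.sort(key=lambda w: word_counts[w], reverse=True)
    return dict(index)


# Invented usage counts standing in for a real frequency list.
COUNTS = {"hello": 950, "gekko": 3, "good": 700, "home": 400, "gone": 300}
INDEX = build_index(COUNTS)

print(key_sequence("hello"))  # 43556
print(INDEX["43556"])         # ['hello', 'gekko']
print(INDEX["4663"])          # ['good', 'home', 'gone']
```

The phone never has to know what you meant; it only needs to know which of the words sharing a key sequence people usually mean, which is why the frequency list matters as much as the dictionary.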

Texting has gotten faster. In January, Portugal’s Pedro Matias, 27, set a new world record by typing a 264-character text in less than 2 minutes, shaving 23 seconds off the previous record. But that’s still slower than your average touch typist, who manages 120 words, say 480 characters, in the same amount of time.
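The gap is easier to see per minute. This small calculation uses only the column’s own figures. Because the record text took less than two minutes, treating it as exactly 120 seconds gives a lower bound on the texter’s speed, and the 480-character figure rests on the column’s rough estimate of four characters per word.

```python
def chars_per_minute(chars: int, seconds: float) -> float:
    """Average number of characters entered per minute."""
    return chars / (seconds / 60)


# Record text: 264 characters in under two minutes. Using the full 120
# seconds therefore gives a lower bound on the texter's real speed.
texter = chars_per_minute(264, 120)

# The column's touch typist: 120 words at roughly 4 characters per word
# (the column's own estimate, giving 480 characters) in the same two minutes.
typist = chars_per_minute(120 * 4, 120)

print(texter, typist)             # 132.0 240.0
print(round(typist / texter, 2))  # 1.82
```

On these numbers the typist is at most about 1.8 times faster. Typing tests conventionally count a “word” as five keystrokes, which would put the typist at 600 characters and widen the gap.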

BlackBerry users have their QWERTY keyboards, each key the size of a pixie’s fingernail, and while some people seem to be quite happy with them, I’m not.

And the iPhone has given us, or given back to us, the idea of little virtual keyboards on our screen. I’ll be honest: I’m not a big fan of these either.

The arrival of the Android phone hasn’t really helped matters: The keyboard is usually virtual (some of the earlier phones had physical keyboards, but most have dropped them in favor of onscreen ones) and I really didn’t enjoy typing on them.

So much so that my wife complained she could tell when I was using the Android phone rather than my trusty old Nokia, because she didn’t feel I was “so reachable.” By which she meant my monosyllabic answers weren’t as reassuring as my long, rambling, predictive-text Nokia ones.

But that has changed with the arrival of ShapeWriter. It provides the same kind of virtual keyboard, but lets you swipe your words by dragging your finger over the keys to, well, form a shape.

Typing “hello,” for example, is done by starting your finger on “h”, dragging it northwest to “e”, then across to “l” in the far east, lingering there a second, then north a notch to “o.” At no point does your finger leave the keyboard. Instead it leaves a red, slug-like trail, and, in theory, when you lift your finger off the keys that trail is converted to the word “Hello.”

And, surprise, surprise, it actually works. Well, unless you’re demonstrating it to a skeptical spouse, in which case instead of “hello” it types “gremio” or “hemp.”
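The ideal gesture for “hello” can be traced out in a few lines of Python. This is only an illustration: the key coordinates are hypothetical (one unit per key, rows staggered as on a typical QWERTY layout, y growing downward as on a screen), and a doubled letter such as the “ll” becomes a single stop, which is the lingering described above.

```python
# Hypothetical key centres: one unit per key, rows staggered as on a typical
# QWERTY layout; x grows to the right and y grows downward (screen style).
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.5), ("zxcvbnm", 1.5)]
KEY_CENTER = {
    ch: (col + offset, float(row))
    for row, (letters, offset) in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def word_path(word: str) -> list[tuple[float, float]]:
    """Key centres the finger passes through; a doubled letter is one stop."""
    points: list[tuple[float, float]] = []
    for ch in word.lower():
        centre = KEY_CENTER[ch]
        if not points or points[-1] != centre:
            points.append(centre)
    return points


def moves(word: str) -> list[tuple[float, float]]:
    """Each stroke as (dx, dy) in key widths; negative dy means 'up'."""
    pts = word_path(word)
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]


print(word_path("hello"))  # [(5.5, 1.0), (2.0, 0.0), (8.5, 1.0), (8.0, 0.0)]
print(moves("hello"))      # [(-3.5, -1.0), (6.5, 1.0), (-0.5, -1.0)]
```

Read as strokes: up and to the left to “e”, a long sweep right to “l”, then a short hop up to “o”.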

Now this isn’t the first time I’ve used ShapeWriter. It has been around a while—it was first developed by IBM Labs in the early 2000s. It’s gone through quite a few changes in the meantime, not least in the theory behind it.

But the core idea is the same as that behind predictive text (and speech recognition): what is called the redundancy of language. Take, for example, the whole body of emails written by Enron employees: the most prolific sender wrote nearly 9,000 emails in two years, totalling about 400,000 words.

That’s a lot of words. But the number of distinct words was under 3% of that: that sender used only 10,858 unique words.

Now of course, Enron employees might not be representative of the wider population, but researchers have to work with data, and the Enron case threw up lots of data. The Enron Email Dataset is a 400-megabyte file of about 500,000 emails from about 150 users, mostly senior management of Enron, which makes it a goldmine for researchers of language, machine learning and the like.
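The Enron statistic is simply the ratio of distinct words to total words. The sketch below computes it for any text, along with how much of a text its k most frequent words account for, which is the kind of redundancy predictive systems lean on. The tokeniser (lowercase letters and apostrophes only) is a simplifying assumption; with the column’s rounded Enron figures the ratio is 10,858 / 400,000, or about 2.7%.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-cased words; a deliberately crude tokeniser for illustration."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct words as a share of all words used."""
    words = tokens(text)
    return len(set(words)) / len(words) if words else 0.0


def coverage(text: str, k: int) -> float:
    """Share of all words accounted for by the k most frequent ones."""
    words = tokens(text)
    if not words:
        return 0.0
    top = Counter(words).most_common(k)
    return sum(count for _, count in top) / len(words)


sample = "the cat sat on the mat and the dog sat on the rug"
print(len(tokens(sample)), len(set(tokens(sample))))  # 13 8
print(round(coverage(sample, 3), 3))                  # 0.615

# The column's Enron figures: 10,858 distinct words out of about 400,000.
print(f"{10_858 / 400_000:.1%}")  # 2.7%
```

Even in this toy sentence, three words make up more than 60% of the text; across hundreds of thousands of words the effect is far stronger.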

Learning from the words used (though presumably not from their morals), researchers are able to figure out which words we use and which we don’t. Thus ShapeWriter, T9 and speech recognition can tune out all the white noise, because they only have to worry about the small subset of words a user is likely to be typing, or saying. Most possible words we simply don’t use, either because our vocabularies aren’t that great or because we haven’t invented those words yet.

ShapeWriter has 50,000 words in its lexicon, but it gives preference to those 10,000 or so words it considers most common (presumably

In ShapeWriter’s case, the developers produce a template of the shape of each word they store in the software, so that the shape you’re drawing (left, far right, up, down, along) can be matched against those templates and recognised.
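Here is a deliberately simplified sketch of template matching in that spirit; it is not ShapeWriter’s actual recogniser. It assumes hypothetical key coordinates, resamples both the drawn trace and each word’s ideal key-to-key path to the same number of evenly spaced points, scores each word by the average distance between corresponding points, and adds a small penalty for rare words as a stand-in for the preference given to the most common words. The lexicon counts, the sample count and the `freq_weight` value are all invented for illustration.

```python
import math

# Hypothetical key centres (one unit per key, rows staggered, y grows downward).
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.5), ("zxcvbnm", 1.5)]
KEY_CENTER = {
    ch: (col + offset, float(row))
    for row, (letters, offset) in enumerate(ROWS)
    for col, ch in enumerate(letters)
}
Point = tuple[float, float]
SAMPLES = 32  # points per shape after resampling (an arbitrary choice)


def resample(points: list[Point], n: int = SAMPLES) -> list[Point]:
    """Place n points at equal spacing along the polyline through `points`."""
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(seg)
    if total == 0:  # a single tap, or no movement at all
        return [points[0]] * n
    out = []
    for i in range(n):
        target = total * i / (n - 1)  # distance along the path
        j = 0
        while j < len(seg) - 1 and target > seg[j]:
            target -= seg[j]
            j += 1
        t = target / seg[j] if seg[j] else 0.0
        (ax, ay), (bx, by) = points[j], points[j + 1]
        out.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return out


def template(word: str) -> list[Point]:
    """Ideal key-to-key path for a word, resampled (doubled letters collapse)."""
    stops: list[Point] = []
    for ch in word.lower():
        if not stops or stops[-1] != KEY_CENTER[ch]:
            stops.append(KEY_CENTER[ch])
    return resample(stops)


def shape_distance(a: list[Point], b: list[Point]) -> float:
    """Mean distance between corresponding points of two resampled shapes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def recognise(trace: list[Point], lexicon: dict[str, int], freq_weight: float = 0.1) -> str:
    """Best word for a drawn trace: shape mismatch plus a rare-word penalty."""
    drawn = resample(trace)
    total = sum(lexicon.values())

    def score(word: str) -> float:
        rarity = -math.log(lexicon[word] / total)  # larger for rarer words
        return shape_distance(drawn, template(word)) + freq_weight * rarity

    return min(lexicon, key=score)


# Invented counts; a real system would use a large frequency list.
LEXICON = {"hello": 950, "help": 300, "hemp": 20, "gremio": 1}
# A slightly wobbly trace through roughly h, e, l, o.
trace = [(5.4, 1.1), (2.1, 0.0), (8.4, 0.9), (8.0, 0.1)]
print(recognise(trace, LEXICON))  # hello
```

The frequency penalty is what lets a sloppy trace still land on a common word; set `freq_weight` too high and the recogniser starts ignoring the shape altogether.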

In its latest incarnation it actually works surprisingly well, and I’d recommend that anyone with an Android phone check it out. (It’s free.) There’s a version for the iPhone too, as well as for Windows Mobile and the Windows Tablet PC. The only downside: for now, at least, just five languages, all of them European, are supported.

I am not convinced this kind of thing is going to replace the real keyboard, but it’s the first decent application I’ve come across that has gotten me back into actually enjoying tapping out messages on my device.

My wife, for one, is happy.

Recharge Vouchers and Fantasies of Schoolmistresses

I’m doing a piece on speech recognition for the Journal, all the while wrestling with my own voice-menu cellphone demons. One message from Hong Kong’s 3 network drives me apoplectic with rage while at the same time kind of turning me on, which tells you more about me than you probably want to know.

I use a prepaid card, which I recharge with vouchers (a blessing, at least, that they don’t call them top-up cards, an expression I find horrible for some reason, as if you weren’t really paying ridiculous amounts of money for SMS and voice calls).

Anyway, for some reason my Treo doesn’t like entering the 16-digit number on each recharge voucher, so I get quite a few error messages, delivered by one, possibly two, female voices. The first part of the message, explaining that I have not entered the correct number, is schoolmistress-stern (you can almost hear the cane being flexed in the background), while the next, asking me to enter the number again, addresses me as if I were a complete imbecile. Which, after hearing this message a few times, I kind of feel I am.

Here it is, in all its MP3 glory. I’m going to make it my ringtone.

Voice Commands, Singapore Style

Here’s more on voice recognition replacing touch-tone menus. Is it a good thing?

ScanSoft have teamed up today with Unified Communications (‘the leading provider of proprietary telecommunication solutions in Asia’) to launch OneVoice, a ‘voice portal application’ for Singapore Telecommunications Limited (SingTel). OneVoice is a speech-activated service that uses ScanSoft’s SpeechWorks speech recognition and text-to-speech software to allow SingTel subscribers to ‘dial their personal contacts or public establishments, access useful information and carry out their personal information management’.

What does this mean exactly? By dialing *988 or *6988, SingTel customers can access stuff using simple speech commands. Speaking a name already stored in their personal address book connects them to that person. They can also ‘request sports and lottery results, download ringtones, picture messages and logos, utilize location-based services to find the nearest amenities and recommended food outlets’.

The basic idea seems to be to replace navigating a touch-tone menu of options or scrolling through an address book on a cell phone. Not a bad idea, and you’re not replacing real people here but actually adding another layer of usability. (Of course Nokia and several other makes of handphone already have the speech option, where you just speak a name and the phone dials it, but that requires setting up, and I’ve seen more people embarrassed when it dials by mistake than I’ve seen getting serious use out of it.)

The main downside I can think of is having all your data stored on a central server. But then again, the cellphone company is going to know all that stuff anyway, so who cares? The only other one is the annoying problem of your voice not being recognised.

Which brings me to my only question, a cultural one: Is ScanSoft’s voice recognition software geared towards Singaporean-style English, or a more generic one? Or both? Watch this space.

‘Say “Five” After The Tone If You Want To Curse One Of Our Customer Service Computers’

The good news: We don’t have to use those silly touch-tone menus anymore when we call our friendly utility. Now we can speak to a real computer.

A report by Chartwell, an industry research service, says that more and more utilities “are implementing or investigating speech recognition for their interactive voice response units, and advocates say the technology has the potential to revolutionize automated customer service”. What this ‘revolution’ means, it turns out, is that customers can use voice recognition to report outages, or even conduct “customer self-service” (I love that idea! Why didn’t I think of that?) such as billing, payments and updating account information. As someone who has just tried to resolve some thorny billing problems relating to my mother’s poor choice of electricity and gas utility in the UK, I can only say: Yeehar!

Here’s Dennis Smith, Research Director & Manager of Chartwell’s CIS & Customer Service Research Series: “Speech recognition is a progressive customer self-service tool that can be extremely valuable to a utility, provided it is designed correctly.” Incorrect designs include, among other things, unfriendly self-help service menus. Oh. So we don’t get to chat with a computer, we get to say ‘two’ instead of pressing two on our touch-tone phone. That’s progress.

This is another bit I like: The report includes case studies on what it calls ‘progressive utilities’ (as opposed to what? ‘Regressive utilities’? ‘Incorrect thinking utilities’?) utilizing speech recognition technology. One, We Energies, “after concluding that many of its incoming calls related to billing or payments, implemented a speech system in order to offer customers a more personal and prompt way to conduct business without the assistance of a customer service representative (CSR)”. I am particularly happy the customer service representative has an abbreviation: Given it’s the only one in the report, I can only assume that assigning an abbreviation is what happens prior to downsizing. And how, exactly, can you have a ‘personal’ way to conduct business using a computer? More personal than what? Pressing the keypads on your phone until they sink into the plastic moulding?

Look, I’m a big fan of computers, and I’m probably still reeling from trying to find an email address I could write to to complain (there wasn’t one; even the website wouldn’t recognise my Mozilla browser and suggested I upgrade. You’re a utility, for God’s sake! You’re selling electricity! It’s not as if you’re selling Porsches, or smartphones! What if I were some elderly person wanting to check my electricity bill? Jeez) but I don’t get it. I always hoped that computerisation would free up staff so they could talk to customers, find out what’s bugging them, try to make things better. I guess that’s never going to happen now. We’re going to be sitting there in the dark, the electricity long gone out, the gas fire cold, saying ‘four… six… six… I said SIX’, our voices echoing down the hallways, for eternity. Please, give me a CSR. I really need a CSR.

You can buy the full report, Speech-Enabled Customer Service Applications in the Utility Industry, for $350 here.