
Speech recognition will soon be everyday reality

Any science fiction movie worth its salt features computers or robots capable of understanding human speech. This will not stay fiction for long*.

When experts perfect natural language processing, it could free us from depending on keyboards and screens – or at least relegate them to a less central role.

Voice recognition isn’t just about computers. Imagine phones, TV sets and microwave ovens that understand spoken instructions.

Natural language processing

Natural language processing won’t all arrive at once. We are probably 10, maybe 15 years away from computers knowing the difference between war and Waugh. Computers capable of meaningful conversation, like Star Wars’ C-3PO, won’t be available for 20 years or more.

The early fruits of language processing have been on the market for years. ACT, a now defunct British computer maker, sold a system responding to a limited set of voice commands in 1981.

Today, we have fourth, possibly even fifth, generation voice recognition products that can turn the spoken word into typewritten text – most of the time. You may see them referred to as speech recognition applications.

I can speak Google search enquiries into my Android phone. The software is not flawless, nor is it fast, but nine times out of ten, the software interprets my speech correctly.

The slow progress to date has had less to do with software than with available computer power. You need a hefty processor to run voice recognition. Make that hefty spare processing capacity, left over after the fancy graphics and other cycle-chewing work required by modern operating systems.

Speech recognition magic

When they work as advertised, voice recognition products are impressive.

Arthur C Clarke wrote: “Any sufficiently advanced technology is indistinguishable from magic”.

That’s the feeling you get the first time your speech appears as type on a PC screen.

Speech better for small business

Speech recognition offers more to small businesses than big companies.

There are two reasons for this. First, voice recognition requires effort as users train the software to understand individual voice and speech patterns.

Speech recognition training needs further refinement over the first few weeks of use. This means it works best for motivated people. Mischievous, or reluctant, users can make sure their systems never work effectively.

A second reason is that voice recognition products need support. PC support costs tend to rise with organization size. In a big company, voice recognition’s cost overhead can outweigh the productivity gains.

Dragon, part of Nuance, is the biggest name in voice recognition software. The company’s NaturallySpeaking software comes in a variety of packages, costing from around NZ$200 for a student edition to NZ$1500 for a corporate package.

In addition to a powerful computer – Nuance recommends a 2.4GHz Pentium Dual Core, and anything less will deliver disappointing results – voice recognition needs a good microphone. In theory it can work with a PC’s internal microphone. In practice it isn’t worth the bother. In the future computers will come with better microphones. Until then you need an external one.

Memory is less important

Memory isn’t important if you use an old operating system, but you’ll need well over 1GB of RAM to run voice recognition on Windows Vista. Any PC sound card will do so long as it handles recording.

PC voice recognition systems are available in packages including microphones, usually on headsets. You can also buy digital voice recorders bundled with speech recognition software – useful for taking voice notes when you are out and about.

You may already have voice recognition. Newer versions of Microsoft Windows and Office have baked-in speech recognition. Microsoft’s speech recognition is not as quick, as polished or as customisable as NaturallySpeaking, but Microsoft’s tools let you test the technology before parting with cash.

You’d need patience to write a book using existing voice recognition products. They won’t replace typing skills in the immediate future. Nevertheless they are more than adequate for composing emails and short memos.

What’s more, when voice recognition is built into a computer’s operating system, it can control functions such as opening and closing files or selecting commands from menus.

Voice recognition is widely used by people with disabilities – especially the blind and other people with seeing difficulties.

* Not likely fiction much longer?

There’s poetic licence here. The ACT voice recognition system I saw in London in 1981 could only ‘learn’ ten words. A salesman told me proper voice recognition was around “two years away”. Thirty years on, the programs are better, but they still need polish before being acceptable to mainstream users. Maybe two more years will do the trick. As for ‘natural language processing’… that’ll take longer.