High Performance GUI III

One of the technologies beyond the direct-to-screen laser wand that I have been following is speech recognition. It has a great deal of attraction in mobile telephones and many consumer devices. And a speech recognition device does not have to command the whole of a language. Running a desktop can be limited to a relatively small set of speech commands, say 500 to 1000 words and phrases, as opposed to a native-language dictionary of 30,000 to 100,000 words. Of course these can be permuted into millions of command phrases, and uniquely named files, which are hard to enunciate, might have to be picked from a dialog list: a hybrid GUI+voice application.
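To make the small-vocabulary idea concrete, here is a minimal sketch in Python. The verb and object lists are hypothetical stand-ins for a real 500 to 1000 entry command set, and `match_command` is an illustrative helper, not part of any speech product. It uses the standard library's `difflib` to snap whatever text a recognizer returns onto the nearest known command phrase.

```python
import difflib

# Hypothetical mini-vocabulary; a real desktop set would hold 500 to 1000 entries.
VERBS = ["open", "close", "save", "print", "delete"]
OBJECTS = ["file", "window", "folder", "document"]


def command_phrases(verbs, objects):
    """Build every verb+object pairing the desktop would accept."""
    return [f"{verb} {obj}" for verb in verbs for obj in objects]


def match_command(heard, phrases, cutoff=0.6):
    """Return the known phrase closest to the recognizer's text, or None."""
    hits = difflib.get_close_matches(heard.lower(), phrases, n=1, cutoff=cutoff)
    return hits[0] if hits else None


PHRASES = command_phrases(VERBS, OBJECTS)

if __name__ == "__main__":
    print(len(PHRASES))                       # 5 verbs x 4 objects
    print(match_command("opin file", PHRASES))  # a slightly garbled command
    print(match_command("xyzzy", PHRASES))      # nothing close enough
```

The same multiplication explains the "millions of phrases" figure: 1000 words taken two at a time already gives 1000 x 1000 = 1,000,000 possible pairs, even though the recognizer only has to know 1000 words.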

So I had high hopes for speech recognition as part of the GUI. Then I ran into this CIO Insight article. Even given simple commands drawn from a list of expected phrases (a list of Amtrak-served cities), the system could not guess right. Now this example has three potential strikes against it:
1- It is a sample of one.
2- It was done over a phone, which can add a lot of noise that would likely not be encountered on the desktop.
3- It was performed by a Nuuu Yahhka, and everyone else in the world knows the "understanding" hazards that entails.
But despite the above misgivings, the market has also spoken, with speech recognition sales declining from $140M in 2000 to $117M in 2004. Certainly the spectacular decline of major technology player Lernout & Hauspie did not help the market. And many argue that expectations of HAL-like speech understanding (as in 2001: A Space Odyssey) are currently holding back an industry that averages 1 error in 20 words with training, and in which nobody agrees on what can be done without training.

I suspect the big boys, IBM and Microsoft, will stay out of the field until they can deliver 1 error in every 25 words (or better) without any prior training. That would approximate everyday human conversation. However, speech recognition machines do not have the human intelligence to backtrack and fill in the blanks based on what is said later.
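To see what "1 error in 20 words" versus "1 error in 25 words" means for whole commands, here is a rough back-of-the-envelope sketch in Python. It assumes each word is misrecognized independently of its neighbors, which is a simplification of my own and not something the vendors claim; `phrase_success` is a hypothetical helper name.

```python
def phrase_success(words_per_error, phrase_len):
    """Chance that every word in a phrase is recognized correctly.

    Assumes independent per-word errors (a simplification):
    per-word accuracy is 1 - 1/words_per_error.
    """
    per_word_accuracy = 1 - 1 / words_per_error
    return per_word_accuracy ** phrase_len


if __name__ == "__main__":
    for words_per_error in (20, 25):
        for phrase_len in (3, 10):
            rate = phrase_success(words_per_error, phrase_len)
            print(f"1 error in {words_per_error} words, "
                  f"{phrase_len}-word phrase: {rate:.1%} clean")
```

Under that assumption a three-word command comes through clean about 86% of the time at 1 error in 20 and about 88% at 1 in 25, and the odds fall off quickly as phrases get longer. That is why short desktop commands are a friendlier target than free dictation.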

However, I am open to any and all suggestions on promising new speech technologies. If suitably impressed, I will definitely add a note here. Just make a comment or email me at JBSurveyer.