Speech Dasher: Efficient speech recognition correction


Speech Dasher is a novel interface for the input of text using a combination of speech and navigation via a pointing device (such as a mouse). A speech recognizer provides the initial guess of the user's desired text while a navigation-based interface allows the user to confirm and correct the recognizer's output.

It is hoped that Speech Dasher will provide a text input interface which is:

  • More efficient - allowing faster input than either speech or navigation alone.
  • More fun - providing a consistent and less frustrating method of correcting speech recognition errors.
  • More accessible - enabling text input by people unable to use a keyboard and by those using mobile devices.

Entering text using Speech Dasher begins with the user speaking their desired sentence to a continuous speech recognition engine (currently Microsoft's Speech Recognizer v5.1 or Dragon NaturallySpeaking 7, 8 or 9). A word lattice is generated from the recognizer's results and then expanded to cover likely recognition errors. The expanded lattice is used to estimate the probability of which letter the user might enter next, based on both the recognition results and what has already been entered. The interface also seamlessly integrates a default language model, allowing the user to efficiently enter words the recognizer missed completely. These probability estimates are then used in the continuous navigation-based interface Dasher to allow the user to confirm and correct their dictated sentence.
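
To make the probability estimate described above more concrete, here is a minimal sketch in Python. It is not the Speech Dasher implementation: it assumes the expanded lattice can be approximated by a short list of weighted sentence hypotheses, and it uses a uniform letter distribution as a stand-in for the default language model. The function name `next_letter_probs`, the `backoff_weight` parameter, and the example hypotheses are all hypothetical, chosen only for illustration.

```python
import string

# Assumed symbol set for this sketch: lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "


def next_letter_probs(hypotheses, prefix, backoff_weight=0.1):
    """Estimate P(next letter | recognition results, text entered so far).

    hypotheses: list of (sentence, probability) pairs, a simplified
        stand-in for the expanded recognition lattice.
    prefix: the text the user has already entered.
    backoff_weight: share of probability given to the default model
        (a hypothetical tuning parameter).
    """
    # Collect probability mass from hypotheses that agree with the prefix.
    mass = {}
    for text, prob in hypotheses:
        if text.startswith(prefix) and len(text) > len(prefix):
            ch = text[len(prefix)]
            if ch in ALPHABET:
                mass[ch] = mass.get(ch, 0.0) + prob
    total = sum(mass.values())

    # Default model: uniform over the alphabet (a real system would use
    # a trained language model here).
    uniform = 1.0 / len(ALPHABET)

    # If no hypothesis matches the prefix, rely entirely on the default
    # model so the user can still spell a word the recognizer missed.
    lam = backoff_weight if total > 0 else 1.0

    probs = {}
    for ch in ALPHABET:
        recog = mass.get(ch, 0.0) / total if total > 0 else 0.0
        probs[ch] = (1.0 - lam) * recog + lam * uniform
    return probs


if __name__ == "__main__":
    hyps = [("the cat sat", 0.6), ("the cap sat", 0.3), ("a cat sat", 0.1)]
    probs = next_letter_probs(hyps, "the ca")
    # Show the three most likely next letters.
    for ch, p in sorted(probs.items(), key=lambda kv: -kv[1])[:3]:
        print(repr(ch), round(p, 3))
```

In an interface like Dasher, letters with higher probability would be given more screen area, so the recognizer's best guesses are the easiest to select, while every other letter stays reachable through the default model's share.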

An early research prototype is available for download below. Please try it out and send me your feedback. I'm especially interested in how it could be improved for people with disabilities.

Currently I'm working on a new version that uses the recognition lattice obtained from the PocketSphinx recognizer. The first set of videos below shows the latest version in action.

Publications

Speech Dasher: Fast Writing using Speech and Gaze

Keith Vertanen, David J.C. MacKay

CHI '10: Proceedings of the ACM Conference on Human Factors in Computing Systems, 2010.

Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Keith Vertanen

M.Phil thesis, University of Cambridge, 2004.


Videos

YouTube video showing mouse and eye tracker control
Videos of lattice model (using PocketSphinx)
Old videos of simple correction using lattice model
Old videos of corrections using both navigation and respeaking


Files

SpeechDasher.zip: Speech Dasher prototype v0.3