Last updated: 13 August 2010

Language Modelling


I am currently working on improving the PPM family of character based language models. This is of particular interest in our research group, in the context of the Dasher project. Statistical language modelling has received a lot of attention since the 1980s and entered the realm of usability with the first prediction by partial matching (PPM) work of Cleary and Witten [1984]. With the rise of computer processing power, and in particular the prevalance of mobile computers, these language models are currently being used for various text entry, speech recognition, text correction, and machine translation tasks. The state-of-the-art is arguably PPM with modified interpolated Knesery-Ney smoothing [Teh, 2006].

Fully Adaptive Models

I am testing the claim that we should be using fully adaptive language models, particularly in scenarios where we expect the test data (text encountered while using the model) to be different from the training data (text used to optimize the model). While modified interpolated Kneser-Ney smoothing works very well, it contains a number of parameters that need to be optimized on training data. The assumption is that the test data will follow the same statistical patterns as the training data. This assumption does not hold generally, and in particular when the test data come from hard to come by sources like emails (due to privacy concerns).

I showed that a simple online optimization of parameters during testing can significantly improve model performance, and developed a new PPM variant that is essentially parameter-free—in the sense that performance on test data is insensitive to the setting of its parameters. See the technical report below for details. I am currently developing a new Bayesian model based observations about the statistics of natural language that are not well-represented by current language models.

Technical Report

Improving PPM  An empirical study of old and novel models for blending contexts in the PPM family of language models. [gzipped PostScript]

Hardware prototype for the error-correcting keyboard

Figure 1 Hardware prototype for the error-correcting keyboard.


Error-Correcting Keyboard  A couple of years ago I tried building an error-correcting keyboard, to learn about both language modelling and hardware. The result was a prototype consisting of hardware and microcontroller software. The hardware lives between a standard keyboard and a desktop computer, intercepts key code signals from the keyboard, and transmits them to the computer. I used an AVR Mega88 processor (which has 8K of code space), and two 128K dataflashes for each of the two (compressed) language models. In Figure 1 the microprocessor is mounted on an AVR development board (top left), there is a RS232 chip (on the white breadboard), which talks to the keyboard, and the two dataflash chips are mounted on a custom cut PCB (bottom right).

The software

The language model is standard PPMC, is pre-trained on some English text, and adapts as the user types. The typing model learns which typing errors the user is likely to make. For example, while doing two-handed touch typing I tend to make 2 kinds of errors. The first type of error is accidentally pressing a key adjacent to the correct one. With me, this does not happen very often. The second type of error is to use the correct finger on the wrong hand (like pressing K instead of D on the QWERTY layout). I make the second type of error more often, and so the model will learn that I am reliable when it comes to using the correct finger, but unreliable when it comes to using the correct hand. The software will therefor trust me rather than the language model when it thinks I made a type-1 error, and vice versa when it thinks I made a type-2 error. Both the language and typing models are update by interpreting correction keypresses (i.e. backspace, delete, left arrow and right arrow) intelligently.

I encountered some problems that make this approach impractical. Firstly, real-time typing correction is practically impossible. By real-time, I mean correcting a keypress as soon as it is made—without waiting to observe some more keypresses. There is just not enough information in the left-context of a character to reliably predict it. This does not mean that all is lost—just that you have to wait to see some right-context before making final predictions/corrections. This is what typically happens on your mobile phone. Secondly, since the software lives outside your computer it can never be aware of the context within which you are typing. In my case this means that I have to switch off the system when programming, using the command line, or using shortcut keys; and switch it back on when writing emails or documents.


[Cleary and Witten, 1984] J.G. Cleary and I.H. Witten. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, 32(4):396–402, 1984.

[Teh, 2006] Y.W. Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In the proceedings of the International Conference on Computational Linguistics and the Annual Meeting of the Association for Computational LinguisticsAssociation for Computational Linguistics, pp. 985–992, 2006.