Some notes on iTap / T9 / "predictive text"


Legal disclaimer

Both iTap and T9 use the mapping described below. When I wrote this page I wasn't careful to distinguish between the two products, because I wasn't familiar with the difference between them. But I have now been informed: iTap and T9 are two different products, and they compete with each other. T9 is made by Tegic, a company in Seattle, and is used on most Nokia phones. iTap is a product of a division of Motorola in Silicon Valley. For a list of differences between iTap and T9, see the bottom of the page.


The "iTap/T9" code for English encodes the characters [A-Z] into the characters [2-9] using this mapping:

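# Encode a word as its key sequence, e.g. encode("hello") returns "43556".
sub encode {
    local $_ = uc shift;    # work on an upper-cased copy of the argument
    s/[ABC]/2/g;
    s/[DEF]/3/g;
    s/[GHI]/4/g;
    s/[JKL]/5/g;
    s/[MNO]/6/g;
    s/[PQRS]/7/g;
    s/[TUV]/8/g;
    s/[WXYZ]/9/g;
    return $_;
}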
[This mapping is the one that has been used on telephones for decades.]

The patented T9 system uses a dictionary (does it also use a general language model?) to decode the string back into English characters. In the event of any ambiguity, the user presses scroll commands (embodied on one or more other keys) to select the desired word from a list offered by the machine.
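The decoding step can be sketched as a lookup table from key sequences to the dictionary words that produce them. This is a minimal illustration, not Tegic's method: the names `KEYPAD`, `encode`, `build_index` and `decode` and the five-word dictionary are hypothetical, and candidates are kept in word-list order, whereas a real system would rank them (for example by word frequency) before offering them for scrolling.

```python
from collections import defaultdict

# Letters printed on each digit key of a standard telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
# Inverted table: letter -> digit.
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}


def encode(word: str) -> str:
    """Map a word (letters A-Z only) onto its key sequence, e.g. 'kiss' -> '5477'."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def build_index(words):
    """Group dictionary words by key sequence, keeping the order of the word list."""
    index = defaultdict(list)
    for word in words:
        index[encode(word)].append(word.upper())
    return index


def decode(digits: str, index) -> list:
    """Return the candidate words for a key sequence; empty if none is known."""
    return index.get(digits, [])


index = build_index(["kiss", "lips", "hello", "puffs", "queer"])
print(decode("5477", index))   # ['KISS', 'LIPS']
print(decode("43556", index))  # ['HELLO']
print(decode("999", index))    # []
```

When `decode` returns more than one word the user has to scroll; when it returns an empty list the word is unknown to the dictionary, which is the situation raised in question 2.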

I was curious to find out how good this code is for English. I could see it being bad in two possible ways.

  1. Assuming that the user is purely typing words from a dictionary, how often will the user have to use the disambiguator (for example, KISS = LIPS = 5477, and PUFFS = QUEER = 78337!), and how often will this disambiguation be tiresome to perform? I could imagine there being clumps of words that share the same code; the user might have to do a lot of scrolling to pick the right one!
  2. What happens if you've typed in a long word that the machine does not know? If it does not know Lewinsky, say, what happens after you have typed in "53946759"? Is the user confronted with an enormous list of possible strings, none of which is in the dictionary? I would guess the system is virtually useless for sending long strings that are not in the dictionary. (The answer, in T9, is that in such a case the user is invited to Spell the word, which involves retyping the whole word using unambiguous multitap. Slightly annoying, but worth it, because such words are then added to the T9 dictionary.)
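To put a rough number on question 2: with no dictionary to help, each key stands for 3 or 4 letters, so the number of letter strings matching a key sequence is the product of those counts. For the eight keys 53946759 that is 3^5 × 4^3 = 15,552 strings, only one of which is LEWINSKY. The sketch below (the function name `candidate_strings` is hypothetical) simply enumerates them.

```python
from itertools import product

# Letters printed on each digit key of a standard telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def candidate_strings(digits: str) -> list:
    """List every letter string that the key sequence could stand for."""
    return ["".join(letters) for letters in product(*(KEYPAD[d] for d in digits))]


candidates = candidate_strings("53946759")
print(len(candidates))           # 15552
print("LEWINSKY" in candidates)  # True
print(candidates[:3])            # ['JDWGMPJW', 'JDWGMPJX', 'JDWGMPJY']
```

The first few candidates show why an unconstrained list is useless: almost all of the 15,552 strings are gibberish.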
Below I give the answer to the first question.

I used a Perl program and the Linux dictionary /usr/dict/words, and found that 45373 distinct words mapped to 41439 distinct codes, so 3934 clashes occurred. So if the user picks words at random from the dictionary, and the system knows all those words and no others, then the disambiguator will have to be used for fewer than 10% of words (3934 out of 45373, about 8.7%). In fact it will be used for even fewer than that, since the first guess made by T9 is often the correct word. [Dale Grover told me that in practice T9 gets it right better than 97% of the time.]
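The Perl program itself is not reproduced on this page. The following Python sketch performs the same kind of count under stated assumptions: one word per line, case ignored (words are upper-cased), and entries containing anything other than the letters A-Z skipped. These filtering choices are guesses, so it need not reproduce the exact figures above. The number of clashes is just the number of distinct words minus the number of distinct codes, e.g. 45373 − 41439 = 3934.

```python
import re
import sys

# Letters printed on each digit key of a standard telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}


def encode(word: str) -> str:
    """Map an upper-case word onto its key sequence."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word)


def clash_statistics(lines):
    """Return (distinct words, distinct codes, clashes) for an iterable of words."""
    # Assumption: ignore case, and skip entries with non-letters (apostrophes, digits, ...).
    words = {line.strip().upper() for line in lines if re.fullmatch(r"[A-Za-z]+", line.strip())}
    codes = {encode(word) for word in words}
    return len(words), len(codes), len(words) - len(codes)


# Hypothetical invocation: python clashes.py /usr/dict/words
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        n_words, n_codes, n_clashes = clash_statistics(handle)
    print(f"{n_words} words -> {n_codes} codes, {n_clashes} clashes ({n_clashes / n_words:.1%})")
```

With the figures reported above, the last line would print a clash rate of 8.7%.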

Here are some of the worst clashes (i.e. the clashes involving the most words). I note with a smile that the two examples chosen by Motorola (HELLO) and Tegic (HOW) do not clash with any dictionary words.
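Listings like the ones that follow can be produced by grouping words by key sequence and sorting the groups by size. This is a hedged sketch: the function name `worst_clashes`, the `min_size` parameter and the small sample word list are hypothetical, and ties between equal-sized groups are broken by code purely for a stable order.

```python
from collections import defaultdict

# Letters printed on each digit key of a standard telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}


def encode(word: str) -> str:
    """Map a word (letters A-Z only) onto its key sequence."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def worst_clashes(words, min_size=2):
    """Return (code, sorted words) for every group of at least min_size words, largest first."""
    groups = defaultdict(set)
    for word in words:
        groups[encode(word)].add(word.upper())
    ranked = [(code, sorted(group)) for code, group in groups.items() if len(group) >= min_size]
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


sample = ["hello", "how", "amy", "any", "bow", "box", "boy", "cow", "kiss", "lips", "lisp"]
for code, group in worst_clashes(sample):
    print(code, ":", ", ".join(group))
# 269 : AMY, ANY, BOW, BOX, BOY, COW
# 5477 : KISS, LIPS, LISP
```

HELLO and HOW do not appear in the output, because each is alone in its group.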


Conclusion: The worst dictionary clash is the eleven words:
ACRES, BARDS, BARER, BARES, BASER, BASES, CAPER, CAPES, CARDS, CARES, CASES
The worst-case number of keypresses that the user has to make, given a dictionary word, is thus the word's own keypresses plus ten scroll commands. That does not seem excessive, though I can imagine that people who send messages about acres, bases, cards, cares and cases, all of which are fairly frequent five-letter words, might get a little bored of having to do this. But it is certainly better than the multi-tap code!

A few of the more amusing (though not if your name is Amy) confusable sets are:

 269 : 	 AMY, ANY, BOW, BOX, BOY, COW
 7467 : 	 PIMP, PINS, RIMS, SHOP, SIMS, SINS
 25663 : 	 ALONE, ALOOF, BLOND, BLOOD, CLONE
 74687 : 	 PINTS, PIOTR, PIOUS, RIOTS, SHOTS, SINUS
 5477 : 	 KISS, LIPS, LISP, LISS


Eight or more words with the same code

 729 : 	 PAW, PAY, PAZ, RAW, RAY, SAW, SAX, SAY  
 76737 : 	 PORES, POSER, POSES, ROPER, ROPES, ROSES, SORER, SORES  
 46637 : 	 GONER, GOODS, GOOFS, HOMER, HOMES, HONER, HONES, HOODS, HOOFS, INNER  
 22737 : 	 ACRES, BARDS, BARER, BARES, BASER, BASES, CAPER, CAPES, CARDS, CARES, CASES  
 7283 : 	 PATE, PAVE, RATE, RAVE, SATE, SAUD, SAVE, SCUD  
 2273 : 	 ACRE, BARD, BARE, BASE, CAPE, CARD, CARE, CASE  

Seven words with the same code

 4663 : 	 GONE, GOOD, GOOF, HOME, HONE, HOOD, HOOF  
 726 : 	 PAM, PAN, RAM, RAN, SAM, SAN, SAO
 72837 : 	 PAVES, RATER, RATES, RAVES, SATES, SAVER, SAVES  
 227837 : 	 BARTER, BASTES, CARTER, CARVER, CARVES, CASTER, CASTES  
 2277 : 	 BARR, BARS, BASS, CAPS, CARP, CARR, CARS  
 752837 : 	 PLATES, SKATER, SKATES, SLATER, SLATES, SLAVER, SLAVES  
 7867 : 	 PUMP, PUNS, RUMP, RUNS, STOP, SUMS, SUNS  

Six words with the same code

 2877 : 	 BURP, BURR, BUSS, CUPS, CURS, CUSP  
 742737 : 	 PHASER, PHASES, SHAPER, SHAPES, SHARER, SHARES  
 786 : 	 PUN, QUO, RUM, RUN, SUM, SUN  
 34637 : 	 DIMES, DINER, DINES, FINDS, FINER, FINES  
 26637 : 	 BONDS, BONER, BONES, COMER, COMES, CONES  
 787433 : 	 PURGED, PUSHED, RUSHED, STRIDE, STRIFE, SURGED  
 787437 : 	 PURGES, PUSHER, PUSHES, RUSHER, RUSHES, SURGES  
 7243 : 	 PAGE, PAID, RAGE, RAID, SAGE, SAID  
 74687 : 	 PINTS, PIOTR, PIOUS, RIOTS, SHOTS, SINUS  
 7277 : 	 PARR, PARS, PASS, RAPS, RASP, SAPS  
 7627 : 	 ROAR, ROBS, SNAP, SOAP, SOAR, SOBS  
 7673 : 	 POPE, PORE, POSE, ROPE, ROSE, SORE  
 27437 : 	 ARIES, ASHER, ASHES, BRIER, CRIER, CRIES  
 26737 : 	 BORER, BORES, COPES, CORDS, CORER, CORES  
 2253 : 	 ABLE, BAKE, BALD, BALE, CAKE, CALF  
 2263 : 	 ACME, ACNE, BAND, BANE, CAME, CANE  
 74337 : 	 RIDER, RIDES, SHEDS, SHEEP, SHEER, SIDES  
 2666 : 	 AMMO, ANON, BONN, BOOM, BOON, COON  
 529 : 	 JAW, JAY, KAY, LAW, LAX, LAY  
 7327 : 	 PEAR, PEAS, REAP, REAR, SEAR, SEAS  
 7337 : 	 PEEP, PEER, REDS, SEEP, SEER, SEES  
 782537 : 	 PUCKER, QUAKER, QUAKES, RUBLES, STAKES, SUCKER  
 269 : 	 AMY, ANY, BOW, BOX, BOY, COW  
 22537 : 	 ABLER, BAKER, BAKES, BALER, BALES, CAKES  
 7463 : 	 PINE, RIME, RIND, SHOD, SHOE, SINE  
 7467 : 	 PIMP, PINS, RIMS, SHOP, SIMS, SINS  

Five words with the same code

 3937 : 	 DYER, DYES, EWES, EYER, EYES  
 763 : 	 POD, POE, ROD, ROE, SOD  
 769 : 	 POX, ROW, ROY, SOW, SOY  
 32837 : 	 DATER, DATES, EATER, EAVES, FATES  
 72437 : 	 PAGER, PAGES, RAGES, RAIDS, SAGES  
 54637 : 	 KHMER, KINDS, LIMES, LINER, LINES  
 272837 : 	 BRAVER, BRAVES, CRATER, CRATES, CRAVES  
 72833 : 	 PAVED, RATED, RAVED, SATED, SAVED  
 486 : 	 GUM, GUN, HUM, HUN, ITO  
 7263 : 	 PANE, RAND, SAME, SAND, SANE  
 7297 : 	 PAWS, PAYS, RAYS, SAWS, SAYS  
 7653 : 	 POKE, POLE, ROLE, SOLD, SOLE  
 7687 : 	 POTS, POUR, ROTS, SOUP, SOUR  
 25663 : 	 ALONE, ALOOF, BLOND, BLOOD, CLONE  
 42779 : 	 GARRY, GASSY, HAPPY, HARPY, HARRY  
 73257 : 	 PEAKS, PEALS, PECKS, REALS, SEALS  
 72537 : 	 PALER, PALES, RAKES, SAKES, SALES  
 372737 : 	 DRAPER, DRAPES, ERASER, ERASES, FRASER  
 2663 : 	 ANNE, BOND, BONE, COME, CONE  
 2673 : 	 BORE, BOSE, COPE, CORD, CORE  
 7282437 : 	 PATCHES, RAVAGER, RAVAGES, SAVAGER, SAVAGES  
 74737 : 	 PIPER, PIPES, RISER, RISES, SIRES  
 76537 : 	 POKER, POKES, POLES, ROLES, SOLES  
 7325 : 	 PEAK, PEAL, PECK, REAL, SEAL  
 6277 : 	 MAPS, MARS, MASS, NAPS, OARS  
 4867 : 	 GUMS, GUNS, HUMP, HUMS, HUNS  
 72237 : 	 PACER, PACES, RACER, RACES, SABER  
 2337 : 	 ADDS, BEDS, BEEP, BEER, BEES  
 5263 : 	 JANE, KANE, LAME, LAND, LANE  
 5277 : 	 JARS, KARP, LAPS, LARS, LASS  
 728464 : 	 PAVING, RATING, RAVING, SATING, SAVING  
 42937 : 	 GAYER, GAZER, GAZES, HAYES, HAZES  
 2427 : 	 AGAR, BIAS, BIBS, CHAP, CHAR  
 26937 : 	 BOWER, BOWES, BOXER, BOXES, COWER  
 2433 : 	 AGED, AGEE, AIDE, BIDE, CHEF  
 2437 : 	 AGER, AGES, AIDS, BIDS, BIER  
 72737 : 	 PAPER, PARES, RAPER, RAPES, RARER  
 5337 : 	 JEEP, JEER, KEEP, LEER, LEES  

Four words with the same code

 2867 : 	 ATOP, BUMP, BUMS, BUNS  
 4653 : 	 GOLD, GOLF, HOLD, HOLE  
 6453 : 	 MIKE, MILD, MILE, NILE  
 739 : 	 PEW, REX, SEW, SEX  
 746 : 	 PIN, RHO, RIM, RIO  
 747 : 	 PIP, RIP, SIP, SIR  
 726737 : 	 PAMPER, SCOPES, SCORER, SCORES  
 9327 : 	 WEAR, WEBS, YEAR, YEAS  
 782 : 	 PUB, QUA, RUB, SUB  
 762733 : 	 ROARED, SNARED, SOAPED, SOARED  
 2877464 : 	 BURPING, BUSSING, CUPPING, CURSING  
 2527 : 	 AJAR, ALAR, ALAS, CLAP  
 36837 : 	 DOTES, DOVER, DOVES, ENTER  
 46639 : 	 GOMEZ, GOODY, GOOFY, HONEY  
 74273 : 	 PHASE, SHAPE, SHARD, SHARE  
 767837 : 	 PORTER, POSTER, ROSTER, SORTER  
 3637 : 	 DOER, DOES, ENDS, FOES  
 4367 : 	 GEMS, HEMP, HEMS, HENS  
 426 : 	 HAM, HAN, IAN, IBN  
 427 : 	 GAP, GAS, HAP, HAS  
 74637 : 	 PINES, RINDS, SHOES, SINES  
 3262437 : 	 DAMAGER, DAMAGES, FANCIER, FANCIES  
 3663 : 	 DOME, DONE, FOND, FOOD  
 3673 : 	 DOPE, DOSE, FORD, FORE  
 472837 : 	 GRATER, GRATES, GRAVER, GRAVES  
 5463 : 	 KIND, LIME, LIND, LINE  
 7253 : 	 PALE, RAKE, SAKE, SALE  
 4747 : 	 GRIP, GRIS, IRIS, ISIS  
 5477 : 	 KISS, LIPS, LISP, LISS  
 78253 : 	 QUAKE, RUBLE, STAKE, STALE  
 92837 : 	 WATER, WAVER, WAVES, YATES  
 22733 : 	 BARED, BASED, CARED, CASED  
 6833537 : 	 MUDDLER, MUDDLES, MUFFLER, MUFFLES  
 7688464 : 	 POTTING, POUTING, ROTTING, ROUTING  
 287733 : 	 BURPED, BUSSED, CUPPED, CURSED  
 26337 : 	 ANDES, BODES, CODER, CODES  
 227437 : 	 BARGES, BASHES, CASHER, CASHES  
 7663 : 	 POND, ROME, ROOF, SOME  
 7667 : 	 POMP, POOR, ROMP, SONS  
 7627464 : 	 ROARING, SNARING, SOAPING, SOARING  
 227464 : 	 BARING, BASING, CARING, CASING  
 27433 : 	 ASIDE, BRIDE, BRIEF, CRIED  
 44537 : 	 GILDS, GILES, HIKER, HIKES  
 2267 : 	 ABOS, BANS, CAMP, CANS  
 2275 : 	 BARK, BASK, CARL, CASK  
 722537 : 	 PACKER, SABLES, SACKER, SCALES  
 73277 : 	 PEARS, REAPS, REARS, SEARS  
 2639 : 	 ANDY, ANEW, BODY, CODY  
 2647 : 	 BOGS, BOHR, BOIS, COGS  
 2653 : 	 BOLD, COKE, COLD, COLE  
 367243 : 	 DOSAGE, ENRAGE, FORAGE, FORBID  
 75433 : 	 PLIED, SKIED, SKIFF, SLIDE  
 2662 : 	 ANNA, BOMB, BOOB, COMB  
 2665 : 	 AMOK, BOOK, COOK, COOL  
 2667 : 	 AMOS, BOOR, BOOS, COOP  
 7338237 : 	 REDUCER, REDUCES, SEDUCER, SEDUCES  
 742537 : 	 PICKER, SHAKER, SHAKES, SICKER  
 22437 : 	 ACHES, ACIDS, CAGER, CAGES  
 546 : 	 JIM, KIM, KIN, LIN  
 66737 : 	 MORES, MOSER, MOSES, NOSES  
 7335 : 	 PEEK, PEEL, REEL, SEEK  
 78337 : 	 PUFFS, QUEER, STEEP, STEER  
 7363 : 	 PEND, REND, RENE, SEND  
 762533 : 	 ROCKED, SNAKED, SOAKED, SOCKED  
 87437 : 	 TRIER, TRIES, URGES, USHER  
 7378 : 	 PERU, PEST, REST, SEPT  
 22837 : 	 BATES, BAUER, CATER, CAVES  
 94737 : 	 WIPER, WIPES, WIRES, WISER  
 75867 : 	 PLUMP, PLUMS, SLUMP, SLUMS  
 8437 : 	 TIER, TIES, VIER, VIES  
 3678437 : 	 EMPTIER, EMPTIES, FORTIER, FORTIES  
 966 : 	 WON, WOO, YON, ZOO  
 729464 : 	 PAWING, PAYING, SAWING, SAYING  
 6333537 : 	 MEDDLER, MEDDLES, NEEDLER, NEEDLES  
 52637 : 	 JAMES, LAMES, LANDS, LANES  
 42837 : 	 GATES, HATER, HATES, HAVES  
 72257 : 	 PACKS, RACKS, SACKS, SCALP  
 2278464 : 	 BASTING, CARTING, CARVING, CASTING  
 7225464 : 	 PACKING, RACKING, SACKING, SCALING  
 73337 : 	 REEDS, REEFS, REFER, SEEDS  
 2682437 : 	 BOTCHER, BOTCHES, BOUCHER, COUCHES  
 73357 : 	 PEEKS, PEELS, REELS, SEEKS  
 73377 : 	 PEEPS, PEERS, SEEPS, SEERS  
 226 : 	 ABO, BAN, CAM, CAN  
 228 : 	 ABU, ACT, BAT, CAT  
 64637 : 	 MINDS, MINER, MINES, NINES  
 54837 : 	 KITES, LITER, LIVER, LIVES  
 3463 : 	 DIME, DINE, FIND, FINE  
 7335464 : 	 PEEKING, PEELING, REELING, SEEKING  
 768733 : 	 POURED, ROUSED, SOUPED, SOURED  
 72687 : 	 PANTS, RANTS, SCOTS, SCOUR  
 266 : 	 ANN, BOO, CON, COO  
 76277 : 	 ROARS, SNAPS, SOAPS, SOARS  
 627537 : 	 MAPLES, MARKER, MASKER, NAPLES  
 7874464 : 	 PURGING, PUSHING, RUSHING, SURGING  
 633 : 	 NED, ODD, ODE, OFF  
 24337 : 	 AIDES, CHEER, CHEFS, CIDER  
 7426 : 	 RICO, SHAM, SIAM, SIAN  
 5646 : 	 JOHN, JOIN, LOGO, LOIN  
 7866464 : 	 RUNNING, STONING, SUMMING, SUNNING  
 7455 : 	 PILL, RILL, SILK, SILL  
 5673 : 	 JOSE, LORD, LORE, LOSE  
 7473 : 	 PIPE, RIPE, RISE, SIRE  
 7477 : 	 PISS, RIPS, SIPS, SIRS  
 727737 : 	 PARSER, PARSES, PASSER, PASSES  
 32733 : 	 DARED, EARED, EASED, FARED  
 32737 : 	 DARER, DARES, EASES, FARES  
 96637 : 	 WOODS, WOOER, WOOFS, ZONES  
 7827 : 	 PUBS, RUBS, STAR, SUBS  
 25837 : 	 ALTER, BLUER, BLUES, CLUES  
 7877 : 	 PUPS, PURR, PUSS, RUSS  
 262937 : 	 AMAZER, AMAZES, COAXER, COAXES  
 46537 : 	 GOLDS, HOLDS, HOLES, INKER  
 36737 : 	 DOPER, DOPES, DOSES, FORDS  
 347437 : 	 DIRGES, DISHES, FISHER, FISHES  
 82537 : 	 TAKER, TAKES, TALES, VALES  
 732533 : 	 PEAKED, PEALED, PECKED, SEALED  
 327 : 	 DAR, EAR, FAQ, FAR  
 786633 : 	 RUNOFF, STONED, SUMMED, SUNNED  
 786637 : 	 RUNNER, STONES, SUMMER, SUMNER  
 4283 : 	 GATE, GAVE, HATE, HAVE  
 346 : 	 DIM, DIN, EGO, FIN  
 75283 : 	 PLATE, SKATE, SLATE, SLAVE  
 2833 : 	 BUDD, BUFF, CUED, CUFF  

Differences between iTap (from Lexicus, Motorola) and T9 (Tegic)

  1. T9 is used on Nokia and on many other brands of phone. Is iTap used on Motorola phones only?
  2. According to two researchers from Motorola, `iTAP is better'.
  3. iTap offers word-completions. (In my opinion, this feature, while nice for long words, makes iTap harder to explain, since a novice user, having heard that iTap does word completions, is likely to be demoralized and confused by the bad predictions that are made when only half the word is written. When I explain T9 to people I tell them to ignore the display until they have finished the word.)
  4. iTap's predictions are context-dependent. This means it can predict whole sentences, which is nice if you are a predictable writer. But T9 advocates would emphasize the advantage of T9 NOT being context-dependent: you know that to write a particular word, you can memorize a particular key sequence. For example, to write "HOME", you always press "4663**" (or some such), independent of context. This is good for usability, as it means the experienced user can go fast and doesn't need to look at the display.
  5. From this Motorola review: `iTap has its faults. For one, pressing the 1 button defaults to putting the number 1 in the word instead of putting a period. If you enter a space after the word, the 1 key will default to a period, but not if you are at the end of a word. This is real annoying, as you either have to waste a character at the end of each sentence, or you need to waste a keystroke to select the period instead of the 1. What were they thinking?'
  6. You can correct iTap as you write a word, and `lock in' your corrections, by using the arrow buttons. (This option is not available in T9, perhaps for good reason, since it is often not necessary to make corrections.) The recommended way of adding a word to iTap's dictionary is to use this `lock in corrections as needed' approach, rather than the simple `multitap' (abc) approach chosen in T9. This means that in iTap you have to keep switching between the number keys (1-9) and the arrow buttons.
  7. In T9, '0' is used to insert a space (and implicitly to confirm that the displayed word is fine). In iTap, 'Select' is used to terminate words AND to insert a space. Pressing Select twice, in iTap, will send the text message.
  8. You can enter symbols and numbers in iTap without switching mode. (Actually, you can enter numbers in T9 too, by holding down the corresponding key.)
  9. My take on the difference between iTap and T9: T9 is very simple to explain, whereas iTap has more features, which make it harder to explain, and perhaps it demands a little more attention from the user too. iTap makes the user make decisions of the form `shall I stop writing the word now and try to find it in word-completion mode, or shall I continue writing the word?'
    A user faced by such choices may find he regrets his decisions. T9 doesn't bother the user with such choices. You just keep going, and you'll be writing at close to one character per key press, which is fine. I never regret using T9.
    Users may also misunderstand the choices they are offered in iTap: they may think that, since they are offered the chance to correct the word on the fly as they write it, they should do so; but doing so leads to slower writing.
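Point 4 (a context-independent T9 gives every word a fixed key sequence) can be made concrete with a short sketch. Here a hypothetical fixed ranking of the candidates for each code stands in for T9's real ordering, which this page does not describe, and `*` stands in for whatever key scrolls to the next candidate on a given phone, as in the "4663**" example. The sequence for a word is its digits followed by one scroll press per candidate ranked above it; the ranking below is simply the alphabetical order of the 4663 row in the table above, used for illustration only.

```python
# Letters printed on each digit key of a standard telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}


def encode(word: str) -> str:
    """Map a word (letters A-Z only) onto its key sequence."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def key_sequence(word: str, ranking: dict, scroll_key: str = "*") -> str:
    """Digits for `word`, then one scroll press per candidate ranked above it.

    `ranking` maps each code to its candidates in a fixed order (hypothetical here).
    Raises KeyError or ValueError if the word is not in the ranking.
    """
    code = encode(word)
    position = ranking[code].index(word.upper())
    return code + scroll_key * position


ranking = {"4663": ["GONE", "GOOD", "GOOF", "HOME", "HONE", "HOOD", "HOOF"]}
print(key_sequence("home", ranking))  # 4663***
print(key_sequence("gone", ranking))  # 4663
```

Because the ranking never depends on the preceding words, the same word always costs the same keypresses, which is what lets an experienced user type without watching the display.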
Further reading on iTap, and a nice Shockwave iTap demo.


David MacKay / mackay@mrao.cam.ac.uk - home page.