Dasher - Character LM PA154 Language Modeling (4.1) Pavel Rychlý pary@fi.muni.cz March 12, 2024 Dasher ■ authors: David MacKay, David Ward ■ Cambridge University; freeware ■ support for highly efficient text input for using means other than a standard computer keyboard ■ alternative for thousands of people with various physical disabilities ■ on-screen text input using a positioning device (mouse, joystick...) ■ uses a probabilistic predictive language model ■ is still under development (technology remains the same) Pavel Rychlý • Dasher - Character LM • March 12, 2024 2/33 About Dasher ■ Dasher is free ■ open-source software ■ GNU Generel Public License ■ alphabet for more than 150 languages ■ font colour setting ■ system learns and offers letter combinations that are more used Pavel Rychlý • Dasher - Character LM • March 12, 2024 3/33 Areas of use ■ assistive technology (disabilities - without hands, with one hand...) ■ Pocket PC, iOS, Android, Linux, macOS, Microsoft Windows ■ complex languages (e.g. Japanese) ■ latest version 5.0.0 (beta) from April 8, 2016 Pavel Rychlý • Dasher - Character LM • March 12, 2024 4/33 Principle File Edit Options Control Prediction Help n 5 a nne □ r □ t o □ be To be or not to be □ □th&tP I em 1 ost 0 w t letters in alphabetical order, each letter is in a rectangle The rectangle with the selected letter contains again the complete alphabet from which the 2nd symbol can be selected, etc. basic idea: more probable letters are in a larger rectangle rectangle sizes are decided on the basis of the language model Pavel Rychlý • Dasher - Character LM • March 12, 2024 5/33 "Inverse" arithmetic coding ■ arithmetic coding (text compression): codeword is a number from the interval (0,1), by successive encoding of symbols the intervals are refined in the ratio of the probabilities of occurrence of a character ■ lossless data compression method ■ in Dasher, the ypsilon coordinate represents the entire interval (0,1), where each alphabet symbol has an associated segment of length corresponding to the probability of its occurrence in a given context Pavel Rychlý • Dasher - Character LM • March 12, 2024 6/33 Arithmetic coding - example for four-symbol model ■ codeword is a number from the interval [0,1) ■ 60 % for symbol NEUTRAL; interval is [0, 0.6) ■ 20 % for symbol POSITIVE; interval is [0.6, 0.8) ■ 10 % for symbol NEGATIVE; interval is [0.8, 0.9) ■ 10 % for symbol END-OF-DATA; interval is [0.9, 1) ■ symbol in END-OF-DATA section means that decoding is complete Pavel Rychly • Dasher - Character LM . March 12, 2024 7/33 Arithmetic decoding 0.8 0.9 1 0.48 0.516 0.528 0.534 0.54 —I-1-I Pavel Rychlý • Dasher - Character LM • March 12, 2024 message is encoded as the number 0.538 encoder with interval [0,1) is divided into four subintervals; the message is in the NEUTRAL section interval [0, 0.06) is divided into four subintervals; the message is in the NEGATIVE section interval [0.48, 0.54) is divided into four subintervals; the message is in the END-OF-DATA section8733 Encoding the message "WIKI" by arithmetic coding ■ Each symbol has its probability in the interval [0, 1) ■ The number of message symbols or terminal symbol must be known ■ interval is represented in binary system Pavel Rychlý • Dasher - Character LM • March 12, 2024 9/33 Encoding of message "WIKI" by arithmetic coding continued 1. 2. W .ir K 3. I .0011 .01- .11- ww .0001—Ito oooi—IL- [wi zr f 4 K .001 1 01- .11- .WIW Wll 'WIK 5. I 0001- JG0OI1- mffinsr ■ Ol 011V 0011 .01 WIKW WIKI WIKK .11 - 6. [.0010101, .0010111) - .001011 interval "W" is [0, 0.01) interval "I" is [0.01, 0.11) interval "K" is [0.11,1) Pavel Rychlý • Dasher - Character LM • March 12, 2024 10/33 Encoding of message "WIKI" by arithmetic coding continued 1. 2. W 3. I 4 K 5. I 11—L ,11— ,11— ,11 K 1 —'— i—I— 1- 1 6. [.0010101, .0010111) - .001011 ■ encodes "W" [0, 0.1) first ■ followed by "I" [0.001, 0.0011) ■ then the "K" is [0.00101, 0.0011) ■ and finally "I" [0.0010101, 0.0010111) ■ the result is a number from the final interval Pavel Rychlý • Dasher - Character LM • March 12, 2024 11/33 PPM (Prediction by Partial Match) ■ The language model used in Dasher is not limited to the concept of words ■ combines information about n-grams with probabilities of occurrence of each symbol from the dictionary ■ context 4-5 symbols Pavel Rychlý • Dasher - Character LM • March 12, 2024 12/33 PPM - 3 modes ■ Standard letter-based PPM (calculates probability by partial matching) ■ Word-based model (word dictionary with frequencies) ■ Mixture model (PPM/dictionary) Pavel Rychlý • Dasher - Character LM • March 12, 2024 13/33 Language Model (3) ■ language model learns over time (learns new user's expressions or phrases) ■ everything we write is automatically saved to a file as additional training data Pavel Rychlý • Dasher - Character LM • March 12, 2024 14/33 Other features ■ import of training data simply by loading the file ■ data source for Czech: Institute of the Czech National Corpus, Faculty of Arts, Charles University ■ any alphabet: e.g. also LaTeX, C, IPA ■ other software - 2 modes: normal typing and word completion (user has to switch between them) ■ Dasher has one mode which combines both Pavel Rychlý • Dasher - Character LM • March 12, 2024 15/33 Input method types ■ computer mouse ■ touchpad ■ touchscreen ■ eyetracker ■ headmouse ■ trackball ■ trackpad ■ breath ■ buttons ■ tilt sensors Pavel Rychlý • Dasher - Character LM • March 12, 2024 16/33 Mouse, touchpad, touchscreen File Edit Options Help Dasher is great writing speed using mouse: after 10 minutes of training 5-15 words/min., after an hour: 15-25 words/min., experienced users: 40 words per minute (as fast as typing by hand using a keyboard) sample of Dasher video: ipaq 2.1 Pavel Rychlý • Dasher - Character LM • March 12, 2024 17/33 Eyetracker ■ camera + sensors that detect where the user is looking on the screen ■ initial price: 2000 - 4000 USD Pavel Rychlý • Dasher - Character LM • March 12, 2024 18/33 Eyetracker Tobii Eye Tracker 5 price: 229 EUR Engineered for gaming also built in (gaming) laptops Pavel Rychlý • Dasher - Character LM • March 12, 2024 19/33 Eye Dasher input speed: after ten minutes of training 7 words/minute, after an hour: 20 words/minute, experienced users: 30 words/minute eyetracking without Dasher, only with virtual (on-screen) keyboard: 15 words/min., error-rate 5x higher Pavel Rychlý • Dasher - Character LM • March 12, 2024 20/33 Eye Dasher - User friendliness ■ input using the virtual (on-screen) keyboard is discrete (waiting for the timer to expire, or blinking) ■ Dasher provides continuous input ■ video: eye_dasher Pavel Rychlý • Dasher - Character LM • March 12, 2024 21 /33 Headmouse ■ IR camera ■ reflexive points ■ price: 500-1500 USD Pavel Rychlý • Dasher - Character LM • March 12, 2024 22/33 Breath Dasher ■ direct relation between lung volume and ypsilon coordinate value ■ one-dimensional (cannto go back) ■ therefore: Control mode ■ Control area (Stop, Pause, Move, Delete) ■ video: breath_dasher Pavel Rychlý • Dasher - Character LM • March 12, 2024 23/33 Button Dasher 3 directions ■ forward up ■ forward down ■ back Pavel Rychlý • Dasher - Character LM • March 12, 2024 24/33 Dasher vs. speech recognition ■ inapplicability of automatic speech recognition systems in noisy environments ■ even with the best recognizers about 5 % error rate (difficult editing of errors) Pavel Rychlý • Dasher - Character LM • March 12, 2024 25/33 Speech Dasher: Efficient speech recognition correction ■ Step 1: text input using a combination of speech and navigation via pointing device (mouse) ■ Step 2: Speech recognizer makes an initial estimate of the text, the user edits or confirms the output ■ initial error rate of 22 %, users usually fix everything ■ faster than repair using separate speech recognition (special commands) ■ faster than a standalone Dasher ■ video: speech_dasher Pavel Rychlý • Dasher - Character LM • March 12, 2024 26/33 Other options - Swype ■ virtual keyboard for smartphones and tablets ■ developed by Nuance Communications ■ typing by continuous stroke on QWERTY/QWERTZ/AZERTY/National keys ■ word guessing using predictive dictionary (we can add our own words) ■ more accuracy for longer words (short ones usually have more possibilities to interpret the stroke on the screen) ■ typing without diacritics, offered variants with diacritics Pavel Rychlý • Dasher - Character LM • March 12, 2024 27/33 Swype (2) ■ typing speed up to over 50 words/min. ■ handles simple punctuation (even smileys) ■ app is able to learn from Facebook, Gmail, Twitter... ■ also available in Czech ■ possibility to dictate in different languages using Dragon Dictation module (also in Czech) ■ videohttp : //www. youtube . com/watch?v=S J-RAef CG_c Pavel Rychlý • Dasher - Character LM • March 12, 2024 28/33 Other options -SwiftKey ■ free for Android, iOS, iPhone ■ learns using previous text communication (SMS, Gmail, texts in RSS, it also adapts to letters that you repeatedly press slightly off) ■ multiple languages (up to 5 simultaneous) ■ typo correction ■ next word prediction (offers the most likely variants of the following words) ■ 800 emoji ■ Emoji Prediction feature - learns to predict relevant emoji Pavel Rychlý • Dasher - Character LM • March 12, 2024 29/33 SwiftKey (2) ■ quality dictionaries (correspond to trends in communication) ■ can be typed in Swype style ■ English dictation functions can be turned on ■ June 2012 release of SwiftKey Healthcare; prediction based on real clinical data ■ April 2016 release of ShakeSpeak; emulating W. Shakespeare's speech to celebrate the 400th anniversary of his death ■ year 2016 Microsoft buyout of SwiftKey ■ videOihttp : / /www. youtube . com/watch?v=kA5Horw_SOE Pavel Rychlý • Dasher - Character LM • March 12, 2024 30/33 Other options - SlidelT ■ similar to Swype keyboard - typing by dragging between characters ■ lower requirements for typing accuracy ■ quality dictionaries (possibility to install more, including Czech) ■ more than 70 language sets ■ keyboard customisation option ■ calculates the variants of the words the user wanted to type ■ autocompletion of spaces and capital letters ■ videOihttp : / /www. youtube . com/watch?v=Tp_7bWuvQwQ Pavel Rychlý • Dasher - Character LM • March 12, 2024 31 /33 Other options - GO Keyboard ■ prediction in many languages ■ possibility to change skins and backgrounds ■ ability to import names and SMS into the dictionary ■ support for Swype style text input ■ detected a security issue in 2017; the app was sending user information back to China (language, location, network type, ...), more than 200 million users affected ■ videOihttp : / /www. youtube . com/watch?v=XQRRvSwpmWc Pavel Rychlý • Dasher - Character LM • March 12, 2024 32/33 Other options ■ Perfect keyboard ■ Touch Pal keyboard ■ Google keyboard ■ Siine Shortcut keyboard ■ Adaptxt keyboard ■ ShapeWriter keyboard Pavel Rychlý • Dasher - Character LM • March 12, 2024 33/33