A System for Predictive Writing

NEVĚŘILOVÁ, Zuzana and Barbora ULIPOVÁ. A System for Predictive Writing. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2014, p. 11-18. ISSN 2336-4289.


Basic information
Original name	A System for Predictive Writing
Authors	NEVĚŘILOVÁ, Zuzana (203 Czech Republic, guarantor, belonging to the institution) and Barbora ULIPOVÁ (203 Czech Republic, belonging to the institution).
Edition	Brno, Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, p. 11-18, 8 pp. 2014.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	60200 6.2 Languages and Literature
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
RIV identification code	RIV/00216224:14330/14:00077507
Organization unit	Faculty of Informatics
ISSN	2336-4289
UT WoS	000374560500002
Keywords in English	predictive writing; n-gram language model; corpus; KSPC
Tags	International impact, Reviewed
Changed by	RNDr. Zuzana Nevěřilová, Ph.D., učo 3839. Changed: 27/5/2021 09:09.

Abstract

Most predictive writing systems are based on n-gram models of various sizes. Systems designed for English are simpler than those for inflected languages, since even smaller models provide reasonable coverage. For languages with many word forms, however, a corpus of the same size is far from sufficient. The paper presents a new predictive writing system based on n-grams calculated from a large corpus. We designed a high-performance server-side script that returns either the most probable endings of a word or the most probable following words. We also designed a client-side script suitable for desktop computers without touchscreens. We calculated the 150 million most frequent n-grams for n = 1, ..., 12 from a Czech corpus and evaluated the writing system on Czech texts. The system was then extended with a custom-built model that can consist of domain-specific or user-specific n-grams. We measured the keystrokes per character (KSPC) rate in two modes: letter KSPC excludes the control keys, since they are specific to the input method, while real KSPC includes all keystrokes. We show that the system performs well in general (on average, letter KSPC was 0.64 and real KSPC was 0.77) and performs even better on specific domains with an appropriate custom-built model (letter KSPC and real KSPC averaged 0.63 and 0.73, respectively). The system was tested on Czech, but it can easily be adapted to an arbitrary language. Due to its performance, the system is suitable for highly inflected languages.
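
The two kinds of server response described in the abstract (word endings and following words) and the two KSPC variants can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `NgramPredictor`, its methods, the function `kspc`, and the sample sentences are hypothetical, and a toy bigram model stands in for the paper's n-grams up to n = 12. The sketch also assumes that KSPC is the number of keystrokes divided by the number of characters in the resulting text; how the paper counts characters is not stated in the abstract.

```python
from collections import Counter, defaultdict


def _top(counter, k):
    """Return the k most frequent keys; ties are broken alphabetically."""
    ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


class NgramPredictor:
    """Toy unigram/bigram predictor (hypothetical, for illustration only)."""

    def __init__(self, sentences):
        self.unigrams = Counter()              # word -> frequency
        self.following = defaultdict(Counter)  # word -> Counter of next words
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.unigrams.update(tokens)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.following[prev][nxt] += 1

    def complete_word(self, prefix, k=3):
        """Most probable completions of a partially typed word."""
        matches = Counter(
            {w: c for w, c in self.unigrams.items() if w.startswith(prefix)}
        )
        return _top(matches, k)

    def next_words(self, previous, k=3):
        """Most probable words following an already typed word."""
        return _top(self.following.get(previous, Counter()), k)


def kspc(letter_keys, control_keys, text_length):
    """Return (letter KSPC, real KSPC) for one typed text.

    Assumption: both rates divide by the length of the final text;
    letter KSPC ignores control keys, real KSPC counts every keystroke.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    letter = letter_keys / text_length
    real = (letter_keys + control_keys) / text_length
    return letter, real
```

For example, a predictor built from the sentences "dobrý den", "dobrý večer", "dobrý den všem", and "dobrou noc" ranks "dobrý" ahead of "dobrou" for the prefix "dob", and "den" ahead of "večer" after "dobrý". A 50-character text typed with 32 letter keys and 6 control keys gives a letter KSPC of 0.64 and a real KSPC of 0.76; these figures are made up for the example and are not results from the paper.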

Links
MUNI/A/0792/2013, MU internal code	Name: Čeština v jednotě synchronie a diachronie - 2014
MUNI/A/0792/2013, MU internal code	Investor: Masaryk University, Category A
