A System for Predictive Writing

D 2014

NEVĚŘILOVÁ, Zuzana and Barbora ULIPOVÁ

Basic information

Original title

A System for Predictive Writing

Authors

NEVĚŘILOVÁ, Zuzana (203 Czech Republic, guarantor, belonging to the institution) and Barbora ULIPOVÁ (203 Czech Republic, belonging to the institution)

Edition

Brno, Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 11-18, 8 pp. 2014

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

60200 6.2 Languages and Literature

Country of publisher

Czech Republic

Confidentiality

not subject to a state or trade secret

Publication form

printed version "print"

Links

URL

RIV code

RIV/00216224:14330/14:00077507

Organization unit

Faculty of Informatics

ISSN

UT WoS

000374560500002

Keywords in English

predictive writing; n-gram language model; corpus; KSPC

Tags

International impact, Reviewed

Changed: 27 May 2021 09:09, RNDr. Zuzana Nevěřilová, Ph.D.

Abstract

In the original language

Most predictive writing systems are based on n-gram models of various sizes. Systems designed for English are simpler than those for inflected languages, since even smaller models provide reasonable coverage. A corpus of the same size, however, is far from sufficient for languages with many word forms. The paper presents a new predictive writing system based on n-grams calculated from a large corpus. We designed a high-performance server-side script that returns either the most probable endings of a word or the most probable following words. We also designed a client-side script suitable for desktop computers without touchscreens. We calculated the 150 million most frequent n-grams for n = 1, ..., 12 from a Czech corpus and evaluated the writing system on Czech texts. The system was then extended with a custom-built model that can consist of domain-specific or user-specific n-grams. We measured the keystrokes per character (KSPC) rate in two modes: letter KSPC excludes control keys, since they are specific to the input method; real KSPC includes all keystrokes. We have shown that the system performs well in general (average letter KSPC 0.64, average real KSPC 0.77) and performs even better on specific domains with an appropriate custom-built model (average letter KSPC 0.63 and average real KSPC 0.73). The system was tested on Czech, but it can easily be adapted to an arbitrary language. Due to its performance, the system is suitable for highly inflected languages.
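
The abstract names two operations (completing a word and proposing the following word from n-gram frequencies) and two KSPC variants. The following is a minimal illustrative sketch of those ideas, not the authors' implementation: the function names, the toy token list, and the linear scan over the n-gram table are all assumptions made for readability, whereas a high-performance server would presumably use an indexed structure. KSPC is taken here in its usual sense, keystrokes divided by the number of characters in the produced text.

```python
from collections import Counter


def build_ngram_counts(tokens, max_n=3):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def complete_word(counts, context, prefix, k=3):
    """Return up to k words starting with `prefix` that most often follow `context`.

    `context` is a tuple of preceding words (possibly empty).
    Hypothetical helper: scans all n-grams, which is simple but slow.
    """
    context = tuple(context)
    candidates = Counter()
    for ngram, count in counts.items():
        if (len(ngram) == len(context) + 1
                and ngram[:-1] == context
                and ngram[-1].startswith(prefix)):
            candidates[ngram[-1]] += count
    return [word for word, _ in candidates.most_common(k)]


def next_words(counts, context, k=3):
    """Return up to k words that most often follow `context`."""
    return complete_word(counts, context, "", k)


def kspc(letter_keys, control_keys, text):
    """Return (letter KSPC, real KSPC) for the keystrokes used to produce `text`.

    Letter KSPC ignores control keys (e.g. selecting a suggestion);
    real KSPC counts every keystroke.
    """
    if not text:
        raise ValueError("text must not be empty")
    n_chars = len(text)
    return letter_keys / n_chars, (letter_keys + control_keys) / n_chars


# Toy data (assumed for illustration only)
tokens = "dobrý den dobrý den dobrý večer dobrá zpráva".split()
counts = build_ngram_counts(tokens, max_n=2)

suggestions_after_dobry = next_words(counts, ("dobrý",))
completions_for_v = complete_word(counts, ("dobrý",), "v")
letter_rate, real_rate = kspc(letter_keys=6, control_keys=2, text="dobrý den")
```

With this toy data, `suggestions_after_dobry` ranks "den" ahead of "večer" because the bigram "dobrý den" occurs twice. A KSPC below 1 means that fewer keys were pressed than characters produced, which is the saving the reported averages describe.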

Related projects

MUNI/A/0792/2013, internal MU code

Name: Čeština v jednotě synchronie a diachronie - 2014

Investor: Masaryk University, Čeština v jednotě synchronie a diachronie - 2014, until 2020_Category A - Specific research - Student research projects

Cite

NEVĚŘILOVÁ, Zuzana and Barbora ULIPOVÁ. A System for Predictive Writing. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2014, pp. 11-18. ISSN 2336-4289.

@inproceedings{1210680,
   author = {Nevěřilová, Zuzana and Ulipová, Barbora},
   address = {Brno},
   booktitle = {Eighth Workshop on Recent Advances in Slavonic Natural Language Processing},
   keywords = {predictive writing; n-gram language model; corpus; KSPC},
   howpublished = {tištěná verze "print"},
   language = {eng},
   location = {Brno},
   pages = {11-18},
   publisher = {Tribun EU},
   title = {A System for Predictive Writing},
   url = {https://nlp.fi.muni.cz/raslan/2014/9.pdf},
   year = {2014}
}

TY  - JOUR
ID  - 1210680
AU  - Nevěřilová, Zuzana - Ulipová, Barbora
PY  - 2014
TI  - A System for Predictive Writing
PB  - Tribun EU
CY  - Brno
KW  - predictive writing
KW  - n-gram language model
KW  - corpus
KW  - KSPC
UR  - https://nlp.fi.muni.cz/raslan/2014/9.pdf
N2  - Most predictive writing systems are based on n-gram models of various sizes. Systems designed for English are simpler than those for inflected languages, since even smaller models provide reasonable coverage. A corpus of the same size, however, is far from sufficient for languages with many word forms. The paper presents a new predictive writing system based on n-grams calculated from a large corpus. We designed a high-performance server-side script that returns either the most probable endings of a word or the most probable following words. We also designed a client-side script suitable for desktop computers without touchscreens. We calculated the 150 million most frequent n-grams for n = 1, ..., 12 from a Czech corpus and evaluated the writing system on Czech texts. The system was then extended with a custom-built model that can consist of domain-specific or user-specific n-grams. We measured the keystrokes per character (KSPC) rate in two modes: letter KSPC excludes control keys, since they are specific to the input method; real KSPC includes all keystrokes. We have shown that the system performs well in general (average letter KSPC 0.64, average real KSPC 0.77) and performs even better on specific domains with an appropriate custom-built model (average letter KSPC 0.63 and average real KSPC 0.73). The system was tested on Czech, but it can easily be adapted to an arbitrary language. Due to its performance, the system is suitable for highly inflected languages.
ER  -

NEVĚŘILOVÁ, Zuzana and Barbora ULIPOVÁ. A System for Predictive Writing. In \textit{Eighth Workshop on Recent Advances in Slavonic Natural Language Processing}. Brno: Tribun EU, 2014, pp.~11-18. ISSN~2336-4289.
