D 2019

Towards Universal Hyphenation Patterns

SOJKA, Petr and Ondřej SOJKA

Basic information

Original name

Towards Universal Hyphenation Patterns

Authors

SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution) and Ondřej SOJKA (203 Czech Republic, belonging to the institution)

Edition

Brno, Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 63-68, 6 pp. 2019

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/19:00111503

Organization unit

Faculty of Informatics

ISBN

978-80-263-1517-9

ISSN

UT WoS

000604899800008

Keywords (in Czech)

dělení slov; vzory dělení; patgen; dělení na slabiky; Unicode; TeX; slabičné dělení; čeština; slovenština

Keywords in English

hyphenation; hyphenation patterns; patgen; syllabification; Unicode; TeX; syllabic hyphenation; Czech; Slovak

Tags

International impact, Reviewed
Changed: 15/5/2024 01:38, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Hyphenation is at the core of every document preparation system, whether a typesetting system such as TeX or a modern web browser. Every language needs algorithms, rules, or patterns that hyphenate text according to its conventions. We propose developing generic hyphenation patterns for a set of languages that share the same principles, e.g., all syllable-based languages. We tested this idea by developing Czechoslovak hyphenation patterns. At the minimal cost of a slight increase in the size of the hyphenation patterns, we show that further development of universal syllabic hyphenation patterns is feasible.

Links

MUNI/A/1145/2018, internal MU code
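Name: Aplikovaný výzkum na FI: softwarové architektury kritických infrastruktur, bezpečnost počítačových systémů, techniky pro zpracování a vizualizaci velkých dat a rozšířená realita (Applied research at FI: software architectures of critical infrastructures, computer systems security, techniques for processing and visualizing big data, and augmented reality).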
Investor: Masaryk University, Critical Infrastructure Software Architectures, Computer Systems Security, Data Processing and Visualization Techniques, and Augmented Reality, Category A