SOJKA, Petr and Ondřej SOJKA. Towards Universal Hyphenation Patterns. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019. Brno: Tribun EU, 2019, p. 63-68. ISBN 978-80-263-1517-9.
Basic information
Original name Towards Universal Hyphenation Patterns
Authors SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution) and Ondřej SOJKA (203 Czech Republic, belonging to the institution).
Edition Brno, Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 63-68, 6 pp. 2019.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/19:00111503
Organization unit Faculty of Informatics
ISBN 978-80-263-1517-9
ISSN 2336-4289
UT WoS 000604899800008
Keywords (in Czech) dělení slov; vzory dělení; patgen; dělení na slabiky; Unicode; TeX; slabičné dělení; čeština; slovenština
Keywords in English hyphenation; hyphenation patterns; patgen; syllabification; Unicode; TeX; syllabic hyphenation; Czech; Slovak
Tags International impact, Reviewed
Changed by: RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15/5/2024 01:38.
Abstract
Hyphenation is at the core of every document preparation system, be it a typesetting system such as TeX or a modern web browser. Every language needs algorithms, rules, or patterns that hyphenate according to its conventions. We propose developing generic hyphenation patterns for a set of languages that share the same principles, e.g., for all syllable-based languages. We have tested this idea by developing Czechoslovak hyphenation patterns. At the cost of only a tiny increase in the size of the hyphenation patterns, we have shown that further development of universal syllabic hyphenation patterns is feasible.
Links
MUNI/A/1145/2018, internal MU code
Name: Applied research at FI: software architectures of critical infrastructures, computer systems security, techniques for processing and visualizing big data, and augmented reality.
Investor: Masaryk University, Critical Infrastructure Software Architectures, Computer Systems Security, Data Processing and Visualization Techniques, and Augmented Reality, Category A