SOJKA, Petr and Ondřej SOJKA. New Czechoslovak Hyphenation Patterns, Word Lists, and Workflow. TUGboat: The Communications of the TeX Users Group. San Francisco, USA: TUG, 2021, vol. 42, No 2, p. 152-158. ISSN 0896-3207. Available from: https://dx.doi.org/10.47397/tb/42-2/tb131sojka-czech.
Basic information
Original name New Czechoslovak Hyphenation Patterns, Word Lists, and Workflow
Authors SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution) and Ondřej SOJKA (203 Czech Republic, belonging to the institution).
Edition TUGboat: The Communications of the TeX Users Group, San Francisco, USA, TUG, 2021, 0896-3207.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 20206 Computer hardware and architecture
Country of publisher United States of America
Confidentiality degree is not subject to a state or trade secret
WWW preprint, DOI, conference program, GitHub repository, presentation slides
RIV identification code RIV/00216224:14330/21:00122189
Organization unit Faculty of Informatics
Doi http://dx.doi.org/10.47397/tb/42-2/tb131sojka-czech
Keywords (in Czech) dělení slov; generování vzorů; databáze slov; vícejazyčná sazba; slabičné algoritmy; patgen; soutěživé vzory
Keywords in English hyphenation; pattern generation; word list database; multilingual typesetting; syllabification algorithms; patgen; competing patterns
Tags International impact, Reviewed
Changed by doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 5/9/2023 11:40.
Abstract
Space- and time-effective segmentation and hyphenation of natural languages lie at the core of every document preparation system, web browser, and mobile rendering system. We build on the unreasonable effectiveness of pattern generation with patgen. Hyphenation patterns can solve the dictionary problem for closely related languages as well, without compromise. In this article, we show how we applied patgen to generate the new Czechoslovak hyphenation patterns, which cover both the Czech and Slovak languages. We show that it is feasible to develop universal, up-to-date hyphenation patterns with high coverage and good generalization from word lists prepared semi-automatically from actual language usage. We evaluate the new approach and argue that the new Czechoslovak hyphenation patterns bring significant improvements in coverage and generalization, as well as space savings. We share all the data, word lists, and workflow for reproducibility and reuse.
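To make concrete what "hyphenation patterns" means here, the following is a minimal sketch of how Liang-style patterns (the kind patgen generates) are applied to a word. It is not the authors' code. The pattern list is a small English toy set chosen for illustration, not the Czechoslovak patterns from the article, and the names parse_pattern, hyphenate, left_min, and right_min are hypothetical.

```python
import re

# Toy English patterns for illustration only (an assumption of this sketch).
# A digit between letters is a level: odd permits a break, even forbids it.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter levels."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)  # level sits before the letter at index pos
        else:
            pos += 1
    return letters, levels


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the pieces of word, split where the winning level is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the level before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels is not None:
                # Competing patterns: the highest level at each position wins.
                for offset, level in enumerate(levels):
                    scores[start + offset] = max(scores[start + offset], level)
    pieces, last = [], 0
    # left_min and right_min are illustrative minimum fragment lengths.
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:     # a break before word[k] is allowed
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", TOY_PATTERNS)))  # prints hy-phen-ation
```

The whole "dictionary" is the pattern table. A single set of patterns can therefore serve two closely related languages, as long as no word requires contradictory break points in the two languages.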
Links
MUNI/A/1573/2020, internal MU code. Name: Applied research: search, analysis and visualization of large-scale data, natural language processing, artificial intelligence for biomedical image analysis.
Investor: Masaryk University
Type Name Uploaded/Created by Uploaded/Created Rights
New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.pdf Licence Creative Commons  File version Sojka, P. 30/8/2021

Properties

Address within IS
https://is.muni.cz/auth/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.pdf
Address for the users outside IS
https://is.muni.cz/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.pdf
Address within Manager
https://is.muni.cz/auth/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.pdf?info
Address within Manager for the users outside IS
https://is.muni.cz/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.pdf?info
Uploaded/Created
Mon 30/8/2021 11:54, doc. RNDr. Petr Sojka, Ph.D.

Rights

Right to read
  • anyone on the Internet
  • a concrete person doc. RNDr. Petr Sojka, Ph.D., učo 2378
  • a concrete person Ondřej Sojka, učo 454904
Right to upload
 
Right to administer:
  • a concrete person doc. RNDr. Petr Sojka, Ph.D., učo 2378
  • a concrete person Ondřej Sojka, učo 454904
Attributes
 

New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.pdf

File type
PDF (application/pdf)
Size
576.1 KB
Hash md5
2dbe453cc4d11fbe1fd6bbaa9c0de06e
Uploaded/Created
Mon 30/8/2021 11:54

New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1__Archive.pdf

Address within IS
https://is.muni.cz/auth/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1__Archive.pdf
Address for the users outside IS
https://is.muni.cz/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1__Archive.pdf
File type
PDF/A (application/x-pdf)
Size
3.2 MB
Hash md5
2ba5b290d7735689c89020be445c9cbd
Uploaded/Created
Mon 30/8/2021 12:11

New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.txt

Address within IS
https://is.muni.cz/auth/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.txt
Address for the users outside IS
https://is.muni.cz/publication/1788557/New_Czechoslovak_Hyphenation_Patterns__Word_Lists__and_Workflow__TUG_2021__1_.txt
File type
plain text (text/plain)
Size
26.5 KB
Hash md5
411b4e355bc684c10f752557a9799757
Uploaded/Created
Mon 30/8/2021 12:14