A Roadmap for Universal Syllabic Segmentation
SOJKA, Ondřej, Petr SOJKA and Jakub MÁCA. A Roadmap for Universal Syllabic Segmentation. Zpravodaj CSTUG. Brno: CSTUG, 2023, vol. 33, 3-4, p. 125-138. ISSN 1211-6661. Available from: https://dx.doi.org/10.5300/2023-3-4/125.
Basic information | |
---|---|
Original name | A Roadmap for Universal Syllabic Segmentation |
Name (in English) | A Roadmap for Universal Syllabic Segmentation |
Authors | SOJKA, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Petr SOJKA (203 Czech Republic, belonging to the institution) and Jakub MÁCA (203 Czech Republic, belonging to the institution). |
Edition | Zpravodaj CSTUG, Brno, CSTUG, 2023, 1211-6661. |
Other information | |
---|---|
Original language | Czech |
Type of outcome | Article in a journal |
Field of Study | 10201 Computer sciences, information science, bioinformatics |
Country of publisher | Czech Republic |
Confidentiality degree | is not subject to a state or trade secret |
WWW | DOI |
RIV identification code | RIV/00216224:14330/23:00132504 |
Organization unit | Faculty of Informatics |
Doi | http://dx.doi.org/10.5300/2023-3-4/125 |
Keywords (in Czech) | slabičnost (syllabicity); slabika (syllable); dělení slov (word hyphenation); příprava univerzálních vzorů (preparation of universal patterns) |
Keywords in English | syllabification; hyphenation; universal syllabic patterns preparation |
Tags | Reviewed |
Changed by | doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 12/12/2023 17:56. |
Abstract |
---|
Space- and time-effective segmentation (word hyphenation) of natural languages remains at the core of every document rendering system, be it TeX, a web browser, or a mobile operating system. In most languages, segmentation that mimics syllabic pronunciation is the pragmatic preference today. As language switching is often not marked in rendered texts, the typesetting engine needs universal syllabic segmentation. In this article, we show the feasibility of this idea by offering a prototype solution to two main problems: For A), we have applied it to generating universal syllabic patterns from wordlists of nine syllabic, as opposed to etymology-based, languages (namely, Czech, Slovak, Georgian, Greek, Polish, Russian, Turkish, Turkmen, and Ukrainian). With the data from these nine languages, we show that: |
Links | |
---|---|
MUNI/A/1339/2022, internal MU code | Name: Development of data processing techniques to support search, analysis and visualization of large datasets using artificial intelligence (Rozvoj technik pro zpracování dat pro podporu vyhledávání, analýz a vizualizací rozsáhlých datových souborů s využitím umělé inteligence) |
Investor: Masaryk University