D 2023

Towards Perfection of Machine Learning of Competing Patterns: The Use Case of Czechoslovak Patterns Development

SOJKA, Ondřej and Petr SOJKA

Basic information

Original name

Towards Perfection of Machine Learning of Competing Patterns: The Use Case of Czechoslovak Patterns Development

Authors

SOJKA, Ondřej (203 Czech Republic, guarantor, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution)

Edition

Recent Advances in Slavonic. Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2023), p. 113-120, 8 pp. 2023

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14330/23:00132397

Organization unit

Faculty of Informatics

ISBN

978-80-263-1793-7

ISSN

Keywords in English

dictionary problem; effectiveness; hyphenation patterns; patgen; syllabification; Czech; Slovak; Czechoslovak patterns; machine learning

Tags

International impact
Changed: 7/4/2024 23:37, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Finding a space- and time-effective, or even perfect, solution to the dictionary problem is an important practical and research problem, and solving it may lead to a breakthrough in computation. The competing pattern technology from TeX is a special case: for a given dictionary, the word segmentation is stored in competing patterns that also generalize very well. Recently, the unreasonable effectiveness of pattern generation has been shown: hyphenation patterns can solve the dictionary problem jointly, even for several languages, without compromise.

In this article, we study the effectiveness of patgen for supervised machine learning of Czechoslovak hyphenation patterns. We show machine learning techniques for developing competing patterns that are close to perfect. We evaluate the new approach by the improvements and space savings gained during the development and fine-tuning of the Czechoslovak hyphenation patterns.


Links

LM2023062, research and development project
Name: Digital Research Infrastructure for Language Technologies, Arts and Humanities (Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy)
Investor: Ministry of Education, Youth and Sports of the CR