D 2019

Automating dictionary production: a Tagalog-English-Korean dictionary from scratch

BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK et al.

Basic information

Original name

Automating dictionary production: a Tagalog-English-Korean dictionary from scratch

Authors

BAISA, Vít (203 Czech Republic, belonging to the institution), Marek BLAHUŠ (203 Czech Republic), Michal CUKR (203 Czech Republic), Ondřej HERMAN (203 Czech Republic, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic, belonging to the institution), Marek MEDVEĎ (703 Slovakia, belonging to the institution), Michal MĚCHURA (203 Czech Republic, belonging to the institution), Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Vít SUCHOMEL (203 Czech Republic, belonging to the institution)

Edition

Brno, Czech Republic, Proceedings of the 6th Biennial Conference on Electronic Lexicography, p. 805-818, 14 pp. 2019

Publisher

Lexical Computing CZ s.r.o.

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

RIV identification code

RIV/00216224:14330/19:00107599

Organization unit

Faculty of Informatics

ISSN

Keywords in English

Sketch Engine; Lexonomy; post-editing lexicography; dictionary; corpus; Tagalog; Filipino; English; Korean

Tags

International impact, Reviewed
Changed: 22/10/2023 01:49, RNDr. Miloš Jakubíček, Ph.D.

Abstract

In the original

In this paper we present lexicographic work on a Tagalog-English-Korean dictionary. The dictionary is created entirely from scratch: all of its content (apart from audio pronunciation) is first generated fully automatically from a large web corpus that we built for this purpose, and then post-edited by human editors. The full dictionary comprises 45,000 entries, of which the 15,000 most frequent are manually post-edited, while the remaining 30,000 are left as automatically generated. The project is currently ongoing and will be finished in December 2019. The dictionary will be part of the online platform run by the Naver Corporation and will be freely available.

Links

GA18-23891S, research and development project
Name: Hyperintensional Reasoning over Natural Language Texts (Hyperintensionální usuzování nad texty přirozeného jazyka)
Investor: Czech Science Foundation
LM2015071, research and development project
Name: Language Research Infrastructure in the Czech Republic (Jazyková výzkumná infrastruktura v České republice; Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR