BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Marek MEDVEĎ, Michal MĚCHURA, Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, p. 805-818. ISSN 2533-5626.
Basic information
Original name Automating dictionary production: a Tagalog-English-Korean dictionary from scratch
Authors BAISA, Vít (203 Czech Republic, belonging to the institution), Marek BLAHUŠ (203 Czech Republic), Michal CUKR (203 Czech Republic), Ondřej HERMAN (203 Czech Republic, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic, belonging to the institution), Marek MEDVEĎ (703 Slovakia, belonging to the institution), Michal MĚCHURA (203 Czech Republic, belonging to the institution), Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Vít SUCHOMEL (203 Czech Republic, belonging to the institution).
Edition Brno, Czech Republic, Proceedings of the 6th Biennial Conference on Electronic Lexicography, p. 805-818, 14 pp. 2019.
Publisher Lexical Computing CZ s.r.o.
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
WWW Conference proceedings
RIV identification code RIV/00216224:14330/19:00107599
Organization unit Faculty of Informatics
ISSN 2533-5626
Keywords in English Sketch Engine; Lexonomy; post-editing lexicography; dictionary; corpus; Tagalog; Filipino; English; Korean
Tags International impact, Reviewed
Changed by: RNDr. Miloš Jakubíček, Ph.D., učo 172962. Changed: 22/10/2023 01:49.
Abstract
In this paper we present lexicographic work on a Tagalog-English-Korean dictionary. The dictionary is created entirely from scratch: all of its content (besides audio pronunciation) is first generated fully automatically from a large web corpus that we built for this purpose, and then post-edited by human editors. The full dictionary contains 45,000 entries, of which the 15,000 most frequent are manually post-edited, while the remaining 30,000 are left as automatically generated. The project is currently ongoing and will be finished in December 2019. The dictionary will be part of the online platform run by the Naver Corporation and freely available.
Links
GA18-23891S, research and development project
Name: Hyperintensional reasoning over natural language texts
Investor: Czech Science Foundation
LM2015071, research and development project
Name: Language research infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR