Automating dictionary production: a Tagalog-English-Korean dictionary from scratch

D 2019

BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK et al.

Basic information

Original name

Automating dictionary production: a Tagalog-English-Korean dictionary from scratch

Authors

BAISA, Vít (203 Czech Republic, belonging to the institution), Marek BLAHUŠ (203 Czech Republic), Michal CUKR (203 Czech Republic), Ondřej HERMAN (203 Czech Republic, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic, belonging to the institution), Marek MEDVEĎ (703 Slovakia, belonging to the institution), Michal MĚCHURA (203 Czech Republic, belonging to the institution), Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Vít SUCHOMEL (203 Czech Republic, belonging to the institution)

Edition

Brno, Czech Republic, Proceedings of the 6th Biennial Conference on Electronic Lexicography, p. 805-818, 14 pp., 2019

Publisher

Lexical Computing CZ s.r.o.

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

References:

Conference proceedings

RIV identification code

RIV/00216224:14330/19:00107599

Organization unit

Faculty of Informatics

ISSN

2533-5626

Keywords in English

Sketch Engine; Lexonomy; post-editing lexicography; dictionary; corpus; Tagalog; Filipino; English; Korean

Abstract

In the original

In this paper we present lexicographic work on a Tagalog-English-Korean dictionary. The dictionary is created entirely from scratch and all of its content (besides audio pronunciation) is initially generated fully automatically from a large web corpus that we built for these purposes, and then post-edited by human editors. The full size of the dictionary is 45,000 entries, out of which 15,000 most frequent entries are manually post-edited, while the remaining 30,000 entries are left only as automated. The project is currently ongoing and will be finished in December 2019. The dictionary will be part of the online platform run by the Naver Corporation and freely available.

Links

GA18-23891S, research and development project

Name: Hyperintensional reasoning over natural language texts

Investor: Czech Science Foundation

LM2015071, research and development project

Name: Language Research Infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)

Investor: Ministry of Education, Youth and Sports of the CR

Cite

BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Marek MEDVEĎ, Michal MĚCHURA, Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, p. 805-818. ISSN 2533-5626.

@inproceedings{1550657,
   author = {Baisa, Vít and Blahuš, Marek and Cukr, Michal and Herman, Ondřej and Jakubíček, Miloš and Kovář, Vojtěch and Medveď, Marek and Měchura, Michal and Rychlý, Pavel and Suchomel, Vít},
   address = {Brno, Czech Republic},
   booktitle = {Proceedings of the 6th Biennial Conference on Electronic Lexicography},
   keywords = {Sketch Engine; Lexonomy; post-editing lexicography; dictionary; corpus; Tagalog; Filipino; English; Korean},
   howpublished = {electronic version available online},
   language = {eng},
   location = {Brno, Czech Republic},
   pages = {805-818},
   publisher = {Lexical Computing CZ s.r.o.},
   title = {Automating dictionary production: a Tagalog-English-Korean dictionary from scratch},
   url = {https://elex.link/elex2019/wp-content/uploads/2019/10/eLex-2019_Proceedings.pdf},
   year = {2019}
}

TY  - CONF
ID  - 1550657
AU  - Baisa, Vít
AU  - Blahuš, Marek
AU  - Cukr, Michal
AU  - Herman, Ondřej
AU  - Jakubíček, Miloš
AU  - Kovář, Vojtěch
AU  - Medveď, Marek
AU  - Měchura, Michal
AU  - Rychlý, Pavel
AU  - Suchomel, Vít
PY  - 2019
TI  - Automating dictionary production: a Tagalog-English-Korean dictionary from scratch
PB  - Lexical Computing CZ s.r.o.
CY  - Brno, Czech Republic
KW  - Sketch Engine
KW  - Lexonomy
KW  - post-editing lexicography
KW  - dictionary
KW  - corpus
KW  - Tagalog
KW  - Filipino
KW  - English
KW  - Korean
UR  - https://elex.link/elex2019/wp-content/uploads/2019/10/eLex-2019_Proceedings.pdf
N2  - In this paper we present lexicographic work on a Tagalog-English-Korean dictionary. The dictionary is created entirely from scratch and all of its content (besides audio pronunciation) is initially generated fully automatically from a large web corpus that we built for these purposes, and then post-edited by human editors. The full size of the dictionary is 45,000 entries, out of which 15,000 most frequent entries are manually post-edited, while the remaining 30,000 entries are left only as automated. The project is currently ongoing and will be finished in December 2019. The dictionary will be part of the online platform run by the Naver Corporation and freely available.
ER  -

BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Marek MEDVEĎ, Michal MĚCHURA, Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In \textit{Proceedings of the 6th Biennial Conference on Electronic Lexicography}. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, p.~805-818. ISSN~2533-5626.
