Automating dictionary production: a Tagalog-English-Korean dictionary from scratch

D 2019

Basic information

Original title

Automating dictionary production: a Tagalog-English-Korean dictionary from scratch

Authors

BAISA, Vít; Marek BLAHUŠ; Michal CUKR; Ondřej HERMAN; Miloš JAKUBÍČEK; Vojtěch KOVÁŘ; Marek MEDVEĎ; Michal MĚCHURA; Pavel RYCHLÝ and Vít SUCHOMEL

Edition

Brno, Czech Republic, Proceedings of the 6th Biennial Conference on Electronic Lexicography, pp. 805-818, 14 pp., 2019

Publisher

Lexical Computing CZ s.r.o.

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality

not subject to state or trade secrecy

Publication form

electronic version "online"

Links

Conference proceedings

Marked for transfer to RIV

Yes

RIV code

RIV/00216224:14330/19:00107599

Organization unit

Faculty of Informatics

ISSN

2533-5626

EID Scopus

2-s2.0-85075389906

Keywords in English

Sketch Engine; Lexonomy; post-editing lexicography; dictionary; corpus; Tagalog; Filipino; English; Korean

Tags

International impact, Reviewed

Changed: 22 October 2023 01:49, RNDr. Miloš Jakubíček, Ph.D.

Abstract

In the original language

In this paper we present lexicographic work on a Tagalog-English-Korean dictionary. The dictionary is created entirely from scratch and all of its content (besides audio pronunciation) is initially generated fully automatically from a large web corpus that we built for these purposes, and then post-edited by human editors. The full size of the dictionary is 45,000 entries, out of which 15,000 most frequent entries are manually post-edited, while the remaining 30,000 entries are left only as automated. The project is currently ongoing and will be finished in December 2019. The dictionary will be part of the online platform run by the Naver Corporation and freely available.

Related projects

GA18-23891S, research and development project

Name: Hyperintensional Reasoning over Natural Language Texts

Investor: Czech Science Foundation, Hyperintensional Reasoning over Natural Language Texts

LM2015071, research and development project

Name: Language Research Infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT-Clarin project - Building and operating the Czech node of the pan-European research infrastructure

Cite

BAISA, Vít; Marek BLAHUŠ; Michal CUKR; Ondřej HERMAN; Miloš JAKUBÍČEK; Vojtěch KOVÁŘ; Marek MEDVEĎ; Michal MĚCHURA; Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, pp. 805-818. ISSN 2533-5626.

@inproceedings{1550657,
   author = {Baisa, Vít and Blahuš, Marek and Cukr, Michal and Herman, Ondřej and Jakubíček, Miloš and Kovář, Vojtěch and Medveď, Marek and Měchura, Michal and Rychlý, Pavel and Suchomel, Vít},
   address = {Brno, Czech Republic},
   booktitle = {Proceedings of the 6th Biennial Conference on Electronic Lexicography},
   keywords = {Sketch Engine; Lexonomy; post-editing lexicography; dictionary; corpus; Tagalog; Filipino; English; Korean},
   howpublished = {electronic version "online"},
   language = {eng},
   location = {Brno, Czech Republic},
   pages = {805-818},
   publisher = {Lexical Computing CZ s.r.o.},
   title = {Automating dictionary production: a Tagalog-English-Korean dictionary from scratch},
   url = {https://elex.link/elex2019/wp-content/uploads/2019/10/eLex-2019_Proceedings.pdf},
   year = {2019}
}

TY  - CONF
ID  - 1550657
AU  - Baisa, Vít
AU  - Blahuš, Marek
AU  - Cukr, Michal
AU  - Herman, Ondřej
AU  - Jakubíček, Miloš
AU  - Kovář, Vojtěch
AU  - Medveď, Marek
AU  - Měchura, Michal
AU  - Rychlý, Pavel
AU  - Suchomel, Vít
PY  - 2019
TI  - Automating dictionary production: a Tagalog-English-Korean dictionary from scratch
PB  - Lexical Computing CZ s.r.o.
CY  - Brno, Czech Republic
KW  - Sketch Engine
KW  - Lexonomy
KW  - post-editing lexicography
KW  - dictionary
KW  - corpus
KW  - Tagalog
KW  - Filipino
KW  - English
KW  - Korean
UR  - https://elex.link/elex2019/wp-content/uploads/2019/10/eLex-2019_Proceedings.pdf
N2  - In this paper we present lexicographic work on a Tagalog-English-Korean dictionary. The dictionary is created entirely from scratch and all of its content (besides audio pronunciation) is initially generated fully automatically from a large web corpus that we built for these purposes, and then post-edited by human editors. The full size of the dictionary is 45,000 entries, out of which 15,000 most frequent entries are manually post-edited, while the remaining 30,000 entries are left only as automated. The project is currently ongoing and will be finished in December 2019. The dictionary will be part of the online platform run by the Naver Corporation and freely available.
ER  -

BAISA, Vít; Marek BLAHUŠ; Michal CUKR; Ondřej HERMAN; Miloš JAKUBÍČEK; Vojtěch KOVÁŘ; Marek MEDVEĎ; Michal MĚCHURA; Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In \textit{Proceedings of the 6th Biennial Conference on Electronic Lexicography}. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, pp.~805--818. ISSN~2533-5626.
