Evaluation of the Cross-lingual Embedding Models from the
Lexicographic Perspective

DENISOVÁ, Michaela and Pavel RYCHLÝ. Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective. In Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference. Brno: Lexical Computing CZ s.r.o., 2023, pp. 1-18. ISSN 2533-5626.

Basic information
Original title	Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective
Authors	DENISOVÁ, Michaela (703 Slovakia, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution).
Edition	Brno, Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference, pp. 1-18, 18 pp. 2023.
Publisher	Lexical Computing CZ s.r.o.

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality	not subject to a state or trade secret
Publication form	printed version "print"
WWW	Full text
RIV code	RIV/00216224:14330/23:00131141
Organization unit	Faculty of Informatics
ISSN	2533-5626
Keywords in English	cross-lingual embedding models; bilingual lexicon induction task; retrieving translation equivalents; evaluation
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 8 April 2024 14:11.

Abstract

Cross-lingual embedding models (CMs) enable us to transfer lexical knowledge across languages. Therefore, they represent a useful approach for retrieving translation equivalents in lexicography. However, these models have been mainly oriented towards the natural language processing (NLP) field and lack proper evaluation, relying on error-prone evaluation datasets that were compiled automatically. This causes discrepancies between models, hindering the correct interpretation of the results. In this paper, we aim to address these issues and make these models more accessible for lexicography by evaluating them from a lexicographic point of view. We evaluate three benchmark CMs on three diverse language pairs: close languages, distant languages, and languages with different scripts. Additionally, we propose key parameters that an evaluation dataset should include in order to meet lexicographic needs, yield reproducible results, accurately reflect model performance, and help set appropriate parameters during training. Our code and evaluation datasets are publicly available.

Links
EF19_073/0016943, R&D project	Name: Internal Grant Agency of Masaryk University
MUNI/IGA/1285/2021, MU internal code	Name: Finding translation equivalents without parallel texts
MUNI/IGA/1285/2021, MU internal code	Investor: Masaryk University, Finding translation equivalents without parallel texts
