D 2023

Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective

DENISOVÁ, Michaela and Pavel RYCHLÝ

Basic information

Original title

Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective

Authors

DENISOVÁ, Michaela (703 Slovakia, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)

Edition

Brno, Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference, p. 1-18, 18 pp. 2023

Publisher

Lexical Computing CZ s.r.o.

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References

RIV identification code

RIV/00216224:14330/23:00131141

Organization unit

Faculty of Informatics

ISSN

Keywords in English

cross-lingual embedding models; bilingual lexicon induction task; retrieving translation equivalents; evaluation
Changed: 8 April 2024 14:11, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original language

Cross-lingual embedding models (CMs) enable us to transfer lexical knowledge across languages. Therefore, they represent a useful approach for retrieving translation equivalents in lexicography. However, these models have been mainly oriented towards the natural language processing (NLP) field and lack proper evaluation: the evaluation datasets used were compiled automatically and are error-prone. This causes discrepancies between models, hindering the correct interpretation of the results. In this paper, we aim to address these issues and make these models more accessible for lexicography by evaluating them from a lexicographic point of view. We evaluate three benchmark CMs on three diverse language pairs: closely related languages, distant languages, and languages with different scripts. Additionally, we propose key parameters that an evaluation dataset should include so that it meets lexicographic needs, yields reproducible results, accurately reflects model performance, and helps set appropriate parameters during training. Our code and evaluation datasets are publicly available.

Related projects

EF19_073/0016943, research and development project
Name: Internal Grant Agency of Masaryk University
MUNI/IGA/1285/2021, internal MU code
Name: Finding translation equivalents without parallel texts
Investor: Masaryk University, Finding translation equivalents without parallel texts