Detailed Information on Publication Record
2023
Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective
DENISOVÁ, Michaela and Pavel RYCHLÝ
Basic information
Original name
Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective
Authors
DENISOVÁ, Michaela (703 Slovakia, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)
Edition
Brno, Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference, p. 1-18, 18 pp. 2023
Publisher
Lexical Computing CZ s.r.o.
Other information
Language
English
Type of outcome
Article in proceedings
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality degree
not subject to a state or trade secret
Publication form
printed version "print"
References:
RIV identification code
RIV/00216224:14330/23:00131141
Organization unit
Faculty of Informatics
ISSN
Keywords in English
cross-lingual embedding models; bilingual lexicon induction task; retrieving translation equivalents; evaluation
Changed: 8/4/2024 14:11, RNDr. Pavel Šmerk, Ph.D.
Abstract
In original
Cross-lingual embedding models (CMs) enable us to transfer lexical knowledge across languages. Therefore, they represent a useful approach for retrieving translation equivalents in lexicography. However, these models have been oriented mainly towards the natural language processing (NLP) field. They lack proper evaluation because they are evaluated on error-prone, automatically compiled datasets. This causes discrepancies between models that hinder the correct interpretation of the results. In this paper, we aim to address these issues and to make these models more accessible for lexicography by evaluating them from a lexicographic point of view. We evaluate three benchmark CMs on three diverse language pairs: closely related languages, distant languages, and languages with different scripts. Additionally, we propose key parameters that the evaluation dataset should include. These parameters are intended to meet lexicographic needs, yield reproducible results, accurately reflect model performance, and allow appropriate parameters to be set during training. Our code and evaluation datasets are publicly available.
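The abstract refers to retrieving translation equivalents with CMs, which is known as the bilingual lexicon induction task. The following is a minimal illustrative sketch of how this task is commonly evaluated. It is not the authors' published code. Target-language words are ranked by cosine similarity to a source word in the shared embedding space, and precision@k is then measured against a gold dictionary. The helper names `retrieve` and `precision_at_k`, the 2-dimensional vectors, and the Slovak-English word lists are all hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(word, src_emb, tgt_emb, k=1):
    """Hypothetical helper: the k target words nearest to `word` in the shared space."""
    query = src_emb[word]
    ranked = sorted(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]), reverse=True)
    return ranked[:k]


def precision_at_k(gold, src_emb, tgt_emb, k=1):
    """Hypothetical helper: share of source words whose top-k list contains a gold translation."""
    evaluated = [w for w in gold if w in src_emb]
    if not evaluated:
        return 0.0
    hits = sum(1 for w in evaluated if set(retrieve(w, src_emb, tgt_emb, k)) & gold[w])
    return hits / len(evaluated)


# Toy, made-up embeddings assumed to already live in one shared cross-lingual space.
src_emb = {
    "pes": [1.0, 0.1],     # Slovak "dog"
    "mačka": [0.1, 1.0],   # Slovak "cat"
    "auto": [-0.8, 0.3],   # Slovak "car"
    "zámok": [0.95, 0.15], # Slovak "castle"/"lock"
}
tgt_emb = {"dog": [0.9, 0.2], "cat": [0.2, 0.9], "car": [-1.0, 0.0]}

# Toy gold dictionary: each source word maps to the set of accepted translations.
gold = {"pes": {"dog"}, "mačka": {"cat"}, "auto": {"car"}, "zámok": {"castle", "lock"}}

print(retrieve("pes", src_emb, tgt_emb, k=1))      # ['dog']
print(precision_at_k(gold, src_emb, tgt_emb, k=1))  # 0.75
```

In this toy data, "zámok" counts as a miss because none of its gold translations occur in the target vocabulary. This shows how the content of the evaluation dictionary, and not only the model, affects the reported score.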
Links
EF19_073/0016943, research and development project
MUNI/IGA/1285/2021, internal MU code