Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective

D 2023

DENISOVÁ, Michaela and Pavel RYCHLÝ

Basic information

Original name

Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective

Authors

DENISOVÁ, Michaela (703 Slovakia, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)

Edition

Brno, Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference, p. 1-18, 18 pp. 2023

Publisher

Lexical Computing CZ s.r.o.

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

Full text

RIV identification code

RIV/00216224:14330/23:00131141

Organization unit

Faculty of Informatics

ISSN

2533-5626

Keywords in English

cross-lingual embedding models; bilingual lexicon induction task; retrieving translation equivalents; evaluation

Changed: 8/4/2024 14:11, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Cross-lingual embedding models (CMs) enable us to transfer lexical knowledge across languages. Therefore, they represent a useful approach for retrieving translation equivalents in lexicography. However, these models have been oriented mainly towards the natural language processing (NLP) field and lack proper evaluation, relying on error-prone evaluation datasets that were compiled automatically. This causes discrepancies between models, hindering the correct interpretation of the results. In this paper, we aim to address these issues and make these models more accessible for lexicography by evaluating them from a lexicographic point of view. We evaluate three benchmark CMs on three diverse language pairs: close, distant, and different script languages. Additionally, we propose key parameters that an evaluation dataset should include so that it meets lexicographic needs, yields reproducible results, accurately reflects model performance, and helps set appropriate parameters during training. Our code and evaluation datasets are publicly available.
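
The bilingual lexicon induction evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy two-dimensional vectors, and the English-Czech word pairs are all hypothetical, and it assumes that both languages are already mapped into one shared embedding space. For each source word, target words are ranked by cosine similarity, and a hit is counted when any gold translation appears among the top k candidates (precision@k).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def precision_at_k(src_vecs, tgt_vecs, gold, k=1):
    """Share of source words with a gold translation among the k nearest targets."""
    hits = 0
    for src_word, translations in gold.items():
        query = src_vecs[src_word]
        # Rank every target word by similarity to the source word.
        ranked = sorted(
            tgt_vecs, key=lambda w: cosine(query, tgt_vecs[w]), reverse=True
        )
        if any(w in translations for w in ranked[:k]):
            hits += 1
    return hits / len(gold)


# Hypothetical toy data: vectors assumed to live in one shared space.
src_vecs = {"dog": [1.0, 0.1], "cat": [0.1, 1.0], "house": [0.9, 0.3]}
tgt_vecs = {"pes": [0.9, 0.2], "kočka": [0.2, 0.9], "dům": [0.5, 0.5]}
# A gold entry may list several acceptable translation equivalents.
gold = {"dog": {"pes"}, "cat": {"kočka"}, "house": {"dům", "domov"}}

p_at_1 = precision_at_k(src_vecs, tgt_vecs, gold, k=1)
p_at_2 = precision_at_k(src_vecs, tgt_vecs, gold, k=2)
print(f"P@1 = {p_at_1:.2f}, P@2 = {p_at_2:.2f}")
```

In this toy setup "house" is ranked closest to "pes", so it counts as a miss at k=1 and as a hit at k=2. The choice of k and the number of translations listed per gold entry both change the reported score, which is one reason the makeup of the evaluation dataset matters.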

Links

EF19_073/0016943, research and development project

Name: Internal Grant Agency of Masaryk University (Interní grantová agentura Masarykovy univerzity)

MUNI/IGA/1285/2021, internal MU code

Name: Finding translation equivalents without parallel texts

Investor: Masaryk University

Cite

DENISOVÁ, Michaela and Pavel RYCHLÝ. Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective. In Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference. Brno: Lexical Computing CZ s.r.o., 2023, p. 1-18. ISSN 2533-5626.

@inproceedings{2295462,
   author = {Denisová, Michaela and Rychlý, Pavel},
   address = {Brno},
   booktitle = {Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference},
   keywords = {cross-lingual embedding models; bilingual lexicon induction task; retrieving translation equivalents; evaluation},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Brno},
   pages = {1--18},
   publisher = {Lexical Computing CZ s.r.o.},
   title = {Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective},
   url = {https://elex.link/elex2023/wp-content/uploads/elex2023_proceedings.pdf#page=9},
   year = {2023}
}

TY  - JOUR
ID  - 2295462
AU  - Denisová, Michaela
AU  - Rychlý, Pavel
PY  - 2023
TI  - Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective
PB  - Lexical Computing CZ s.r.o.
CY  - Brno
KW  - cross-lingual embedding models
KW  - bilingual lexicon induction task
KW  - retrieving translation equivalents
KW  - evaluation
UR  - https://elex.link/elex2023/wp-content/uploads/elex2023_proceedings.pdf#page=9
N2  - Cross-lingual embedding models (CMs) enable us to transfer lexical knowledge across languages. Therefore, they represent a useful approach for retrieving translation equivalents in lexicography. However, these models have been oriented mainly towards the natural language processing (NLP) field and lack proper evaluation, relying on error-prone evaluation datasets that were compiled automatically. This causes discrepancies between models, hindering the correct interpretation of the results. In this paper, we aim to address these issues and make these models more accessible for lexicography by evaluating them from a lexicographic point of view. We evaluate three benchmark CMs on three diverse language pairs: close, distant, and different script languages. Additionally, we propose key parameters that an evaluation dataset should include so that it meets lexicographic needs, yields reproducible results, accurately reflects model performance, and helps set appropriate parameters during training. Our code and evaluation datasets are publicly available.
ER  -

DENISOVÁ, Michaela and Pavel RYCHLÝ. Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective. In \textit{Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference}. Brno: Lexical Computing CZ s.r.o., 2023, p.~1-18. ISSN~2533-5626.
