Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction

D 2023

Basic information

Original title

Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction

Authors

DENISOVÁ, Michaela and Pavel RYCHLÝ

Edition

Karlova Studánka, Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023, pp. 47-56, 10 pp., 2023

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

Links

Workshop homepage, Full text

Marked for transfer to RIV

Yes

RIV code

RIV/00216224:14330/23:00133036

Organization unit

Faculty of Informatics

ISBN

978-80-263-1793-7

EID Scopus

2-s2.0-85187236779

Keywords in English

Cross-lingual word embeddings; Bilingual lexicon induction; Evaluation dataset’s size

Tags

Reviewed

Changed: 30 January 2025 15:47, Mgr. Michaela Denisová

Abstract

In the original

Cross-lingual word embeddings have been a popular approach for inducing bilingual lexicons. However, the evaluation of this task varies from paper to paper, and gold standard dictionaries used for the evaluation are frequently criticised for occurring mistakes. Although there have been efforts to unify the evaluation and gold standard dictionaries, we propose a new property that should be considered when compiling an evaluation dataset: size. In this paper, we evaluate three baseline models on three diverse language pairs (Estonian-Slovak, Czech-Slovak, English-Korean) and experiment with evaluation datasets of various sizes: 200, 500, 1.5K, and 3K source words. Moreover, we compare the results with manual error analysis. In this experiment, we show whether the size of an evaluation dataset impacts the results and how to select the ideal evaluation dataset size. We make our code and datasets publicly available.

Cite

DENISOVÁ, Michaela and Pavel RYCHLÝ. Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023. Karlova Studánka: Tribun EU, 2023, pp. 47-56. ISBN 978-80-263-1793-7.

@inproceedings{2361461,
   author = {Denisová, Michaela and Rychlý, Pavel},
   address = {Karlova Studánka},
   booktitle = {Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023},
   editor = {Aleš Horák and Pavel Rychlý and Adam Rambousek},
   keywords = {Cross-lingual word embeddings; Bilingual lexicon induction; Evaluation dataset’s size},
   howpublished = {tištěná verze "print"},
   language = {eng},
   location = {Karlova Studánka},
   isbn = {978-80-263-1793-7},
   pages = {47-56},
   publisher = {Tribun EU},
   title = {Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction},
   url = {https://raslan2023.nlp-consulting.net/},
   year = {2023}
}

TY  - CONF
ID  - 2361461
AU  - Denisová, Michaela
AU  - Rychlý, Pavel
PY  - 2023
TI  - Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction
PB  - Tribun EU
CY  - Karlova Studánka
SN  - 9788026317937
KW  - Cross-lingual word embeddings
KW  - Bilingual lexicon induction
KW  - Evaluation dataset’s size
UR  - https://raslan2023.nlp-consulting.net/
N2  - Cross-lingual word embeddings have been a popular approach for inducing bilingual lexicons. However, the evaluation of this task varies from paper to paper, and gold standard dictionaries used for the evaluation are frequently criticised for occurring mistakes. Although there have been efforts to unify the evaluation and gold standard dictionaries, we propose a new property that should be considered when compiling an evaluation dataset: size. In this paper, we evaluate three baseline models on three diverse language pairs (Estonian-Slovak, Czech-Slovak, English-Korean) and experiment with evaluation datasets of various sizes: 200, 500, 1.5K, and 3K source words. Moreover, we compare the results with manual error analysis. In this experiment, we show whether the size of an evaluation dataset impacts the results and how to select the ideal evaluation dataset size. We make our code and datasets publicly available.
ER  -

DENISOVÁ, Michaela and Pavel RYCHLÝ. Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction. In Aleš Horák, Pavel Rychlý, Adam Rambousek. \textit{Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023}. Karlova Studánka: Tribun EU, 2023, pp.~47-56. ISBN~978-80-263-1793-7.
