Does Size Matter? - Comparing Evaluation Dataset Size for the
Bilingual Lexicon Induction

DENISOVÁ, Michaela and Pavel RYCHLÝ. Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023. Karlova Studánka: Tribun EU, 2023, p. 47-56. ISBN 978-80-263-1793-7.

Basic information
Original name	Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction
Authors	DENISOVÁ, Michaela (703 Slovakia, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution).
Edition	Karlova Studánka, Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023, p. 47-56, 10 pp. 2023.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
WWW	Workshop homepage Full text
RIV identification code	RIV/00216224:14330/23:00133036
Organization unit	Faculty of Informatics
ISBN	978-80-263-1793-7
ISSN	2336-4289
Keywords in English	Cross-lingual word embeddings; Bilingual lexicon induction; Evaluation dataset’s size
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 8/4/2024 17:16.

Abstract

Cross-lingual word embeddings have been a popular approach for inducing bilingual lexicons. However, the evaluation of this task varies from paper to paper, and the gold standard dictionaries used for evaluation are frequently criticised for the mistakes they contain. Although there have been efforts to unify the evaluation and the gold standard dictionaries, we propose a new property that should be considered when compiling an evaluation dataset: size. In this paper, we evaluate three baseline models on three diverse language pairs (Estonian-Slovak, Czech-Slovak, English-Korean) and experiment with evaluation datasets of various sizes: 200, 500, 1.5K, and 3K source words. Moreover, we compare the results with a manual error analysis. In this experiment, we show whether the size of an evaluation dataset impacts the results and how to select the ideal evaluation dataset size. We make our code and datasets publicly available.
