Bilingual Lexicon Induction From Comparable and Parallel Data:
A Comparative Analysis

DENISOVÁ, Michaela and Pavel RYCHLÝ. Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis. Online. In: Nöth, E., Horák, A., Sojka, P. (eds.) International Conference on Text, Speech, and Dialogue. Cham: Springer Nature Switzerland, 2024, pp. 30-42, 12 pp. ISBN 978-3-031-70563-2. Available from: https://dx.doi.org/10.1007/978-3-031-70563-2_3.


Basic information
Original title	Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis
Authors	DENISOVÁ, Michaela and Pavel RYCHLÝ.
Published	Cham, International Conference on Text, Speech, and Dialogue, pp. 30-42, 12 pp. 2024.
Publisher	Springer Nature Switzerland

Other information
Original language	English
Result type	Article in proceedings
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality	not subject to state or trade secret
Form of publication	electronic version "online"
WWW	Preprint version
Organizational unit	Faculty of Informatics
ISBN	978-3-031-70563-2
DOI	http://dx.doi.org/10.1007/978-3-031-70563-2_3
Keywords in English	bilingual lexicon induction; cross-lingual word embeddings; neural machine translation systems
Tags	firank_B
Flags	Peer-reviewed
Last modified by	Mgr. Michaela Denisová, UČO 449884. Modified: 1 September 2024 07:21.

Abstract

Bilingual lexicon induction (BLI) from comparable data has become a common way of evaluating cross-lingual word embeddings (CWEs). These models have drawn much attention, mainly because they can be built for rare and low-resource language pairs. Systems that exploit parallel data, such as popular neural machine translation systems (NMTSs), offer an alternative: they are effective and yield state-of-the-art results. Despite the significant advances in NMTSs, their effectiveness on the BLI task compared with models trained on comparable data remains underexplored. In this paper, we provide a comparative study of NMTS and CWE models evaluated on the BLI task and report results for three diverse language pairs: a distant pair (Estonian-English), a close pair (Estonian-Finnish), and a pair with different scripts (Estonian-Russian). Our study reveals the differences, strengths, and limitations of both approaches. We show that while NMTSs achieve impressive results for languages with large amounts of training data, CWEs emerge as the better option when fewer resources are available.
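BLI evaluation of CWEs is commonly framed as retrieval: each source word is mapped into the shared embedding space, its nearest target-language neighbours are retrieved, and precision@k is computed against a gold bilingual dictionary. The sketch below illustrates that generic setup only; it is not the authors' evaluation code. It assumes plain cosine-similarity retrieval (rather than alternatives such as CSLS), toy 2-dimensional vectors, and hypothetical helper names `cosine_neighbors` and `bli_precision_at_k`.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_neighbors(query, tgt_vectors, k):
    """Return the k target words most cosine-similar to `query`.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    q = _normalize(query)
    scored = []
    for word, vec in tgt_vectors.items():
        t = _normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, t)), word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


def bli_precision_at_k(src_vectors, tgt_vectors, gold, k=1):
    """Share of gold source words with a correct translation in the top k.

    `gold` maps a source word to the set of its accepted translations.
    Source words missing from the embedding vocabulary are skipped.
    """
    hits = total = 0
    for src_word, translations in gold.items():
        if src_word not in src_vectors:
            continue
        total += 1
        predicted = set(cosine_neighbors(src_vectors[src_word], tgt_vectors, k))
        if predicted & translations:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy Estonian and English vectors, assumed to share one space already.
    src = {"koer": [1.0, 0.0], "kass": [0.0, 1.0], "maja": [1.0, 1.0]}
    tgt = {"dog": [0.9, 0.1], "cat": [0.1, 0.9], "house": [0.7, 0.7]}
    gold = {"koer": {"dog"}, "kass": {"cat"}, "maja": {"house"}}
    print(bli_precision_at_k(src, tgt, gold, k=1))  # prints 1.0
```

An NMTS, by contrast, would be evaluated by translating each source word directly and checking the output against the same gold dictionary, so no embedding-space retrieval step is involved.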
