2024
Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis
DENISOVÁ, Michaela and Pavel RYCHLÝ
Basic information
Original title
Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis
Authors
DENISOVÁ, Michaela and Pavel RYCHLÝ
Edition
Cham, International Conference on Text, Speech, and Dialogue, pp. 30-42, 13 pp., 2024
Publisher
Springer Nature Switzerland
Other information
Language
English
Type of outcome
Proceedings paper
Field of study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality
is not subject to a state or trade secret
Publication form
electronic version "online"
References
Impact factor
Impact factor: 0.402 in 2005
Marked to be transferred to RIV
Yes
RIV identification code
RIV/00216224:14330/24:00136956
Organization unit
Faculty of Informatics
ISBN
978-3-031-70562-5
ISSN
UT WoS
EID Scopus
Keywords in English
bilingual lexicon induction; cross-lingual word embeddings; neural machine translation systems
Tags
Flags
International impact, Reviewed
Changed: 4 April 2025 12:11, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
Bilingual lexicon induction (BLI) from comparable data has become a common way of evaluating cross-lingual word embeddings (CWEs). These models have drawn much attention, mainly due to their availability for rare and low-resource language pairs. An alternative is offered by systems exploiting parallel data, such as popular neural machine translation systems (NMTSs), which are effective and yield state-of-the-art results. Despite the significant advancements in NMTSs, their effectiveness in the BLI task compared to the models using comparable data remains underexplored. In this paper, we provide a comparative study of the NMTS and CWE models evaluated on the BLI task and demonstrate the results across three diverse language pairs: a distant pair (Estonian-English), a close pair (Estonian-Finnish), and a pair with different scripts (Estonian-Russian). Our study reveals the differences, strengths, and limitations of both approaches. We show that while NMTSs achieve impressive results for languages with a great amount of training data available, CWEs emerge as a better option when fewer resources are available.
Links
MUNI/A/1590/2023, internal MU code