DENISOVÁ, Michaela and Pavel RYCHLÝ. Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis. Online. In Nöth, E., Horák, A., Sojka, P. International Conference on Text, Speech, and Dialogue. Cham: Springer Nature Switzerland, 2024, pp. 30-42, 12 pp. ISBN 978-3-031-70563-2. Available from: https://dx.doi.org/10.1007/978-3-031-70563-2_3.
Basic information
Original title Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis
Authors DENISOVÁ, Michaela and Pavel RYCHLÝ.
Edition Cham, International Conference on Text, Speech, and Dialogue, pp. 30-42, 12 pp. 2024.
Publisher Springer Nature Switzerland
Other information
Original language English
Type of outcome Proceedings paper
Field of study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
WWW Preprint version
Organization unit Faculty of Informatics
ISBN 978-3-031-70563-2
DOI http://dx.doi.org/10.1007/978-3-031-70563-2_3
Keywords in English bilingual lexicon induction; cross-lingual word embeddings; neural machine translation systems
Tags firank_B
Flags Reviewed
Changed by Mgr. Michaela Denisová, učo 449884. Changed: 1. 9. 2024 07:21.
Abstract
Bilingual lexicon induction (BLI) from comparable data has become a common way of evaluating cross-lingual word embeddings (CWEs). These models have drawn much attention, mainly due to their availability for rare and low-resource language pairs. Systems exploiting parallel data, such as popular neural machine translation systems (NMTSs), offer an alternative: they are effective and yield state-of-the-art results. Despite the significant advancements in NMTSs, their effectiveness in the BLI task compared to models using comparable data remains underexplored. In this paper, we provide a comparative study of NMTS and CWE models evaluated on the BLI task and report results across three diverse language pairs: a distant pair (Estonian-English), a close pair (Estonian-Finnish), and a pair with different scripts (Estonian-Russian). Our study reveals the differences, strengths, and limitations of both approaches. We show that while NMTSs achieve impressive results for languages with a large amount of available training data, CWEs emerge as the better option when fewer resources are available.