2021
EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses
AYETIRAN, Eniafe Festus, Petr SOJKA and Vít NOVOTNÝ
Basic information
Original title
EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses
Authors
AYETIRAN, Eniafe Festus (566 Nigeria, guarantor, belonging to the institution), Petr SOJKA (203 Czech Republic, belonging to the institution) and Vít NOVOTNÝ (203 Czech Republic, belonging to the institution)
Edition
Knowledge-Based Systems, Elsevier, 2021, 0950-7051
Other information
Language
English
Type of outcome
Article in a journal
Field of study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Kingdom of the Netherlands
Confidentiality degree
is not subject to a state or trade secret
Impact factor
Impact factor: 8.139
RIV identification code
RIV/00216224:14330/21:00120721
Organization unit
Faculty of Informatics
UT WoS
000634868500007
Keywords in English
Multi-sense embeddings; Graph walk; Language generation; Distributional semantics; Distributional structures; Word sense disambiguation; Knowledge-based systems; Word similarity; Semantic applications
Tags
International impact, Reviewed
Changed: 23 May 2022 14:19, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original language
Several language applications often require word semantics as a core part of their processing pipeline either as precise meaning inference or semantic similarity. Multi-sense embeddings (M-SE) can be exploited for this important requirement. M-SE seeks to represent each word by their distinct senses in order to resolve the conflation of meanings of words as used in different contexts. Previous works usually approach this task by training a model on a large corpus and often ignore the effect and usefulness of the semantic relations offered by lexical resources. However, even with large training data, coverage of all possible word senses is still an issue. In addition, a considerable percentage of contextual semantic knowledge is never learned because a huge amount of possible distributional semantic structures are never explored. In this paper, we leverage the rich semantic structures in WordNet using a graph-theoretic walk technique over word senses to enhance the quality of multi-sense embeddings. This algorithm composes enriched texts from the original texts. Furthermore, we derive new distributional semantic similarity measures for M-SE from prior ones. We adapt these measures to the word sense disambiguation (WSD) aspect of our experiment. We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks and show that our method for enhancing distributional semantic structures improves embeddings quality on the baselines. Despite the small training data, it achieves state-of-the-art performance on some of the datasets.
Links
MUNI/A/1411/2019, internal MU code
MUNI/A/1549/2020, internal MU code