D 2019

Word Sense Induction Using Word Sketches

HERMAN, Ondřej, Vojtěch KOVÁŘ, Miloš JAKUBÍČEK and Pavel RYCHLÝ

Basic information

Original name

Word Sense Induction Using Word Sketches

Authors

HERMAN, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic), Miloš JAKUBÍČEK (203 Czech Republic) and Pavel RYCHLÝ (203 Czech Republic)

Edition

Cham, Proceedings of the 7th International Conference on Statistical Language and Speech Processing, p. 83-91, 9 pp. 2019

Publisher

Springer

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10200 1.2 Computer and information sciences

Country of publisher

Switzerland

Confidentiality degree

not subject to a state or trade secret

Publication form

printed version "print"

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/19:00107596

Organization unit

Faculty of Informatics

ISBN

978-3-030-31371-5

ISSN

Keywords in English

Word sense induction;Word sketch;Collocations;Word embeddings

Tags

International impact, Reviewed
Changed: 22/10/2023 01:49, RNDr. Miloš Jakubíček, Ph.D.

Abstract

In original

We present three methods for word sense induction based on Word Sketches. The methods are being developed as part of a semi-automatic dictionary creation system, providing annotators with a summary of the semantic behavior of a word. Two of the methods are based on the assumption that a word has a single sense per collocation. In the first method, we cluster the Word Sketch collocations by their co-occurrence behavior. The second method clusters the collocations using a word embedding model. The last method is based on clustering Word Sketch thesauri. We evaluate the methods and demonstrate their behavior on representative words.
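The record gives no implementation details, so the following is only a minimal, hypothetical sketch of the general idea behind the first method: grouping a headword's collocates into sense clusters according to how often they co-occur. Everything concrete here is an assumption, not the authors' method: contexts are given as sets of collocates (e.g. one set per sentence), association is measured with the Dice coefficient, a fixed `threshold` decides whether two collocates are linked, and clusters are the connected components of that graph. Placing each collocate in exactly one cluster mirrors the one-sense-per-collocation assumption. Word Sketch extraction itself is not shown, and the function name `induce_senses` is illustrative.

```python
from collections import Counter
from itertools import combinations


def induce_senses(contexts, threshold=0.3):
    """Group collocates of one headword into sense clusters (hypothetical sketch).

    contexts: iterable of sets of collocates observed together with the
    headword. Each collocate ends up in exactly one cluster, reflecting
    the one-sense-per-collocation assumption.
    """
    freq = Counter()   # number of contexts containing each collocate
    pair = Counter()   # number of contexts containing both collocates
    for ctx in contexts:
        items = sorted(set(ctx))
        freq.update(items)
        pair.update(combinations(items, 2))

    # Union-find over collocates; linked collocates share a sense.
    parent = {w: w for w in freq}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (a, b), c in pair.items():
        dice = 2 * c / (freq[a] + freq[b])  # assumed association measure
        if dice >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in freq:
        clusters.setdefault(find(w), set()).add(w)
    # Deterministic output: sorted clusters, ordered by their first member.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    # Toy, hand-made contexts for the headword "bank" (not corpus data).
    contexts = [
        {"river", "shore"}, {"river", "water"}, {"water", "shore"},
        {"money", "loan"}, {"loan", "account"}, {"money", "account"},
    ]
    print(induce_senses(contexts))
    # → [['account', 'loan', 'money'], ['river', 'shore', 'water']]
```

The threshold controls granularity: raising it splits senses into smaller groups (with `threshold=0.6` every collocate in the toy data becomes its own cluster), while lowering it lets a few mixed contexts merge distinct senses. The second method described above would replace the co-occurrence counts with similarities from a word embedding model.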

Links

EF16_013/0001781, research and development project
Name: LINDAT/CLARIN - Research Infrastructure for Language Technologies - Extension of the Repository and Computing Capacity
GA18-23891S, research and development project
Name: Hyperintensional Reasoning over Natural Language Texts
Investor: Czech Science Foundation
LM2015071, research and development project
Name: Language Research Infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1018/2018, internal MU code
Name: Large-Scale Computing Systems: Models, Applications and Verification VIII.
Investor: Masaryk University, Category A