HERMAN, Ondřej, Vojtěch KOVÁŘ, Miloš JAKUBÍČEK and Pavel RYCHLÝ. Word Sense Induction Using Word Sketches. In Martín-Vide C., Purver M., Pollak S. Proceedings of the 7th International Conference on Statistical Language and Speech Processing. Cham: Springer, 2019, p. 83-91. ISBN 978-3-030-31371-5. Available from: https://dx.doi.org/10.1007/978-3-030-31372-2_7.
Basic information
Original name Word Sense Induction Using Word Sketches
Authors HERMAN, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic), Miloš JAKUBÍČEK (203 Czech Republic) and Pavel RYCHLÝ (203 Czech Republic).
Edition Cham, Proceedings of the 7th International Conference on Statistical Language and Speech Processing, p. 83-91, 9 pp. 2019.
Publisher Springer
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10200 1.2 Computer and information sciences
Country of publisher Switzerland
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
Impact factor 0.402 in 2005
RIV identification code RIV/00216224:14330/19:00107596
Organization unit Faculty of Informatics
ISBN 978-3-030-31371-5
ISSN 0302-9743
Doi http://dx.doi.org/10.1007/978-3-030-31372-2_7
Keywords in English Word sense induction;Word sketch;Collocations;Word embeddings
Tags International impact, Reviewed
Changed by RNDr. Miloš Jakubíček, Ph.D., UČO 172962. Changed: 22/10/2023 01:49.
Abstract
We present three methods for word sense induction based on Word Sketches. The methods are being developed as part of a semiautomatic dictionary creation system, providing annotators with the summarized semantic behavior of a word. Two of the methods are based on the assumption that a word has a single sense per collocation. The first method clusters the Word Sketch based collocations by their co-occurrence behavior. The second method clusters the collocations using a word embedding model. The last method is based on clustering of Word Sketch thesauri. We evaluate the methods and demonstrate their behavior on representative words.
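The idea behind the first method, grouping collocates by their co-occurrence behavior under the one-sense-per-collocation assumption, can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm the authors use, so everything here is an assumption made for illustration: the Jaccard overlap, the threshold, the single-link grouping, the function names, and the toy data for the headword "bank" are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets of occurrence ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_collocates(occurrences, threshold=0.2):
    """Group collocates whose occurrence sets overlap strongly.

    occurrences maps each collocate to the set of ids of headword
    occurrences in which that collocate appears (hypothetical input).
    Two collocates are linked when their Jaccard overlap reaches
    `threshold`; each connected component is one induced sense cluster.
    This is an illustrative stand-in, not the paper's algorithm.
    """
    parent = {c: c for c in occurrences}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(occurrences, 2):
        if jaccard(occurrences[a], occurrences[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for c in occurrences:
        clusters.setdefault(find(c), set()).add(c)
    # Sort clusters by their sorted members for a stable order.
    return sorted(clusters.values(), key=lambda s: sorted(s))


# Toy data (invented): collocates of "bank" and the occurrences they appear in.
toy = {
    "river": {1, 2, 3},
    "muddy": {2, 3, 4},
    "account": {10, 11, 12},
    "loan": {11, 12, 13},
    "central": {12, 13, 14},
}
senses = cluster_collocates(toy)
```

On this toy input the collocates split into a "river, muddy" group and an "account, loan, central" group, which an annotator could then inspect as two candidate senses. Single-link grouping was chosen only because it is short to write; it tends to chain clusters together on real data.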
Links
EF16_013/0001781, research and development project. Name: LINDAT/CLARIN - Research Infrastructure for Language Technologies - Extension of the Repository and Computing Capacity
GA18-23891S, research and development project. Name: Hyperintensional Reasoning over Natural Language Texts
Investor: Czech Science Foundation
LM2015071, research and development project. Name: Language Research Infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1018/2018, internal MU code. Name: Large-Scale Computing Systems: Models, Applications and Verification VIII.
Investor: Masaryk University, Category A