NOVOTNÝ, Vít, Michal ŠTEFÁNIK, Dávid LUPTÁK and Petr SOJKA. Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling. In Aleš Horák and Pavel Rychlý and Adam Rambousek. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020. p. 37-46. ISBN 978-80-263-1600-8.
Basic information
Original name Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling
Authors NOVOTNÝ, Vít (203 Czech Republic, guarantor, belonging to the institution), Michal ŠTEFÁNIK (703 Slovakia, belonging to the institution), Dávid LUPTÁK (703 Slovakia, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution).
Edition Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 37-46, 10 pp. 2020.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW workshop homepage PDF (fulltext)
RIV identification code RIV/00216224:14330/20:00117105
Organization unit Faculty of Informatics
ISBN 978-80-263-1600-8
ISSN 2336-4289
UT WoS 000655471300004
Keywords (in Czech) evaluace; slovní vektory; word2vec; fastText; vyhledávání informací; klasifikace textů; jazykové modelování
Keywords in English Evaluation; word vectors; word2vec; fastText; information retrieval; text classification; language modeling
Tags information retrieval, language modeling, machine learning, SCM, soft cosine measure, text classification, word embeddings
Tags International impact
Changed by: Mgr. Michal Petr, učo 65024. Changed: 16/5/2022 15:08.
Abstract

Since the seminal work of Mikolov et al. (2013), word vectors of log-bilinear models have found their way into many NLP applications and were extended with the positional model.

Although the positional model improves accuracy on the intrinsic English word analogy task, prior work has neglected its evaluation on extrinsic end tasks, which correspond to real-world NLP applications.

In this paper, we describe our first steps in evaluating positional weighting on the information retrieval, text classification, and language modeling extrinsic end tasks.

Links
MUNI/A/1076/2019, internal MU code
Name: Zapojení studentů Fakulty informatiky do mezinárodní vědecké komunity 20 (Acronym: SKOMU)
Investor: Masaryk University, Category A
MUNI/A/1411/2019, internal MU code
Name: Aplikovaný výzkum: softwarové architektury kritických infrastruktur, bezpečnost počítačových systémů, zpracování přirozeného jazyka a jazykové inženýrství, vizualizaci velkých dat a rozšířená realita.
Investor: Masaryk University, Category A