D 2020

Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling

NOVOTNÝ, Vít, Michal ŠTEFÁNIK, Dávid LUPTÁK and Petr SOJKA

Basic information

Original name

Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling

Authors

NOVOTNÝ, Vít (203 Czech Republic, guarantor, belonging to the institution), Michal ŠTEFÁNIK (703 Slovakia, belonging to the institution), Dávid LUPTÁK (703 Slovakia, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution)

Edition

Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 37-46, 10 pp. 2020

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/20:00117105

Organization unit

Faculty of Informatics

ISBN

978-80-263-1600-8

ISSN

UT WoS

000655471300004

Keywords (in Czech)

evaluace; slovní vektory; word2vec; fastText; vyhledávání informací; klasifikace textů; jazykové modelování

Keywords in English

Evaluation; word vectors; word2vec; fastText; information retrieval; text classification; language modeling

Tags

International impact
Changed: 16/5/2022 15:08, Mgr. Michal Petr

Abstract

In the original

Since the seminal work of Mikolov et al. (2013), word vectors of log-bilinear models have found their way into many NLP applications and were extended with the positional model.

Although the positional model improves accuracy on the intrinsic English word analogy task, prior work has neglected its evaluation on extrinsic end tasks, which correspond to real-world NLP applications.

In this paper, we describe our first steps in evaluating positional weighting on the information retrieval, text classification, and language modeling extrinsic end tasks.


Links

MUNI/A/1076/2019, MU internal code
Name: Involvement of Faculty of Informatics students in the international scientific community 20 (Acronym: SKOMU)
Investor: Masaryk University, Category A
MUNI/A/1411/2019, MU internal code
Name: Applied research: software architectures of critical infrastructures, security of computer systems, natural language processing and language engineering, big data visualization, and augmented reality.
Investor: Masaryk University, Category A