Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling
Authors
NOVOTNÝ, Vít (203 Czech Republic, guarantor, belonging to the institution), Michal ŠTEFÁNIK (703 Slovakia, belonging to the institution), Dávid LUPTÁK (703 Slovakia, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution)
Edition
Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 37-46, 10 pp. 2020
Publisher
Tribun EU
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Since the seminal work of Mikolov et al. (2013), word vectors of log-bilinear models have found their way into many NLP applications and were extended with the positional model.
Although the positional model improves accuracy on the intrinsic English word analogy task, prior work has neglected its evaluation on extrinsic end tasks, which correspond to real-world NLP applications.
In this paper, we describe our first steps in evaluating positional weighting on the information retrieval, text classification, and language modeling extrinsic end tasks.
Links
MUNI/A/1076/2019, MU internal code
Name: Involvement of Students of the Faculty of Informatics in the International Scientific Community 20 (Acronym: SKOMU)
Investor: Masaryk University, Category A
MUNI/A/1411/2019, MU internal code
Name: Applied research: software architectures of critical infrastructures, security of computer systems, natural language processing and language engineering, big data visualization, and augmented reality.
NOVOTNÝ, Vít, Michal ŠTEFÁNIK, Dávid LUPTÁK and Petr SOJKA. Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling. In Aleš Horák and Pavel Rychlý and Adam Rambousek. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 37-46. ISBN 978-80-263-1600-8.
@inproceedings{1699698,
  author = {Novotný, Vít and Štefánik, Michal and Lupták, Dávid and Sojka, Petr},
  address = {Brno},
  booktitle = {Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020},
  editor = {Aleš Horák and Pavel Rychlý and Adam Rambousek},
  keywords = {Evaluation; word vectors; word2vec; fastText; information retrieval; text classification; language modeling},
  howpublished = {tištěná verze "print"},
  language = {eng},
  location = {Brno},
  isbn = {978-80-263-1600-8},
  pages = {37-46},
  publisher = {Tribun EU},
  title = {Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling},
  url = {http://raslan2020.nlp-consulting.net/},
  year = {2020}
}
TY  - JOUR
ID  - 1699698
AU  - Novotný, Vít
AU  - Štefánik, Michal
AU  - Lupták, Dávid
AU  - Sojka, Petr
PY  - 2020
TI  - Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling
PB  - Tribun EU
CY  - Brno
SN  - 9788026316008
KW  - Evaluation
KW  - word vectors
KW  - word2vec
KW  - fastText
KW  - information retrieval
KW  - text classification
KW  - language modeling
UR  - http://raslan2020.nlp-consulting.net/
N2  - Since the seminal work of Mikolov et al. (2013), word vectors of log-bilinear models have found their way into many NLP applications and were extended with the positional model. Although the positional model improves accuracy on the intrinsic English word analogy task, prior work has neglected its evaluation on extrinsic end tasks, which correspond to real-world NLP applications. In this paper, we describe our first steps in evaluating positional weighting on the information retrieval, text classification, and language modeling extrinsic end tasks.
ER  - 
NOVOTNÝ, Vít, Michal ŠTEFÁNIK, Dávid LUPTÁK and Petr SOJKA. Towards Useful Word Embeddings: Evaluation on Information Retrieval, Text Classification, and Language Modeling. In Aleš Horák and Pavel Rychlý and Adam Rambousek. \textit{Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020}. Brno: Tribun EU, 2020, p.~37-46. ISBN~978-80-263-1600-8.