SOJKA, Petr and Vít NOVOTNÝ. Semantically Coherent Vector Space Representations. 2019.
Basic information
Original name Semantically Coherent Vector Space Representations
Authors SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution) and NOVOTNÝ, Vít (203 Czech Republic, belonging to the institution).
Edition 2019.
Other information
Original language English
Type of outcome Presentations at conferences
Field of Study 10200 1.2 Computer and information sciences
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
WWW Scientific poster PDF
RIV identification code RIV/00216224:14330/19:00109517
Organization unit Faculty of Informatics
Keywords (in Czech, translated) artificial intelligence; machine learning; computational linguistics; knowledge acquisition; representation learning; word embeddings; formal concept analysis; word2vec; word2bits
Keywords in English artificial intelligence; machine learning; computational linguistics; information retrieval; representation learning; word embeddings; formal concept analysis; transfer learning; word2vec; word2bits
Tags machine learning
Tags International impact
Changed by: RNDr. Vít Starý Novotný, Ph.D., učo 409729. Changed: 1/11/2021 09:35.
Abstract

Our work is a scientific poster presented at the ML Prague 2019 conference on February 22–24, 2019.

Content is king (Gates, 1996). Decomposition of word semantics matters (Mikolov, 2013). Decomposition of sentence, paragraph, and document semantics into semantically coherent vector space representations matters, too. Interpretability of these learned vector spaces is the holy grail of natural language processing today, as it would allow accurate representation of thoughts and would bring inference into play.

We show recent results of our attempts towards this goal: how the decomposition of document semantics can improve query-answering performance, and how “horizontal transfer learning” based on word2bits can be achieved.
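
For illustration only, a minimal sketch of vector-space query answering with averaged word embeddings; this is not the poster's actual method, and all vectors and documents below are hypothetical toy data:

import numpy as np

embeddings = {  # hypothetical 3-dimensional word vectors
    "cats": np.array([0.9, 0.1, 0.0]),
    "purr": np.array([0.8, 0.2, 0.1]),
    "cars": np.array([0.0, 0.1, 0.9]),
    "race": np.array([0.1, 0.0, 0.8]),
}

def doc_vector(tokens):
    """Represent a document as the mean of its word vectors."""
    return np.mean([embeddings[t] for t in tokens if t in embeddings], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

docs = [["cats", "purr"], ["cars", "race"]]
query = doc_vector(["cats"])
# Rank documents by decreasing similarity to the query.
print(sorted(range(len(docs)), key=lambda i: -cosine(query, doc_vector(docs[i]))))

Decomposing documents into finer semantic units than a single averaged vector is what the poster argues could improve such rankings.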

Representing words as vectors of binary features makes it possible to use a word-lattice representation for feature inference via the well-studied theory of formal concept analysis, and to define a precise semantic similarity metric based on discriminative features. Incremental learning of word features also allows us to interpret them and to perform inference over them, targeting the holy grail.
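
A minimal sketch, assuming word2bits-style quantization to binary features: hypothetical word vectors are binarized, words are compared by Jaccard similarity over shared features, and one formal-concept closure step is computed; this is not the authors' code.

import numpy as np

def binarize(vectors, threshold=0.0):
    """Quantize real-valued word vectors to binary feature vectors."""
    return (vectors > threshold).astype(np.uint8)

def jaccard_similarity(u, v):
    """Semantic similarity of two binary feature vectors."""
    intersection = np.sum(u & v)
    union = np.sum(u | v)
    return intersection / union if union else 0.0

def concept_closure(context, objects):
    """One FCA closure step: collect the attributes shared by `objects`,
    then all objects that have every one of those shared attributes."""
    attributes = [a for a in range(context.shape[1])
                  if all(context[o, a] for o in objects)]
    extent = [o for o in range(context.shape[0])
              if all(context[o, a] for a in attributes)]
    return extent, attributes

# Hypothetical toy context: rows are words, columns are binary features.
words = ["cat", "dog", "car"]
context = binarize(np.array([[0.9, 0.8, -0.1],
                             [0.7, 0.6, -0.3],
                             [-0.5, 0.1, 0.9]]))
print(jaccard_similarity(context[0], context[1]))  # cat vs. dog
print(concept_closure(context, [0]))               # concept generated by "cat"

The closure step is the basic operation from which a concept lattice over words and their binary features can be built.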

Links
MUNI/A/1145/2018, internal MU code. Name: Applied research at FI: software architectures of critical infrastructures, computer systems security, techniques for processing and visualization of big data, and augmented reality.
Investor: Masaryk University, Critical Infrastructure Software Architectures, Computer Systems Security, Data Processing and Visualization Techniques, and Augmented Reality, Category A