Detailed Information on Publication Record
2017
Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
RŮŽIČKA, Michal, Vít NOVOTNÝ, Petr SOJKA, Jan POMIKÁLEK, Radim ŘEHŮŘEK et al.
Basic information
Original name
Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
Authors
RŮŽIČKA, Michal (203 Czech Republic, belonging to the institution), Vít NOVOTNÝ (203 Czech Republic, belonging to the institution), Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution), Jan POMIKÁLEK (203 Czech Republic) and Radim ŘEHŮŘEK (203 Czech Republic)
Edition
Vienna, Austria, CEUR Workshop Proceedings, Vol. 1923, p. 1-12, 12 pp. 2017
Publisher
Not specified
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Austria
Confidentiality degree
is not subject to state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/17:00094375
Organization unit
Faculty of Informatics
ISSN
Keywords in English
vector space modelling; semantic vectors encodings; inverted-index; systems performance; document representations; Latent Semantic Analysis; doc2vec; GloVe; Elasticsearch; evaluation; performance optimization
Tags
International impact, Reviewed
Changed: 3/1/2023 15:15, RNDr. Vít Starý Novotný, Ph.D.
Abstract
In the original
Vector representations and vector space modeling (VSM) play a central role in modern machine learning. In our recent research we proposed a novel approach to ‘vector similarity searching’ over dense semantic vector representations. This approach can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity. In this paper we validate our method using varied datasets ranging from text representations and embeddings (LSA, doc2vec, GloVe) to SIFT descriptors of image data. We show how our approach handles the indexing and querying in these domains, building a fast and scalable vector database with a tunable trade-off between vector search performance and quality, backed by a standard fulltext engine such as Elasticsearch.
Links
MUNI/A/0997/2016, MU internal code
TD03000295, research and development project