Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

BOYTSOV, Leonid, David NOVÁK, Yury MALKOV and Eric NYBERG. Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search. Online. In CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT. NEW YORK: ASSOC COMPUTING MACHINERY, 2016. p. 1099-1108. ISBN 978-1-4503-4073-1. Available from: https://dx.doi.org/10.1145/2983323.2983815. [cited 2024-04-23]

Basic information
Original name	Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Authors	BOYTSOV, Leonid (840 United States of America), David NOVÁK (203 Czech Republic, guarantor, belonging to the institution), Yury MALKOV (643 Russian Federation) and Eric NYBERG (840 United States of America)
Edition	NEW YORK, CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, p. 1099-1108, 10 pp. 2016.
Publisher	ASSOC COMPUTING MACHINERY

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	United States of America
Confidentiality degree	is not subject to a state or trade secret
Publication form	storage medium (CD, DVD, flash disk)
RIV identification code	RIV/00216224:14330/16:00088811
Organization unit	Faculty of Informatics
ISBN	978-1-4503-4073-1
Doi	http://dx.doi.org/10.1145/2983323.2983815
UT WoS	000390890800113
Keywords in English	k-NN search; IBM Model 1; non-metric spaces; LSH
Tags	core_A, DISA, firank_A
Tags	International impact, Reviewed
Changed by	RNDr. David Novák, Ph.D., university ID (UČO) 4335. Changed: 7/4/2017 15:22.

Abstract

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. This approach misses some relevant candidates, e.g., due to a vocabulary mismatch between the query and the document. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where the similarity function can take subtle term associations into account. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be both more effective and more efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
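
The contrast drawn in the abstract, between term-based candidate generation and exact brute-force k-NN search under a similarity that captures term associations, can be illustrated with a small sketch. It assumes a toy similarity in the spirit of IBM Model 1 (listed among the keywords): the log-probability of the query given the document under a translation model, linearly smoothed with a collection model. The translation table TRANS, collection probabilities COLL_PROB, smoothing weight lam = 0.1 and all function names are hypothetical and not taken from the paper; neither the paper's actual similarity function nor its approximate k-NN algorithm is reproduced here.

```python
import heapq
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical translation table: TRANS[q][w] approximates P(query term q | document term w).
TransTable = Dict[str, Dict[str, float]]


def model1_log_score(query: Sequence[str], doc: Sequence[str], trans: TransTable,
                     coll_prob: Dict[str, float], lam: float = 0.1) -> float:
    """log P(query | doc) under a smoothed IBM Model 1-style translation model (needs lam > 0)."""
    counts = Counter(doc)
    n = len(doc)
    score = 0.0
    for q in query:
        # P_trans(q | d) = sum over doc terms w of T(q | w) * P(w | d), with P(w | d) = tf / |d|
        p_trans = sum(trans.get(q, {}).get(w, 0.0) * c / n for w, c in counts.items())
        # Interpolating with a collection model keeps the probability strictly positive
        p = (1.0 - lam) * p_trans + lam * coll_prob.get(q, 1e-6)
        score += math.log(p)
    return score


def brute_force_knn(query: Sequence[str], docs: List[Sequence[str]],
                    sim: Callable[[Sequence[str], Sequence[str]], float], k: int) -> List[int]:
    """Exact k-NN: evaluate sim against every document and keep the ids of the k best."""
    scored = ((sim(query, d), i) for i, d in enumerate(docs))
    return [i for _, i in heapq.nlargest(k, scored)]


def term_based_candidates(query: Sequence[str], docs: List[Sequence[str]]) -> List[int]:
    """What a plain inverted-index lookup returns: documents sharing at least one query term."""
    qset = set(query)
    return [i for i, d in enumerate(docs) if qset & set(d)]


# Toy data (hypothetical): "automobile" never matches "car" lexically.
DOCS = [["automobile", "engine"], ["banana", "fruit"], ["car", "wheel"]]
TRANS: TransTable = {"car": {"car": 0.6, "automobile": 0.3}}
COLL_PROB = {"car": 0.01}
QUERY = ["car"]


def sim(q: Sequence[str], d: Sequence[str]) -> float:
    return model1_log_score(q, d, TRANS, COLL_PROB)


print(term_based_candidates(QUERY, DOCS))      # [2]
print(brute_force_knn(QUERY, DOCS, sim, k=2))  # [2, 0]
```

In this toy setting the term-based lookup misses document 0 because of the vocabulary mismatch, while the k-NN search ranks it second through the translation probability T(car | automobile). The score is not symmetric in query and document, so it does not induce a metric space. The brute-force loop also costs one similarity evaluation per document for every query, which is the cost an approximate k-NN algorithm is meant to reduce.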

Links
GBP103/12/G084, research and development project	Name: Centrum pro multi-modální interpretaci dat velkého rozsahu (Center for Large-Scale Multi-Modal Data Interpretation)
GBP103/12/G084, research and development project	Investor: Czech Science Foundation
