Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

D 2016

BOYTSOV, Leonid, David NOVÁK, Yury MALKOV and Eric NYBERG

Basic information

Original name

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Authors

BOYTSOV, Leonid (840 United States of America), David NOVÁK (203 Czech Republic, guarantor, belonging to the institution), Yury MALKOV (643 Russian Federation) and Eric NYBERG (840 United States of America)

Edition

NEW YORK, CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, p. 1099-1108, 10 pp. 2016

Publisher

ASSOC COMPUTING MACHINERY

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Publication form

storage medium (CD, DVD, flash disk)

RIV identification code

RIV/00216224:14330/16:00088811

Organization unit

Faculty of Informatics

ISBN

978-1-4503-4073-1

DOI

http://dx.doi.org/10.1145/2983323.2983815

UT WoS

000390890800113

Keywords in English

k-NN search; IBM Model 1; non-metric spaces; LSH

Abstract

In the original

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
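The exact brute-force k-NN candidate generation that the abstract uses as its slow baseline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `cosine_similarity` and `brute_force_knn` are hypothetical, and plain bag-of-words cosine similarity stands in for the paper's similarity function, which is not reproduced here. The sketch scores every record against the query and keeps the k best, which is exactly the linear scan that an approximate algorithm is meant to avoid.

```python
import heapq
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Stand-in similarity (assumption): cosine over raw term counts.

    The paper's similarity accounts for term associations; this one does not.
    """
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_knn(query: str, records: list[str], k: int,
                    similarity=cosine_similarity) -> list[tuple[float, int]]:
    """Exact k-NN: score every record, return the k best (score, index) pairs."""
    query_vec = Counter(query.lower().split())
    scored = (
        (similarity(query_vec, Counter(text.lower().split())), index)
        for index, text in enumerate(records)
    )
    # heapq.nlargest keeps only k items in memory while scanning all records.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    records = [
        "install python package",
        "cook pasta at home",
        "python package manager",
    ]
    for score, index in brute_force_knn("python package", records, k=2):
        print(f"{score:.3f}  {records[index]}")
```

Any function that takes two vectors and returns a score can be passed as `similarity`; the candidates returned would then be handed to a re-ranker, as in the pipeline the abstract describes.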

Links

GBP103/12/G084, research and development project

Name: Centrum pro multi-modální interpretaci dat velkého rozsahu

Investor: Czech Science Foundation

Cite

BOYTSOV, Leonid, David NOVÁK, Yury MALKOV and Eric NYBERG. Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search. In CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT. NEW YORK: ASSOC COMPUTING MACHINERY, 2016, p. 1099-1108. ISBN 978-1-4503-4073-1. Available from: https://dx.doi.org/10.1145/2983323.2983815.

@inproceedings{1377704,
   author = {Boytsov, Leonid and Novák, David and Malkov, Yury and Nyberg, Eric},
   address = {NEW YORK},
   booktitle = {CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT},
   doi = {10.1145/2983323.2983815},
   keywords = {k-NN search; IBM Model 1; non-metric spaces; LSH},
   howpublished = {storage medium},
   language = {eng},
   location = {NEW YORK},
   isbn = {978-1-4503-4073-1},
   pages = {1099-1108},
   publisher = {ASSOC COMPUTING MACHINERY},
   title = {Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search},
   year = {2016}
}

TY  - CONF
ID  - 1377704
AU  - Boytsov, Leonid
AU  - Novák, David
AU  - Malkov, Yury
AU  - Nyberg, Eric
PY  - 2016
TI  - Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
PB  - ASSOC COMPUTING MACHINERY
CY  - NEW YORK
SN  - 9781450340731
KW  - k-NN search
KW  - IBM Model 1
KW  - non-metric spaces
KW  - LSH
N2  - Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
ER  -

BOYTSOV, Leonid, David NOVÁK, Yury MALKOV and Eric NYBERG. Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search. In \textit{CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT}. NEW YORK: ASSOC COMPUTING MACHINERY, 2016, p.~1099--1108. ISBN~978-1-4503-4073-1. Available from: https://dx.doi.org/10.1145/2983323.2983815.
