D 2022

Diverse Semantics Representation is King

GELETKA, Martin, Vojtěch KALIVODA, Michal ŠTEFÁNIK, Marek TOMA, Petr SOJKA et al.

Basic information

Original name

Diverse Semantics Representation is King

Authors

GELETKA, Martin (703 Slovakia, guarantor, belonging to the institution), Vojtěch KALIVODA (203 Czech Republic, belonging to the institution), Michal ŠTEFÁNIK (703 Slovakia, belonging to the institution), Marek TOMA (703 Slovakia, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution)

Edition

Bologna, Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, p. 28-39, 12 pp. 2022

Publisher

CEUR.org

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Italy

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

References:

RIV identification code

RIV/00216224:14330/22:00126314

Organization unit

Faculty of Informatics

ISSN

Keywords (in Czech)

vyhledávání informací; odpovídání otázek; reprezentace matematiky; vyhledávání informací s včetně matematických formulí; reprezentace významu slov; slučování vyhledaných výsledků; hlasování informačních systémů; změna pořadí výsledků hledání; fúze dat; diverzita systémů; transformery

Keywords in English

information retrieval; question answering; math representations; math-aware information retrieval; word embeddings; ensembling; voting; reranking; data fusion; diversity; transformers

Tags

International impact, Reviewed
Changed: 28/3/2023 11:45, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

We report on the systems that the Math Information Retrieval group at Masaryk University (MIRMU) and the team of Faculty of Informatics students (MSM) prepared for Task 1 (Find Answers) of the ARQMath lab at the CLEF conference. To study the effects of different system settings and hyperparameters, we have prototyped several diverse math-aware information retrieval (MIR) systems: both “old” inverted index-based ones and new neural ones. By ensembling the results of the “weak” individual systems into committees, we report on entailments, benefits, and drawbacks of system ensembling. We evaluated the proposed individual systems and ensembles, considering their diversity, hyperparameters, and representations used, and classified their approaches. Our prototypes have helped us understand the challenging problems of question answering in the STEM domain: the key lies in the proper representation of document semantics. Our reproducible evaluation Python library PV211-utils makes it possible to reproduce and further advance MIR research.
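
As an illustration of the ensembling idea mentioned in the abstract, the following is a minimal sketch of combining ranked result lists from several systems into one ranking. It uses reciprocal rank fusion, a generic data fusion technique chosen here only for illustration; this record does not say which fusion or voting scheme the authors used. The function name, the constant `k = 60`, and the answer identifiers are all assumptions, and the code is not taken from PV211-utils.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of answer ids into a single ranking.

    Each system contributes 1 / (k + rank) for every answer it returns,
    so answers ranked highly by several systems rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, answer_id in enumerate(ranking, start=1):
            scores[answer_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))


# Hypothetical runs from two "weak" systems for one question.
inverted_index_run = ["a1", "a2", "a3"]
neural_run = ["a2", "a4", "a1"]

fused = reciprocal_rank_fusion([inverted_index_run, neural_run])
print(fused)
```

Answers returned by both systems ("a1" and "a2" here) accumulate score from each list, which is one simple way a committee of diverse systems can outperform its individual members.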

Links

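MUNI/A/1195/2021, MU internal code
Name: Aplikovaný výzkum v oblastech vyhledávání, analýz a vizualizací rozsáhlých dat, zpracování přirozeného jazyka a aplikované umělé inteligence (Applied research in search, analysis, and visualization of large-scale data, natural language processing, and applied artificial intelligence)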
Investor: Masaryk University