D 2018

Weighting of Passages in Question Answering

NOVOTNÝ, Vít and Petr SOJKA

Basic information

Original name

Weighting of Passages in Question Answering

Authors

NOVOTNÝ, Vít (203 Czech Republic, belonging to the institution) and Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution)

Edition

Brno, Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2018, p. 31-40, 10 pp. 2018

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/18:00101863

Organization unit

Faculty of Informatics

ISBN

978-80-263-1517-9

ISSN

UT WoS

000612420300005

Keywords (in Czech)

vyhledávání textů; odpovídání dotazů; Godwinův zákon; SemEval; vážení dokumentových pasáží

Keywords in English

passage retrieval; question answering; Godwin’s law; SemEval; weighting of document passages

Tags

International impact, Reviewed
Changed: 3/1/2023 13:53, RNDr. Vít Starý Novotný, Ph.D.

Abstract

In the original

Modern text retrieval systems employ text segmentation during the indexing of documents. We show that, rather than returning the passages to the user, significant improvements are achieved on the semantic text similarity task on question answering (QA) datasets by combining all passages from a document into a single result with an aggregate similarity score. Following an analysis of the SemEval-2016 and 2017 task 3 datasets, we develop a weighted averaging operator that achieves state-of-the-art results on subtask B and can be implemented into existing search engines. Segmentation in information retrieval matters. Our results show that paying attention to important passages by using a task-specific weighting method leads to the best results on these question answering domain retrieval tasks.
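As an illustration of the idea in the abstract, the following minimal sketch combines per-passage similarity scores into a single document-level score with a weighted average. It is not the operator developed in the paper: the function names and the reciprocal-position weighting are hypothetical, chosen only to show how such an aggregation might look.

```python
from typing import List, Sequence


def aggregate_score(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-passage similarity scores for one document."""
    if len(similarities) != len(weights):
        raise ValueError("need exactly one weight per passage")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, similarities)) / total_weight


def reciprocal_position_weights(n: int) -> List[float]:
    """Hypothetical weighting (an assumption, not taken from the paper):
    the passage at 1-based position i receives weight 1/i."""
    return [1.0 / i for i in range(1, n + 1)]


# Similarity of a query to each passage of one document (made-up values).
passage_scores = [0.8, 0.2]
document_score = aggregate_score(
    passage_scores, reciprocal_position_weights(len(passage_scores))
)
```

A search engine that indexes passages separately could apply such a function as a post-processing step, grouping retrieved passages by their source document before ranking.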

Links

MUNI/A/1213/2017, internal MU code
Name: Aplikovaný výzkum na FI: bezpečnost počítačových systémů, SW architektury kritických infrastruktur, zpracování velkých dat, vizualizace dat a virtuální realita
Investor: Masaryk University, Applied research at FI: computer systems security, SW architecture of critical infrastructure, big data processing, data visualization and virtual reality, Category A
TD03000295, research and development project
Name: Inteligentní software pro sémantické hledání dokumentů (Intelligent software for semantic document search; Acronym: ISSHD)
Investor: Technology Agency of the Czech Republic