D 2018

Sentence and Word Embedding Employed in Open Question-Answering

MEDVEĎ, Marek and Aleš HORÁK

Basic information

Original name

Sentence and Word Embedding Employed in Open Question-Answering

Authors

MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)

Edition

Setúbal, Portugal, Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART 2018), p. 486-492, 7 pp. 2018

Publisher

SCITEPRESS - Science and Technology Publications

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60200 6.2 Languages and Literature

Country of publisher

Portugal

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/18:00100739

Organization unit

Faculty of Informatics

ISBN

978-989-758-275-2

Keywords in English

question answering; word embedding; word2vec; AQA; Simple Question Answering Database; SQAD

Tags

International impact, Reviewed
Changed: 30/4/2019 06:08, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

The Automatic Question Answering (AQA) system is a representative of open-domain QA systems in which the answer selection process relies on syntactic and semantic similarities between the question and the answering text snippets. Such an approach is specifically oriented to languages with fine-grained syntactic and morphological features that help to guide the correct QA match. In this paper, we present the latest results of the AQA system with a new implementation of word embedding criteria. All AQA processing steps (question processing, answer selection and answer extraction) are syntax-based, with advanced scoring obtained by a combination of several similarity criteria (TF-IDF, tree distance, ...). Adding the word embedding parameters helped to resolve the QA match in cases where the answer is expressed by semantically near equivalents. We describe the design and implementation of the whole QA process and provide a new evaluation of the AQA system with the word embedding criteria, measured on an expanded version of the Simple Question-Answering Database (SQAD) with more than 3000 question-answer pairs extracted from the Czech Wikipedia.
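
The idea of combining several similarity criteria into one answer-selection score can be illustrated with a minimal Python sketch. It is not the AQA implementation: the function names, the equal default weights, the averaged-word-vector sentence representation and the tiny two-dimensional `TOY_EMB` table are all assumptions made for illustration, and the tree-distance criterion is left out. The sketch only shows how an embedding-based term can select a candidate that answers with near-synonyms where a purely lexical TF-IDF score would not.

```python
import math
from collections import Counter

def dense_cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {token: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def tfidf_vectors(docs):
    """TF-IDF vectors for a list of token lists, using a smoothed IDF."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def mean_embedding(tokens, emb, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def rank_candidates(question, candidates, emb, dim, w_tfidf=0.5, w_emb=0.5):
    """Rank candidate sentences by a weighted sum of the two similarities."""
    docs = [question.lower().split()] + [c.lower().split() for c in candidates]
    tv = tfidf_vectors(docs)
    q_emb = mean_embedding(docs[0], emb, dim)
    scored = []
    for i, cand in enumerate(candidates, start=1):
        lexical = sparse_cosine(tv[0], tv[i])
        semantic = dense_cosine(q_emb, mean_embedding(docs[i], emb, dim))
        scored.append((w_tfidf * lexical + w_emb * semantic, cand))
    return sorted(scored, reverse=True)

# Hypothetical 2-dimensional embeddings, hand-made so that near-synonyms are close.
TOY_EMB = {
    "wrote": [1.0, 0.0], "authored": [0.9, 0.1],
    "novel": [0.8, 0.2], "book": [0.7, 0.3],
    "river": [0.0, 1.0], "flows": [0.1, 0.9], "north": [0.0, 1.0],
}

QUESTION = "who wrote the novel"
CANDIDATES = ["the book was authored by karel", "the river flows north"]

if __name__ == "__main__":
    for score, cand in rank_candidates(QUESTION, CANDIDATES, TOY_EMB, dim=2):
        print(f"{score:.3f}  {cand}")
```

In this toy setting neither candidate shares a content word with the question, so the TF-IDF term alone cannot prefer the right sentence. The embedding term does, because "authored" and "book" lie close to "wrote" and "novel" in the assumed vector table.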

Links

GA15-13277S, research and development project
Name: Hyperintensionální logika pro analýzu přirozeného jazyka (Hyperintensional logic for natural language analysis)
Investor: Czech Science Foundation
MUNI/A/0854/2017, MU internal code
Name: Rozsáhlé výpočetní systémy: modely, aplikace a verifikace VII. (Large-scale computing systems: models, applications and verification VII)
Investor: Masaryk University, Category A