D 2014

SQAD: Simple Question Answering Database

HORÁK, Aleš and Marek MEDVEĎ

Basic information

Original name

SQAD: Simple Question Answering Database

Authors

HORÁK, Aleš (203 Czech Republic, guarantor, belonging to the institution) and Marek MEDVEĎ (703 Slovakia, belonging to the institution)

Edition

Brno, Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 121-128, 8 pp., 2014

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/14:00077519

Organization unit

Faculty of Informatics

ISSN

Keywords in English

question answering; Simple Question Answering Database; SQAD; syntax-based question answering; SBQA

Tags

International impact, Reviewed
Changed: 27/4/2015 11:53, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

In this paper, we present a new free resource for comparable evaluation of Czech question answering. The Simple Question Answering Database, SQAD, contains 3301 questions and answers extracted and processed from the Czech Wikipedia. The SQAD database was prepared for precision evaluation of automatic question answering systems. Such a resource was not previously available for the Czech language. We describe the process of creating SQAD, the processing of the texts by automatic tokenization (Unitok) and morphological disambiguation (Desamb), and the subsequent semi-automatic cleaning and post-processing. We also show the results of the first version of a Czech question answering system named SBQA (syntax-based question answering).

Links

LM2010013, research and development project
Name: LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
7F14047, research and development project
Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR