Detailed Information on Publication Record
2016
Czech Grammar Agreement Dataset for Evaluation of Language Models
BAISA, Vít
Basic information
Original name
Czech Grammar Agreement Dataset for Evaluation of Language Models
Authors
BAISA, Vít (203 Czech Republic, guarantor, belonging to the institution)
Edition
Brno, RASLAN 2016 Recent Advances in Slavonic Natural Language Processing, p. 63-67, 5 pp. 2016
Publisher
Tribun EU
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
References:
RIV identification code
RIV/00216224:14330/16:00091975
Organization unit
Faculty of Informatics
ISBN
978-80-263-1095-2
ISSN
UT WoS
000466886400007
Keywords (in Czech)
jazykový model; gramatická shoda; slovesná přípona; čeština; podmět; přísudek; vyhodnocení; perplexita
Keywords in English
language model; grammar agreement; verb suffix; Czech language; subject; predicate; dataset; evaluation; perplexity
Tags
International impact, Reviewed
Changed: 27/5/2021 09:10, Mgr. et Mgr. Vít Baisa, Ph.D.
Abstract
In the original
AGREE is a dataset and task for the evaluation of language models, based on grammatical agreement in Czech. The dataset consists of sentences with marked suffixes of past tense verbs. The task is to choose the right verb suffix, which depends on the gender, number and animacy of the subject. It is challenging for language models because 1) Czech is morphologically rich, 2) it has relatively free word order, 3) it has a high out-of-vocabulary (OOV) ratio, 4) the predicate and subject can be far from each other, 5) subjects can be unexpressed and 6) various semantic rules may apply. The task provides a straightforward and easily reproducible way of evaluating language models on a morphologically rich language.
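The task described in the abstract can be sketched as a simple scoring loop. The following is a minimal illustration only, not the dataset's actual file format or the author's evaluation code: the `*` slot marker, the suffix inventory, and the names `choose_suffix`, `accuracy` and `toy_score` are assumptions made for this example, and the toy scorer merely stands in for a real language model.

```python
from typing import Callable, Sequence, Tuple

# Endings that follow the stem-final "l" of a Czech past-tense verb form:
# ""  masculine singular            "a" feminine singular / neuter plural
# "o" neuter singular               "i" masculine animate plural
# "y" masculine inanimate plural / feminine plural
SUFFIXES = ("", "a", "o", "i", "y")

SLOT = "*"  # hypothetical marker for the masked suffix position


def fill(sentence: str, suffix: str) -> str:
    """Insert a candidate suffix into the marked slot."""
    return sentence.replace(SLOT, suffix)


def choose_suffix(sentence: str,
                  score: Callable[[str], float],
                  suffixes: Sequence[str] = SUFFIXES) -> str:
    """Return the suffix whose completed sentence the model scores highest."""
    return max(suffixes, key=lambda s: score(fill(sentence, s)))


def accuracy(examples: Sequence[Tuple[str, str]],
             score: Callable[[str], float]) -> float:
    """Fraction of (sentence, gold suffix) pairs the model gets right."""
    correct = sum(choose_suffix(sent, score) == gold for sent, gold in examples)
    return correct / len(examples)


# Toy stand-in for a language model: counts word pairs it has "seen".
# A real evaluation would plug in a sentence log-probability instead.
SEEN_BIGRAMS = {("žena", "dělala"), ("muži", "dělali")}


def toy_score(sentence: str) -> float:
    words = sentence.lower().split()
    return float(sum(pair in SEEN_BIGRAMS for pair in zip(words, words[1:])))


examples = [("Žena dělal*", "a"), ("Muži dělal*", "i")]
result = accuracy(examples, toy_score)
```

Because `choose_suffix` only needs a function from a sentence to a score, any model that can assign a probability or a perplexity-derived score to a full sentence could be substituted for `toy_score` under this sketch.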
Links
MUNI/A/0863/2015, internal MU code
7F14047, research and development project