2022
Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset
MEDKOVÁ, Helena and Aleš HORÁK
Basic information
Original title
Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset
Authors
MEDKOVÁ, Helena (203 Czech Republic, guarantor, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)
Edition
Amsterdam, Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria, pp. 206-218, 13 pp., 2022
Publisher
IOS Press
Other information
Language
English
Type of outcome
Proceedings paper
Field of study
60203 Linguistics
Country of publisher
Netherlands
Confidentiality
is not subject to a state or trade secret
Publication form
printed version "print"
RIV identification code
RIV/00216224:14210/22:00126225
Organization unit
Faculty of Arts
ISBN
978-1-64368-320-1
UT WoS
001176503400015
Keywords in English
natural language understanding; coordinated verbs with shared argument; zeugma; BERT language model; dataset
Tags
International impact, Reviewed
Changed: 14 May 2024 10:18, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original language
Sentences in which two verbs share a single argument represent a complex and highly ambiguous syntactic phenomenon. Detecting them requires considering the argument-sharing relations from both a syntactic and a semantic perspective. Such expressions can represent ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because the classification must reflect the meaning relations among the constituents of the analyzed sentence. This paper presents the development and evaluation of ZeugBERT, a language model tuned for the sentence classification task using a pre-trained Czech transformer model for language representation. The model was trained on a newly prepared dataset of 7,849 Czech sentences, published together with this paper, to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct) in the context of coordination. ZeugBERT reaches 88% accuracy on the test set. The text describes the creation and annotation of the new dataset and offers a detailed error analysis of the developed classification model.
Links
MUNI/A/1184/2020, internal MU code