Detailed Information on Publication Record
2022
Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset
MEDKOVÁ, Helena and Aleš HORÁK
Basic information
Original name
Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset
Authors
MEDKOVÁ, Helena (203 Czech Republic, guarantor, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)
Edition
Amsterdam, Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria, p. 206-218, 13 pp. 2022
Publisher
IOS Press
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
60203 Linguistics
Country of publisher
Netherlands
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
References:
RIV identification code
RIV/00216224:14210/22:00126225
Organization unit
Faculty of Arts
ISBN
978-1-64368-320-1
UT WoS
001176503400015
Keywords in English
natural language understanding; coordinated verbs with shared argument; zeugma; BERT language model; dataset
Tags
International impact, Reviewed
Changed: 14/5/2024 10:18, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
Sentences where two verbs share a single argument represent a complex and highly ambiguous syntactic phenomenon. The argument sharing relations must be considered during the detection process from both a syntactic and semantic perspective. Such expressions can represent ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because of the necessity to reflect meaning relations of the analyzed sentence constituents. This paper presents the development and evaluation of ZeugBERT, a language model tuned for the sentence classification task using a pre-trained Czech transformer model for language representation. The model was trained with a newly prepared dataset, which is also published with this paper, of 7,849 Czech sentences to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct) in the context of coordination. ZeugBERT here reaches 88% of test set accuracy. The text describes the process of the new dataset creation and annotation, and it offers a detailed error analysis of the developed classification model.
Links
MUNI/A/1184/2020, internal MU code