MEDKOVÁ, Helena and Aleš HORÁK. Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset. In Dimou, Anastasia; Neumaier, Sebastian; Pellegrini, Tassilo; Vahdati, Sahar. Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria. Amsterdam: IOS Press, 2022, p. 206-218. ISBN 978-1-64368-320-1. Available from: https://dx.doi.org/10.3233/SSW220022.
Basic information
Original name Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset
Authors MEDKOVÁ, Helena (203 Czech Republic, guarantor, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution).
Edition Amsterdam, Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria, p. 206-218, 13 pp. 2022.
Publisher IOS Press
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 60203 Linguistics
Country of publisher Netherlands
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14210/22:00126225
Organization unit Faculty of Arts
ISBN 978-1-64368-320-1
Doi http://dx.doi.org/10.3233/SSW220022
UT WoS 001176503400015
Keywords in English natural language understanding; coordinated verbs with shared argument; zeugma; BERT language model; dataset
Tags firank_B, rivok
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 14/5/2024 10:18.
Abstract
Sentences where two verbs share a single argument represent a complex and highly ambiguous syntactic phenomenon. The argument-sharing relations must be considered during the detection process from both a syntactic and a semantic perspective. Such expressions can represent ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because of the necessity to reflect meaning relations of the analyzed sentence constituents. This paper presents the development and evaluation of ZeugBERT, a language model tuned for the sentence classification task using a pre-trained Czech transformer model for language representation. The model was trained with a newly prepared dataset of 7,849 Czech sentences, which is also published with this paper, to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct) in the context of coordination. ZeugBERT reaches 88% accuracy on the test set. The text describes the process of the new dataset creation and annotation, and it offers a detailed error analysis of the developed classification model.
Links
MUNI/A/1184/2020, internal MU code
Name: Use of machine learning in the detection of a shared argument in coordinated structures (Využití strojového učení při detekci společného argumentu v koordinovaných strukturách)
Investor: Masaryk University